Developed version 0.5.0 #149

Open
wants to merge 133 commits into base: master


MichaelRoeder
Member

No description provided.

abhihc and others added 30 commits October 24, 2018 11:26
Merging Changes from dice-group/Squirrel to abhihc/Squirrel
Merging changes from Dice-group/Squirrel to abhihc/Squirrel
Getting changes from master
Adding HtmlUnit and its dependencies to pom.xml.
Adding initial version of the function to handle JavaScript button clicks in HtmlScraper.java
Moving all string constants to YamlFileAttributes.
Updating the pom.xml for HtmlUnit to latest version.
Added code to stop handling JavaScript button clicks when the button is disabled.
Added a test case for govdata button click in HtmlScraperAnalyzerTest.
Added an HTML file for the test case (button is removed after seven clicks).
Updating the yaml file for govdata.
Updating the uri with the crawl-delay data
Setting up the time out in the htmlscraper class
Setting up the time out in the htmlscraper class
Considering Metadata description
Updating related classes
Updating related classes
Implemented a command line worker, frontier and sink to test worker flow after adding respective yaml files.
Merging changes from updated master to develop
Merging changes from develop branch to implementation test scenarios
Getting changes from develop branch into robustness branch
Merging changes from Robustness branch to Develop branch
Merging develop branch into master branch
# Conflicts:
#	spring-config/context-sparql.xml
sritejakv and others added 20 commits June 15, 2019 15:04
Adding review comments in the test cases.
Adding Javadoc for the new classes.
Removing the outdated context files.
…dice-group/Squirrel into mergeDataPortal

# Conflicts:
#	pom.xml
#	squirrel.api/pom.xml
#	squirrel.api/src/main/java/org/dice_research/squirrel/queue/InMemoryQueue.java
#	squirrel.api/src/main/java/org/dice_research/squirrel/queue/IpAddressBasedQueue.java
#	squirrel.frontier/pom.xml
#	squirrel.frontier/src/main/java/org/dice_research/squirrel/data/uri/norm/NormalizerImpl.java
#	squirrel.frontier/src/main/java/org/dice_research/squirrel/frontier/impl/FrontierImpl.java
#	squirrel.web/pom.xml
#	squirrel.worker/.gitignore
#	squirrel.worker/pom.xml
#	squirrel.worker/src/main/java/org/dice_research/squirrel/analyzer/impl/html/scraper/HtmlScraper.java
#	squirrel.worker/src/main/java/org/dice_research/squirrel/fetcher/sparql/SparqlBasedFetcher.java
#	squirrel.worker/src/main/java/org/dice_research/squirrel/fetcher/sparql/SparqlDatasetFetcher.java
#	squirrel.worker/src/main/java/org/dice_research/squirrel/worker/impl/WorkerImpl.java
…led to exceptions during the startup because of missing XML schema files.
…al yaml files in tests that do not make correct use of the tag.
…. This should be reintroduced with a certain command that triggers it from the YAML file. Added a new function keyword that allows extracting a string from the HTML page and appending it to the URL.
Merge data portal scraping enhancement using HTMLUnit
7 participants