A simple Java web scraper that collects all links on a given website that point to the same domain

ManuelSch/Link-Scraper

Java Link Scraper

A simple web scraper implemented in Java 8.

  1. Fetches the given entry URL
  2. Extracts every <a href=""> tag in the page's HTML source that points to the same domain as the entry URL
  3. Repeats steps 1 and 2 for each newly found link until no unvisited links remain
  4. Outputs all found URLs together with their title attributes
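
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `LinkScraperSketch`, its method names, and the pluggable `fetcher` parameter are all hypothetical. It also assumes that "same domain" means "same host", that attributes are double-quoted, and that a regular expression is good enough to find anchor tags (a real HTML parser would be more robust).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the crawl loop; names are illustrative only.
class LinkScraperSketch {
    // Simplification: only double-quoted attributes inside <a ...> tags are recognized.
    private static final Pattern ANCHOR = Pattern.compile("<a\\s[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF = Pattern.compile("\\shref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern TITLE = Pattern.compile("\\stitle\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Step 1: fetch a page; unreachable pages are treated as empty.
    static String fetch(URI uri) {
        try (InputStream in = uri.toURL().openStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        } catch (IOException | IllegalArgumentException e) {
            return "";
        }
    }

    // Step 2: map each same-host link (fragment removed) to its title attribute, or "" if absent.
    static Map<String, String> extractSameDomainLinks(String html, URI pageUri) {
        Map<String, String> links = new LinkedHashMap<>();
        Matcher anchor = ANCHOR.matcher(html);
        while (anchor.find()) {
            String tag = anchor.group();
            Matcher href = HREF.matcher(tag);
            if (!href.find()) continue;
            URI target;
            try {
                target = pageUri.resolve(href.group(1).trim()); // handles relative links
            } catch (IllegalArgumentException e) {
                continue; // skip hrefs that are not valid URIs
            }
            if (target.getHost() == null || !target.getHost().equalsIgnoreCase(pageUri.getHost())) continue;
            String url = target.toString();
            int hash = url.indexOf('#');
            if (hash >= 0) url = url.substring(0, hash);
            Matcher title = TITLE.matcher(tag);
            links.putIfAbsent(url, title.find() ? title.group(1) : "");
        }
        return links;
    }

    // Steps 3 and 4: breadth-first crawl until no unvisited links remain, then return everything found.
    static Map<String, String> crawl(URI entry, Function<URI, String> fetcher) {
        Map<String, String> found = new LinkedHashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        visited.add(entry.toString());
        queue.add(entry);
        while (!queue.isEmpty()) {
            URI page = queue.poll();
            Map<String, String> links = extractSameDomainLinks(fetcher.apply(page), page);
            for (Map.Entry<String, String> link : links.entrySet()) {
                found.putIfAbsent(link.getKey(), link.getValue());
                if (visited.add(link.getKey())) queue.add(URI.create(link.getKey()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java LinkScraperSketch <entry-url>");
            return;
        }
        crawl(URI.create(args[0]), LinkScraperSketch::fetch)
                .forEach((url, title) -> System.out.println(url + "\t" + title));
    }
}
```

Passing the fetcher as a parameter keeps the crawl loop testable without network access; `LinkScraperSketch::fetch` plugs in a real HTTP fetch.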

Uses a fixed-size thread pool to execute scrape requests concurrently.
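
One way to run scrape requests on a fixed-size thread pool is sketched below. It is an assumption about how such a design could look, not the repository's implementation: `ConcurrentCrawlSketch` and the `linksOf` function (which stands in for "fetch a URL and return its same-domain links") are hypothetical names. The sketch uses `Executors.newFixedThreadPool`, a concurrent set to avoid scheduling a URL twice, and a counter of outstanding tasks to detect when the crawl is finished.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; single-use: create one instance per crawl.
class ConcurrentCrawlSketch {
    private final ExecutorService pool;
    private final Function<String, List<String>> linksOf; // assumed: fetches a URL, returns its same-domain links
    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    private final AtomicInteger pending = new AtomicInteger(); // tasks submitted but not yet finished
    private final CountDownLatch done = new CountDownLatch(1);

    ConcurrentCrawlSketch(int threads, Function<String, List<String>> linksOf) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.linksOf = linksOf;
    }

    // Blocks until every reachable URL has been processed, then returns all visited URLs.
    Set<String> crawl(String entryUrl) {
        submit(entryUrl);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    private void submit(String url) {
        if (!visited.add(url)) return; // another task already scheduled this URL
        pending.incrementAndGet();     // counted before the task runs, so the count cannot hit 0 early
        pool.execute(() -> {
            try {
                for (String link : linksOf.apply(url)) submit(link);
            } finally {
                if (pending.decrementAndGet() == 0) done.countDown();
            }
        });
    }
}
```

Because a task increments the counter for each child before it decrements its own entry, the counter only reaches zero once no work is left, which is what lets `crawl` return safely.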
