Links tree

Project created for a job application. The requirements are described below.

Requirements

As a product owner I would like to have a web crawler application that takes a given site (e.g. informador.com.mx) and lists the links contained there hierarchically; links contained in a second level page should also be included. This is to be replicated ad infinitum as long as the links are within the original domain.

Acceptance criteria

Links are listed in a tree structure
The web crawler updates the page regularly while it keeps looking for new links
Links listed should only be links that point to a different page than the one that it lives on (e.g. anchor links that point to a given section within the same page are to be excluded)
Any element that has links that point to a different page should also be included (e.g. images with links)
List links to external sites only once; the crawler should not list secondary links within that external site (e.g. if a link from informador.com.mx points to espn.com, list this link in the hierarchy, but don't show more links under espn.com)
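
One way to read the criteria above is as a three-way decision per link: skip it, follow it, or list it without following. The sketch below is only an illustration, not the project's code. The function name classifyLink and its parameters are hypothetical, it assumes a Node.js version where URL is a global, and it treats www and other subdomains of the target site as internal, which the requirements do not state.

```javascript
// Hypothetical helper: classifies one href found on the page at pageUrl.
// Returns 'skip', 'internal' (list and follow) or 'external' (list only).
function classifyLink(href, pageUrl, rootHost) {
  let target;
  try {
    target = new URL(href, pageUrl); // resolves relative hrefs against the page
  } catch (err) {
    return 'skip'; // href could not be parsed
  }
  if (target.protocol !== 'http:' && target.protocol !== 'https:') {
    return 'skip'; // mailto:, javascript:, tel: and similar
  }
  // Ignore fragments on both sides: a link that only differs by its
  // fragment is an anchor to a section of the same page.
  const page = new URL(pageUrl);
  target.hash = '';
  page.hash = '';
  if (target.href === page.href) {
    return 'skip';
  }
  // Assumption: "www." and other subdomains count as the original domain.
  const host = target.hostname.replace(/^www\./, '');
  const internal = host === rootHost || host.endsWith('.' + rootHost);
  return internal ? 'internal' : 'external';
}

const page = 'http://informador.com.mx/deportes';
console.log(classifyLink('#comentarios', page, 'informador.com.mx'));
console.log(classifyLink('/economia', page, 'informador.com.mx'));
console.log(classifyLink('http://espn.com/', page, 'informador.com.mx'));
```

The same function can be applied to any element that carries a link (for example an image wrapped in an anchor), since only the href value matters.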

Technical requirements

The web crawler should be constructed using Node.js
The output in the webpage should be JSON that nests the links to show their hierarchy
The tree for the links should contain real time data

Acceptable if the page has to be refreshed when new links are found, extra points if the refresh happens without user interaction.
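
To make the nested JSON output more concrete, here is a minimal sketch of how such a tree could be assembled from pages that have already been fetched. It is not taken from the project: the function name buildTree, the node shape { url, links } and the pages lookup table are all assumptions made for illustration, and the sample URLs other than the two domains named above are invented.

```javascript
// Hypothetical sketch: builds a nested link tree from already fetched pages.
// `pages` maps a page URL to the list of hrefs found on that page.
function buildTree(url, pages, rootHost, maxDepth, seen = new Set()) {
  const here = new URL(url);
  here.hash = '';
  const node = { url: here.href, links: [] };
  // External pages are listed but never expanded; a page that was
  // already expanded is listed again as a leaf to avoid cycles.
  const isInternal = here.hostname === rootHost;
  if (!isInternal || maxDepth === 0 || seen.has(here.href)) {
    return node;
  }
  seen.add(here.href);
  for (const href of pages[here.href] || []) {
    const child = new URL(href, here.href);
    child.hash = '';
    if (child.href === here.href) continue; // link to the same page
    node.links.push(buildTree(child.href, pages, rootHost, maxDepth - 1, seen));
  }
  return node;
}

// Invented sample data standing in for fetched pages.
const pages = {
  'http://informador.com.mx/': ['/deportes', 'http://espn.com/', '#top'],
  'http://informador.com.mx/deportes': ['/', 'http://espn.com/nfl'],
  'http://espn.com/': ['http://espn.com/nba'],
};
const tree = buildTree('http://informador.com.mx/', pages, 'informador.com.mx', 2);
console.log(JSON.stringify(tree, null, 2));
```

With a depth limit of 2 this follows links down to second level pages, and the espn.com entries appear in the hierarchy without any children.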

Notes

The server fetches the target site every 5 minutes. On first start, wait until the server has built its cache. The crawler follows links down to the second level.
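
The refresh behaviour described in this note can be pictured as a small cache that is refilled on a timer. The following is a sketch under assumptions, not the server's actual code: createTreeCache and its crawl argument are hypothetical names, and crawl stands for whatever async function produces the latest link tree.

```javascript
// Hypothetical sketch of a link tree cache that is refreshed periodically.
// `crawl` is an assumed async function that resolves to the latest tree.
function createTreeCache(crawl, intervalMs) {
  let tree = null; // stays null until the first crawl has finished

  async function refresh() {
    try {
      tree = await crawl();
    } catch (err) {
      // Keep serving the previous tree if a crawl fails.
      console.error('crawl failed, keeping previous tree:', err.message);
    }
  }

  const ready = refresh(); // first crawl starts immediately
  const timer = setInterval(refresh, intervalMs);
  timer.unref(); // the timer alone should not keep the process alive

  return {
    get: () => tree,
    ready, // resolves once the first crawl attempt is done
    stop: () => clearInterval(timer),
  };
}

// Usage with a stand-in crawl function and the 5 minute interval from the note.
const cache = createTreeCache(
  async () => ({ url: 'http://informador.com.mx/', links: [] }),
  5 * 60 * 1000
);
cache.ready.then(() => {
  console.log(JSON.stringify(cache.get()));
  cache.stop();
});
```

Until the first crawl finishes, get() returns null, which corresponds to the waiting period mentioned above.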

Setup

Clone the repository and install the dependencies.

$ git clone https://github.com/abiee/links-tree.git
$ cd links-tree
$ npm install
$ bower install
$ gulp serve

Open http://localhost:9000/ in your browser to see the project working. gulp and bower must be installed globally; install them first if they are not already available.

Testing

You can run the server tests with the test:server task:

$ gulp test:server

Licence

Licensed under the MIT license.
