The project has following components that are loosely coupled:
- Html/JavaScript based front-end making ajax calls to restful APIs
- Python based scrapper service consuming a message queue
- Spring Boot based APIs that handle CRUD with Database
- PostgreSQL and RabbitMQ services encapsulated in docker images
user@ubuntu:~/my-scraper$ cd web-dev/
user@ubuntu:~/my-scraper/web-dev$ chmod +x runserver.sh
user@ubuntu:~/my-scraper/web-dev$ ./runserver.sh
Serving HTTP on 0.0.0.0 port 8088 ...
user@ubuntu:~/my-scraper$ cd RabbitMQ/
user@ubuntu:~/my-scraper/RabbitMQ$ docker-compose up -d
user@ubuntu:~/my-scraper$ cd PostgreSQL/
user@ubuntu:~/my-scraper/PostgreSQL$ docker build -t "my-postgres" .
user@ubuntu:~/my-scraper/PostgreSQL$ docker run --name postgres1 -p 5432:5432 -d my-postgres
user@ubuntu:~/my-scraper$ gradle run
user@ubuntu:~/my-scraper$ cd python2-scrapper/src
user@ubuntu:~/python2-scrapper/src$ python2 worker.py
METHODs | URL | Payload | description |
---|---|---|---|
GET | http://localhost:8080/scrapes | N/A | Load an array of scrapes |
DELETE | http://localhost:8080/scrapes/id | N/A | Delete a scrape |
POST | http://localhost:8080/scrapes | data: {url: newUrl } | Add a scrape |
PATCH | http://localhost:8080/scrapes/id | data: {url: newUrl } | Update a scrape |
Using the URL:
GET http://localhost:8080/scrapes/
Will return the following result:
{
"_embedded": {
"scrapes": [
{
"url": "https://www.ebay.com",
"summary": "Largest Image File: https://i.ebayimg.com/images/g/XbUAAOSwIipdCmHV/s-l1600.jpg Sized: 69027",
"created": "2019-07-01T07:53:40.887+0000",
"status": "Success",
"_links": {
"self": {
"href": "http://localhost:8080/scrapes/1"
},
"scrape": {
"href": "http://localhost:8080/scrapes/1"
}
}
},
{
"url": "https://jd.com",
"summary": "Scrape Failed. Reason: max() arg is an empty sequence",
"created": "2019-07-01T08:28:03.774+0000",
"status": "Failure",
"_links": {
"self": {
"href": "http://localhost:8080/scrapes/4"
},
"scrape": {
"href": "http://localhost:8080/scrapes/4"
}
}
},
{
"url": "https://www.nytimes.com",
"summary": "Largest Image File: https://static01.nyt.com/images/2019/07/01/world/01hk-briefing-top/01hk-briefing-top-videoSixteenByNine3000.jpg Sized: 1274759",
"created": "2019-07-01T08:28:58.485+0000",
"status": "Success",
"_links": {
"self": {
"href": "http://localhost:8080/scrapes/5"
},
"scrape": {
"href": "http://localhost:8080/scrapes/5"
}
}
}
]
},
"_links": {
"self": {
"href": "http://localhost:8080/scrapes"
},
"profile": {
"href": "http://localhost:8080/profile/scrapes"
},
"search": {
"href": "http://localhost:8080/scrapes/search"
}
}
}