kartikcode/Page-Scraper-PClub


Page-Scraper

This is a Python-based web scraper for collecting information about students who completed
Google Summer of Code in past years. It collects the name, organisation, and project details
of each successful candidate from Google's official site, https://summerofcode.withgoogle.com,
and stores them in a .csv file.
It then compares the scraped data against a student database in .json format and returns
the names of the matching students.
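
The comparison step described above can be sketched as follows. This is a minimal illustration and not the code in check.py: the function names, the "name" column in the .csv file, and the layout of the .json database (a list of objects, each with a "name" field) are all assumptions, and the sample records are made up.

```python
import csv
import io
import json


def normalise(name):
    """Lower-case a name and collapse whitespace so small formatting differences still match."""
    return " ".join(name.lower().split())


def find_matches(csv_file, json_file, csv_column="name", json_key="name"):
    """Return the scraped rows whose name also appears in the JSON student database."""
    # Assumed database layout: a JSON list of objects, each with a "name" field.
    students = json.load(json_file)
    known = {normalise(student[json_key]) for student in students}
    # Assumed CSV layout: a header row that includes a "name" column.
    reader = csv.DictReader(csv_file)
    return [row for row in reader if normalise(row[csv_column]) in known]


# Made-up, in-memory stand-ins for org_info.csv and the .json database.
scraped = io.StringIO(
    "name,organisation,project\n"
    "Asha Rao,Example Org,Example project\n"
    "Ravi Kumar,Another Org,Another project\n"
)
database = io.StringIO('[{"name": "asha  rao"}, {"name": "Meera Iyer"}]')

matches = find_matches(scraped, database)
for row in matches:
    print(row["name"], "|", row["organisation"], "|", row["project"])
```

With real files, the two io.StringIO objects would be replaced by open("org_info.csv", newline="") and an open handle on the .json database.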

Important Stuff

  • This requires Python 3.x on your system, along with the BeautifulSoup4 and Requests libraries.
  • If you don't have these dependencies, install them with the following commands (for Debian/Ubuntu systems that use apt):
sudo apt install python3
sudo apt install python3-pip
pip3 install beautifulsoup4 requests
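
To confirm that the dependencies are visible to your interpreter, a small check like the one below may help. It is a sketch and not part of the repository; the helper name missing_modules is hypothetical. Note that the beautifulsoup4 package is imported under the module name bs4.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# beautifulsoup4 is imported as "bs4"; requests is imported as "requests".
required = ["bs4", "requests"]
missing = missing_modules(required)

if sys.version_info < (3,):
    print("Python 3.x is required, found", sys.version.split()[0])
elif missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies found.")
```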

How to Use

  • Download the repository to your local machine. Make sure you have all the dependencies installed.
  • If you want to store the data in a different .csv file, change the file name in scraper.py
    and make the same change in check.py.
  • To use a different JSON database, copy the .json file into the same directory as check.py
    and change the file name in check.py (scraper.py remains unchanged).
  • Run the scraper from your terminal with the following command.
python3 scraper.py
  • Enter the URL when prompted. The data is stored in the specified .csv file (org_info.csv in this case).
  • Next, run check.py, which prints the names found in both data sets along with some other details.
python3 check.py
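
For orientation, the scraping step can be sketched roughly as below. This is not the code in scraper.py: the HTML fragment, the class names (project, student, org, title), the column names, and the helper functions are all hypothetical, and the markup of the real page will differ. The sketch parses an inline HTML string with BeautifulSoup and writes the rows in .csv form.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup with made-up entries; the real page structure will differ.
SAMPLE_HTML = """
<ul>
  <li class="project">
    <span class="student">Asha Rao</span>
    <span class="org">Example Org</span>
    <span class="title">Example project</span>
  </li>
  <li class="project">
    <span class="student">Ravi Kumar</span>
    <span class="org">Another Org</span>
    <span class="title">Another project</span>
  </li>
</ul>
"""

FIELDS = ["name", "organisation", "project"]


def extract_rows(html):
    """Pull one dict per project entry out of the (assumed) list markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="project"):
        rows.append({
            "name": item.find(class_="student").get_text(strip=True),
            "organisation": item.find(class_="org").get_text(strip=True),
            "project": item.find(class_="title").get_text(strip=True),
        })
    return rows


def write_rows(rows, out_file):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = extract_rows(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for open("org_info.csv", "w", newline="")
write_rows(rows, buffer)
print(buffer.getvalue())
```

For a live page, the HTML string would instead come from the Requests library, for example requests.get(url, timeout=10).text, and the tag and class lookups would need to be adapted to the page's actual markup.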

About

Page Scraper using Python
