This is a Python-based web scraper for collecting information about successful Google Summer of Code
candidates from past years. It collects the name, organisation, and project details of each successful
candidate from Google's official webpage, https://summerofcode.withgoogle.com, and stores them in a .csv file.
It then compares the collected data with a student database in .json format and returns
the names of the relevant matches.
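
The storage step described above could look roughly like the sketch below. It is not taken from scraper.py: the function name `write_projects` and the column names are assumptions made for illustration, and only the default file name org_info.csv comes from this README.

```python
import csv

# Assumed column names; the real scraper.py may use different ones.
FIELDS = ["name", "organisation", "project"]


def write_projects(rows, path="org_info.csv"):
    """Write one CSV row per candidate and return the number of rows written."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()       # first line holds the column names
        writer.writerows(rows)     # each row is a dict keyed by FIELDS
    return len(rows)
```

Writing a header row keeps the file self-describing, so a later comparison step can look up the name column by its label rather than by position.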
- This requires Python 3.x installed on your system, along with the BeautifulSoup4 and Requests libraries.
- If you don't have these dependencies, follow the installation steps below:
sudo apt install python3
sudo apt install python3-pip
pip install beautifulsoup4 requests
- Download the repository to your local machine. Make sure you have all the dependencies installed.
- You might want to create a different .csv file for storing the data. If so, change
the file name in scraper.py and, accordingly, in check.py.
- To provide a different JSON database, copy the .json file to the same directory as check.py
and change the file name in check.py accordingly (scraper.py remains unchanged).
- Finally, run the Python file in your terminal using the following command:
python scraper.py
- Enter the URL as input. This stores the data in the specified .csv file (org_info.csv in this case).
- Next, run the check.py file, which outputs the common name entries along with some other details.
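
As a rough illustration of the comparison that check.py performs, here is a minimal sketch. It is not the actual check.py: it assumes the CSV has a `name` column, that the JSON database is a list of objects with a `name` key, and that names should match regardless of case and spacing. The function names and the file name students.json are hypothetical.

```python
import csv
import json


def normalise(name):
    """Lower-case and collapse whitespace so minor formatting differences still match."""
    return " ".join(name.split()).lower()


def common_names(csv_path, json_path, csv_column="name", json_key="name"):
    """Return names from the JSON database that also appear in the scraped CSV."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        scraped = {normalise(row[csv_column]) for row in csv.DictReader(handle)}
    with open(json_path, encoding="utf-8") as handle:
        students = json.load(handle)  # assumed layout: a list of objects
    return [s[json_key] for s in students if normalise(s[json_key]) in scraped]


if __name__ == "__main__":
    # students.json is a placeholder name for your own database file.
    print(common_names("org_info.csv", "students.json"))
```

Collecting the scraped names into a set keeps each lookup cheap, which helps when the student database is large.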