This project performs web scraping to collect league ranking data from the TeamForm website. It utilizes a headless Chrome browser to navigate through the site, collecting data for each specified quarter/week.
- Web Scraping for League Ranking Data: Collects league ranking data from the TeamForm website, providing insights into team performances and standings.
- Headless Chrome Browser Utilization: Employs a headless Chrome browser for efficient navigation and data collection from web pages.
- Configurable Data Load: The PAGES_NUMBER variable in _functions.py allows control over how much data is loaded by determining the number of times the 'Load More' button is clicked, each click revealing additional rows of data.
- Memory Efficiency: Designed to avoid memory issues by limiting the number of pages loaded. The script clicks the 'Load More' button a predetermined number of times (e.g., 17 times) to fetch a substantial yet manageable amount of data.
- Focused Data Retrieval: Currently, the script is specialized in retrieving league data. While it does not support 'Club' or 'National' data at the moment, its structure is conducive to future expansions in this area.
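The bounded 'Load More' behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the function names, the LOAD_MORE_XPATH locator, and the pause parameter are hypothetical, and only PAGES_NUMBER comes from this README.

```python
import time

# Hypothetical locator: the real page may mark the button differently.
LOAD_MORE_XPATH = "//button[contains(., 'Load More')]"

# Example value taken from the README ("e.g., 17 times").
PAGES_NUMBER = 17


def build_headless_driver():
    """Create a headless Chrome driver (needs Selenium and Chrome installed)."""
    # Imported lazily so click_load_more can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def click_load_more(driver, pages_number, xpath=LOAD_MORE_XPATH, pause=0.0):
    """Click the 'Load More' button at most pages_number times.

    Returns the number of clicks performed. Stops early if the button
    can no longer be found on the page.
    """
    clicks = 0
    for _ in range(pages_number):
        # "xpath" is the string value behind Selenium's By.XPATH strategy.
        buttons = driver.find_elements("xpath", xpath)
        if not buttons:
            break
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # optional wait so new rows can render
    return clicks
```

Capping the loop at a fixed count keeps the amount of page content held in memory predictable, which is the trade-off the Memory Efficiency point describes. A typical call would be `click_load_more(build_headless_driver(), PAGES_NUMBER, pause=1.0)` after navigating to the page of interest.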
This project supports Python 3.8.
Note: This software has not been tested on earlier or later versions of Python.
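Because only Python 3.8 is known to work, a small guard can flag other interpreters before scraping starts. The sketch below is one possible way to do this; the names TESTED_VERSION and is_tested_python are hypothetical and not part of the project.

```python
import sys
import warnings

# The only Python version this README states as supported.
TESTED_VERSION = (3, 8)


def is_tested_python(version_info=None):
    """Return True if the major.minor version matches TESTED_VERSION."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == TESTED_VERSION


if __name__ == "__main__":
    if not is_tested_python():
        # Warn rather than exit: other versions are untested, not known to fail.
        warnings.warn(
            "This project has only been tested on Python 3.8; "
            "you are running %d.%d." % sys.version_info[:2]
        )
```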
- Clone the repository:
git clone https://github.com/avchauzov/teamform_web_scraping.git
- Navigate to the project directory:
cd teamform_web_scraping
- Install the required dependencies:
pip install -r requirements.txt
- Ensure the necessary dependencies are installed, including the Chrome browser and Chromium.
- Modify the file paths and the links.json file as needed.
- Run the main script to perform web scraping:
python main.py
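The browser prerequisite from the first step above can be checked before running the script. The following is a minimal sketch, assuming the browser is discoverable on PATH under one of the listed executable names; those names vary by operating system and install method, and the helper itself is hypothetical.

```python
import shutil

# Assumed executable names; adjust for your platform.
BROWSER_CANDIDATES = ("google-chrome", "chrome", "chromium", "chromium-browser")


def find_executable(candidates, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    browser = find_executable(BROWSER_CANDIDATES)
    if browser is None:
        print("No Chrome or Chromium executable found on PATH.")
    else:
        print("Found browser at:", browser)
```

The `which` parameter exists only to make the lookup easy to replace when testing; in normal use the default `shutil.which` is sufficient.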
This project is licensed under the MIT License - see the LICENSE file for details.
- Name: Andrew Chauzov
- Email: [email protected]
For more information or inquiries about the project, feel free to reach out via email.
- Selenium: A powerful tool for browser automation used in this project for efficient web scraping.