Concurrent Web Scraping with Python and Selenium

Want to learn how to build this project?

Check out the blog post.

Want to use this project?

Fork/Clone
Create and activate a virtual environment
Install the requirements

Run the scrapers:

# sync
(env)$ python script.py headless

# parallel with multiprocessing
(env)$ python script_parallel_1.py headless

# parallel with concurrent.futures
(env)$ python script_parallel_2.py headless

# concurrent with concurrent.futures (should be the fastest!)
(env)$ python script_concurrent.py headless

# parallel with concurrent.futures and concurrent with asyncio
(env)$ python script_asyncio.py headless
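
The variants above differ mainly in how the per-page work is scheduled. As a rough illustration (not the project's actual scripts), the sketch below compares a sequential loop with a thread pool from concurrent.futures, assuming "concurrent" here means thread-based. `fake_scrape` is a hypothetical stand-in that sleeps instead of driving a browser, and the delay and worker count are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_scrape(page_number, delay=0.2):
    """Hypothetical stand-in for one page visit; sleeps to mimic an I/O wait."""
    time.sleep(delay)
    return f"page-{page_number}"


def run_sync(pages):
    """Visit the pages one after another."""
    return [fake_scrape(p) for p in pages]


def run_concurrent(pages, max_workers=4):
    """Visit the pages on a thread pool; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_scrape, pages))


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    pages = list(range(1, 5))
    _, sync_time = timed(run_sync, pages)
    _, conc_time = timed(run_concurrent, pages)
    print(f"sync: {sync_time:.2f}s, concurrent: {conc_time:.2f}s")
```

Because the simulated work only waits, the threaded run should finish in roughly the time of one page rather than the sum of all pages. Real browser sessions add startup and CPU overhead, so actual gains will likely be smaller.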

Run the tests:

(env)$ python -m pytest test/test_scraper.py
(env)$ python -m pytest test/test_scraper_mock.py
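
For readers wondering how a mocked test can avoid launching a browser, here is a minimal sketch using unittest.mock. It is not the contents of test/test_scraper_mock.py. `parse_title` is a hypothetical helper written only for this illustration, and the mock assumes the driver exposes `get()` and `title` the way a Selenium WebDriver does.

```python
from unittest.mock import MagicMock


def parse_title(driver, url):
    """Hypothetical helper: load a URL and return the trimmed page title."""
    driver.get(url)
    return driver.title.strip()


def test_parse_title_with_mock_driver():
    # MagicMock stands in for the WebDriver, so no browser is started.
    driver = MagicMock()
    driver.title = "  Example Domain  "

    assert parse_title(driver, "https://example.com") == "Example Domain"
    # Confirm the helper navigated to the requested URL exactly once.
    driver.get.assert_called_once_with("https://example.com")
```

Saved in a file whose name starts with `test_`, this would be collected by pytest like the commands above, and it runs without network access.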
