Concurrent Web Scraping with Python and Selenium

Want to learn how to build this project?

Check out the blog post.

Want to use this project?

  1. Fork or clone the repository

  2. Create and activate a virtual environment

  3. Install the requirements

  4. Run the scrapers:

    # sync
    (env)$ python script.py headless
    
    # parallel with multiprocessing
    (env)$ python script_parallel_1.py headless
    
    # parallel with concurrent.futures
    (env)$ python script_parallel_2.py headless
    
    # concurrent with concurrent.futures (should be the fastest!)
    (env)$ python script_concurrent.py headless
    
    # parallel with concurrent.futures and concurrent with asyncio
    (env)$ python script_asyncio.py headless

  5. Run the tests:

    (env)$ python -m pytest test/test_scraper.py
    (env)$ python -m pytest test/test_scraper_mock.py
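
Why might the concurrent variant be the fastest? Scraping is mostly I/O-bound: the program spends most of its time waiting on the browser and the network, and threads can overlap that waiting. The sketch below illustrates the idea without Selenium. It is not the code from this project: `fetch_page` is a hypothetical stand-in that sleeps to simulate a page load, and it assumes the concurrent script uses a thread pool from `concurrent.futures`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page_number, delay=0.1):
    """Hypothetical stand-in for a scraper call; sleeps to simulate network I/O."""
    time.sleep(delay)
    return f"page-{page_number}"


def run_sync(pages):
    # Fetch one page at a time, as a synchronous script would.
    return [fetch_page(n) for n in pages]


def run_concurrent(pages, max_workers=4):
    # Threads overlap the waiting time of I/O-bound calls.
    # executor.map returns results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_page, pages))


def timed(func, *args):
    # Return the function's result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    pages = range(1, 9)
    _, sync_time = timed(run_sync, pages)
    _, concurrent_time = timed(run_concurrent, pages)
    print(f"sync: {sync_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With real browser sessions the gain depends on the target site, the number of workers, and how much CPU each browser instance needs, so treat the timings from this sketch as illustrative only.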