pbylicki/media-crawl

media-crawl

Crawlers for various Polish internet media.

Currently supported sites (available categories):

  • gazeta.pl (Polska, Polityka, Świat)
  • naszdziennik.pl (Polska, Świat, Ekonomia)
  • se.pl (Polska, Polityka, Świat)

To install:

  • clone the project
  • cd to the project root directory
  • run: pip install -r requirements.txt

Requirements:

  • Python 2.7
  • Scrapy 1.0.5 (requires C++ compiler)

To run:

  • cd to the project root directory
  • run: scrapy crawl <spider> -o output_filename.json, where <spider> is one of GazetaPl, NaszDziennik, or SE
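
The run step can also be scripted. The following is a minimal sketch, not part of the project: it assumes `scrapy` is on the PATH and that the script is started from the project root. The helper names (`build_crawl_command`, `output_name`, `load_items`, `crawl_all`) and the one-file-per-spider naming scheme are hypothetical, and the fields of each scraped item are not documented here, so the sketch only counts items.

```python
# -*- coding: utf-8 -*-
# Sketch only: works on Python 2.7 and 3; helper names are hypothetical.
import io
import json
import subprocess

# Spider names as listed in the "To run" section.
SPIDERS = ("GazetaPl", "NaszDziennik", "SE")


def build_crawl_command(spider, output_path):
    """Return the argv list for: scrapy crawl <spider> -o <output_path>."""
    if spider not in SPIDERS:
        raise ValueError("unknown spider: %r" % (spider,))
    return ["scrapy", "crawl", spider, "-o", output_path]


def output_name(spider):
    """Assumed naming scheme: one JSON file per spider, e.g. se.json."""
    return "%s.json" % spider.lower()


def load_items(path):
    """Load the JSON array that a `-o <name>.json` export writes."""
    with io.open(path, encoding="utf-8") as handle:
        return json.load(handle)


def crawl_all():
    """Run each spider in turn and report how many items it produced."""
    for spider in SPIDERS:
        path = output_name(spider)
        # check_call raises CalledProcessError if scrapy exits non-zero.
        subprocess.check_call(build_crawl_command(spider, path))
        print("%s: %d items" % (spider, len(load_items(path))))
```

Calling `crawl_all()` from the project root would run the three spiders one after another, each writing to its own file.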
