DESCRIPTION
Storylinenews is a web mining utility that scrapes major newsfeeds to discover
the day's most important topics and determine the overall sentiment of
coverage on those topics. It includes a log converter compatible with the
excellent 'storylines' visualization technique.
DEPENDENCIES
The project requires version 1.5 of Pattern:
http://www.clips.ua.ac.be/pages/pattern
and the SentiWordNet polarity lexicon (sign up for a research license):
http://sentiwordnet.isti.cnr.it/
INSTRUCTIONS
First install SentiWordNet. It goes in $(PATTERNDIR)/en/wordnet.
Run scraper.py with the location of a feed list as the first argument.
> python scraper.py feeds.json
Let it run for as long as you like, or indefinitely. At any time you
can run converter.py to generate an XML file compatible
with Michael Ogawa's Processing implementation of the 'storylines'
cluster graphing algorithm. The 'news.config' file provided in the
data directory enables the generation of topical storylines that
display sentiment.
For more information and the download, visit the storylines website:
http://www.michaelogawa.com/research/storylines/
FEED LIST FORMAT
The feed list is a JSON file holding a single object (the JSON dump of a
Python dict) that maps feed names to feed URLs. JSON requires double quotes:
{"FEEDNAME": "feedurl"}
where feedurl points to an RSS or Atom feed.
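As a minimal sketch of producing such a file, the snippet below dumps a dict
of feed names to URLs with Python's standard json module and reads it back.
The helper names (write_feed_list, read_feed_list), the feed names, and the
example.com URLs are placeholders invented for illustration. They are not part
of scraper.py, which may load the file differently.

```python
import json

# Hypothetical feed names and URLs; replace with real RSS or Atom feeds.
EXAMPLE_FEEDS = {
    "EXAMPLE_WORLD": "http://example.com/world/rss.xml",
    "EXAMPLE_TECH": "http://example.com/tech/atom.xml",
}


def write_feed_list(feeds, path):
    """Dump a {name: url} dict to `path` as a JSON object."""
    with open(path, "w") as fh:
        json.dump(feeds, fh, indent=2)


def read_feed_list(path):
    """Load a feed list and check that it is a JSON object (a dict)."""
    with open(path) as fh:
        feeds = json.load(fh)
    if not isinstance(feeds, dict):
        raise ValueError("feed list must be a JSON object of name -> URL")
    return feeds
```

Calling write_feed_list(EXAMPLE_FEEDS, "feeds.json") would produce a file that
can be passed as the first argument to scraper.py, assuming the script expects
exactly the format described above.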