-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnotes.txt
109 lines (101 loc) · 4.05 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
######################################################################################
General notes
-> AlchemyLanguage:
- Concept tagging: can identify non explicitly mentioned concepts from input, higher level than keyword extraction
and would allow us to find more general themes across many articles
- Emotion Analysis: If we can target phrases, entities, or keywords from a sentence, can tag them with emotions like
anger, disgust, fear, joy, and sadness. In order for this to be useful, each emotion would need known opposites
- (Typed?) Relations: extracting relations would give us the ability to identify more complex themes such as a bill
trying to be passed through Congress, and perform our sentiment analysis comparison on that as well. Look into this.
- Sentiment Analysis: 5 levels of targetted sentiment analysis is possible:
* Document level: (should not use, since we are doing analysis on the sentence level)
* entity level: Has promise, may be what we are looking for
* quotation level: Probably literal quotations, possibly useful if one quote comes up in many articles
* directional level: Not sure what exactly this is, or its usefulness
* keyword level: Maybe too low level? Worst case senario, we use this level.
- Combined calls: can probably use this to minimize the number of API calls we make, keeping it at least to the document level
* UPDATE: Would not save us API calls, would just potentially save us time and make data easier to organize
We have two options (as I see it). Base senteiment analysis on entities or keywords (or both)
Entities are more sparse than keywords, but keywords have the danger of being irrelivant
Also, could use senteiment or emotion (more meterics) as basis for how we compare themes within and between articles
-> For this, further discussion with the squirrel is necessary
Dependency Notes:
-> need watson developer cloud: run command '$ sudo pip install --upgrade watson-developer-cloud'
-> need networkx: run command '$ sudo pip install networkx'
-> need neo4j (for graph vizualization):
-> for ubuntu, run following commands:
'$ wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -'
'$ echo 'deb http://debian.neo4j.org/repo stable/' | sudo tee /etc/apt/sources.list.d/neo4j.list'
'$ sudo apt-get update'
'$ sudo apt-get install neo4j'
'$ sudo pip install py2neo'
API Key Info:
url: "https://gateway-a.watsonplatform.net/calls"
anders key: "d0d23b0df6ff757a6546557f8a7076f0c83f3b7d"
courtney key: "47c833e639827f3db27ebb8ff3a2bef90e18202e"
eryka key: "1496111dea3dd74c316ab932a75052fdc1e309f4"
######################################################################################
main.py Output
json file (AlchemyData_<ArticleName>.json) in current directory with structure
[
# list over entire article
{
# data for each sentence
"docSentiment": { "type": "neutral" },
"sentence": ["A", "new", "Senate", "proposal", "to", ... , "."],
"taxonomy": [
# list over all taxonomies
{
# data for each taxonomy
"confident": "no",
"score": "0.214395",
"label": "/law, govt and politics/government"
}, ...
],
"number": 0,
"entities": [
# list over all entities
{
# data for each entity
"count": "1",
"sentiment":
{
"score": "-0.331475",
"type": "negative"
},
"disambiguated": { # not used },
"emotions":
{
# these 5 emotions always used
"anger": "0.345283",
"joy": "0.02922",
"fear": "0.2754",
"sadness": "0.149309",
"disgust": "0.328006"
},
"text": "United States",
"relevance": "0.33",
"type": "Country"
}, ...
],
"docEmotions": { "anger": "0.345283", ... },
"keywords": [
# list over all keywords
{
# data for each keyword
"relevance": "0.966307",
"text": "new Senate proposal",
"emotions": { "anger": "0.345283", ... },
"sentiment":
{
# if sentiment is not neutral, it has a score
"score": "-0.331475",
"type": "negative"
}
}, ...
]
}, ...
]
to do:
add directory structure
run for all files