
Hierarchical sentiment classifier, single feature classification, erroneous probabilities? #27

Open
bwbaugh opened this issue Mar 25, 2013 · 2 comments


bwbaugh commented Mar 25, 2013

Part of the web interface is supposed to show how each feature would be classified if it were a document of length one. Why does the hierarchical sentiment classifier label these individual features only as neutral or positive, even when the confidence value is less than 0.5?

As an example:

<span style="color: #808080" title="neutral: 48.01%">('__start__', u'This')</span> 
<span style="color: #98c000" title="positive: 60.35%">(u'This',)</span> 
<span style="color: #808080" title="neutral: 45.32%">(u'This', u'is')</span> 
<span style="color: #b3c000" title="positive: 53.17%">(u'is',)</span> 
<span style="color: #808080" title="neutral: 38.86%">(u'is', u'only')</span> 
<span style="color: #c07e00" title="positive: 32.82%">(u'only',)</span> 
<span style="color: #808080" title="neutral: 67.93%">(u'only', u'a')</span> 
<span style="color: #9bc000" title="positive: 59.42%">(u'a',)</span> 
<span style="color: #808080" title="neutral: 51.44%">(u'a', u'test')</span> 
<span style="color: #c0a100" title="positive: 42.09%">(u'test',)</span> 
<span style="color: #808080" title="neutral: 34.62%">(u'test', '__end__')</span> <br>

Current hash: 5fd9baa

bwbaugh added a commit that referenced this issue Mar 25, 2013
For some reason, when classifying a document that consists only of one
feature, the hierarchical classifier only labels the document as neutral
or positive, even when the confidence values are less than 0.5.

I'm still not sure why this is the case; however, I am addressing the
issue for the part of the web interface that shows the influence of each
of the individual features that make up the query by using the
conditional probability of the feature across the labels instead of
trying to classify it. In the meantime, this addresses issue gh-27.

bwbaugh commented Mar 25, 2013

Now, using conditional probabilities only (instead of trying to classify each feature as its own document):

<span style="color: #808080" title="neutral: 51.99%">('__start__', u'This')</span> 
<span style="color: #808080" title="neutral: 52.04%">(u'This',)</span> 
<span style="color: #808080" title="neutral: 54.68%">(u'This', u'is')</span> 
<span style="color: #808080" title="neutral: 56.23%">(u'is',)</span> 
<span style="color: #808080" title="neutral: 61.14%">(u'is', u'only')</span> 
<span style="color: #808080" title="neutral: 56.40%">(u'only',)</span> 
<span style="color: #c0ad00" title="negative: 54.75%">(u'only', u'a')</span> 
<span style="color: #808080" title="neutral: 54.63%">(u'a',)</span> 
<span style="color: #c06500" title="negative: 73.62%">(u'a', u'test')</span> 
<span style="color: #808080" title="neutral: 52.74%">(u'test',)</span> 
<span style="color: #808080" title="neutral: 65.38%">(u'test', '__end__')</span> <br>

Perhaps the prior probabilities skew the overall classification so much that a single feature isn't capable of overcoming them. Now that I think about it, why are we throwing away the confidence value from the classification process and recalculating it from the conditionals? Which is the correct approach?
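The priors-overwhelming-the-feature idea can be checked with a minimal naive Bayes posterior. In the sketch below (all numbers made up for illustration), `'only'` is twice as likely under negative as under neutral, yet a skewed prior still makes neutral win:

```python
# Minimal naive Bayes posterior for a one-feature document, showing how
# a skewed prior can outweigh the feature's own likelihood.
def posterior(priors, likelihoods, features):
    scores = {}
    for label, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods[label].get(f, 1e-9)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

priors = {'neutral': 0.7, 'positive': 0.2, 'negative': 0.1}
likelihoods = {
    'neutral': {'only': 0.01},
    'positive': {'only': 0.005},
    'negative': {'only': 0.02},  # twice as likely as under neutral
}
post = posterior(priors, likelihoods, ['only'])
# neutral: 0.7 * 0.01 = 0.007 beats negative: 0.1 * 0.02 = 0.002,
# so the prior dominates and neutral wins at 0.007 / 0.010 = 0.7
```

With a full document the likelihood terms multiply across many features and can overcome the prior, but a single feature usually cannot.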


bwbaugh commented Mar 25, 2013

When we use the original confidence value from the classification process, we get:

<span style="color: #808080" title="neutral: 50.56%">('__start__', u'This')</span> 
<span style="color: #a5c000" title="positive: 56.90%">(u'This',)</span> 
<span style="color: #808080" title="neutral: 50.56%">(u'This', u'is')</span> 
<span style="color: #a5c000" title="positive: 56.90%">(u'is',)</span> 
<span style="color: #808080" title="neutral: 50.56%">(u'is', u'only')</span> 
<span style="color: #a5c000" title="positive: 56.90%">(u'only',)</span> 
<span style="color: #808080" title="neutral: 50.56%">(u'only', u'a')</span> 
<span style="color: #a5c000" title="positive: 56.90%">(u'a',)</span> 
<span style="color: #808080" title="neutral: 50.56%">(u'a', u'test')</span> 
<span style="color: #a5c000" title="positive: 56.90%">(u'test',)</span> 
<span style="color: #808080" title="neutral: 50.56%">(u'test', '__end__')</span> <br>

Why are there only two unique confidence values across all features? Shouldn't the individual conditional probabilities cause at least some variation?
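One explanation consistent with the output above, where every bigram gets exactly 50.56% neutral and every unigram exactly 56.90% positive, is that the feature lookup is failing for single-feature documents, so every label receives the same default likelihood. That default then cancels on normalization and the posterior collapses to a constant prior-like distribution per n-gram order. A hypothetical sketch of that failure mode (the numbers and the empty-lookup cause are assumptions, not confirmed):

```python
# If every feature lookup falls back to the same default likelihood
# under every label, the likelihood term cancels on normalization and
# each single-feature document collapses to the prior distribution:
# one constant confidence, regardless of which feature it is.
def posterior(priors, likelihoods, features, default=1e-9):
    scores = {}
    for label, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihoods.get(label, {}).get(f, default)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

priors = {'neutral': 0.5056, 'positive': 0.3, 'negative': 0.1944}
empty = {}  # simulates a lookup that never finds the feature
for feature in ['This', 'is', 'only', 'a', 'test']:
    post = posterior(priors, empty, [feature])
    # identical posterior for every feature: the priors themselves
```

If something like this is happening, it would also explain why the conditional-probability display (previous comment) shows variation while the classifier's own confidence does not.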
