Merge pull request #1 from lxdatgithub/lxdatgithub-patch-1
Update 03.md
lxdatgithub authored Dec 16, 2019
2 parents a978609 + e9b9702 commit bf4a6d1
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions _chapters/03.md
@@ -44,7 +44,7 @@ The word embeddings can be initialized randomly or pre-trained on a large unlabeled

### 2. Window Approach

-The above-mentioned architecture allows for modeling of complete sentences into sentence representations. However, many NLP tasks, such as NER, POS tagging, and SRL, require word-based predictions. To adapt CNNs for such tasks, a window approach is used, which assumes that the tag of a word primarily depends on its neighboring words. For each word, thus, a fixed-size window surrounding itself is assumed and the sub-sentence ranging within the window is considered. A standalone CNN is applied to this sub-sentence as explained earlier and predictions are attributed to the word in the center of the window. Following this approach, [Poria et al. (2016)](http://ww.w.sentic.net/aspect-extraction-for-opinion-mining.pdf) employed a multi-level deep CNN to tag each word in a sentence as a possible aspect or non-aspect. Coupled with a set of linguistic patterns, their ensemble classifier managed to perform well in aspect detection.
+The above-mentioned architecture allows for modeling of complete sentences into sentence representations. However, many NLP tasks, such as NER, POS tagging, and SRL, require word-based predictions. To adapt CNNs for such tasks, a window approach is used, which assumes that the tag of a word primarily depends on its neighboring words. For each word, thus, a fixed-size window surrounding itself is assumed and the sub-sentence ranging within the window is considered. A standalone CNN is applied to this sub-sentence as explained earlier and predictions are attributed to the word in the center of the window. Following this approach, [Poria et al. (2016)](http://www.sentic.net/aspect-extraction-for-opinion-mining.pdf) employed a multi-level deep CNN to tag each word in a sentence as a possible aspect or non-aspect. Coupled with a set of linguistic patterns, their ensemble classifier managed to perform well in aspect detection.
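
To make the window approach concrete, here is a minimal PyTorch sketch, an illustrative assumption on our part rather than Poria et al.'s actual model. A 1-D convolution whose kernel spans the window scores each word from the sub-sentence centered on it; the vocabulary size, window size, and tag count below are placeholder values.

```python
import torch
import torch.nn as nn

class WindowCNNTagger(nn.Module):
    """Tags each word from a fixed-size window of its neighbors."""
    def __init__(self, vocab_size, emb_dim, window, hidden, num_tags):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # A kernel of size 2*window+1 looks at the word plus `window` neighbors
        # on each side; padding keeps one prediction per word in the sentence.
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=2 * window + 1, padding=window)
        self.out = nn.Linear(hidden, num_tags)

    def forward(self, sentence):                 # (batch, seq_len) word ids
        x = self.emb(sentence).transpose(1, 2)   # (batch, emb_dim, seq_len)
        h = torch.tanh(self.conv(x))             # (batch, hidden, seq_len)
        return self.out(h.transpose(1, 2))       # (batch, seq_len, num_tags)

tagger = WindowCNNTagger(vocab_size=10000, emb_dim=50, window=2, hidden=100, num_tags=5)
scores = tagger(torch.randint(1, 10000, (1, 12)))  # tag scores for each of 12 words
```

Because the convolution slides over the whole sentence, this is equivalent to applying a standalone classifier to each fixed-size sub-sentence and attributing the prediction to the word in the center of the window.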

The ultimate goal of word-level classification is generally to assign a sequence of labels to the entire sentence. In such cases, structured prediction techniques such as conditional random fields (CRFs) are sometimes employed to better capture the dependencies between adjacent class labels, ultimately generating a cohesive label sequence that gives the maximum score to the whole sentence ([Kirillov et al., 2015](https://arxiv.org/pdf/1511.05067v2.pdf)).
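
As a sketch of the decoding side of such structured prediction, the hypothetical Viterbi routine below combines per-word emission scores (for example, the output of the tagger sketched above) with a tag-transition matrix to recover the label sequence with the maximum total score. In a full linear-chain CRF the transition scores are learned parameters and training uses the forward algorithm, both omitted here.

```python
import torch

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions:   (seq_len, num_tags) per-word tag scores
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, _ = emissions.shape
    score = emissions[0]          # best score of any path ending in each tag
    backptrs = []
    for t in range(1, seq_len):
        # total[i, j] = score[i] + transitions[i, j] + emissions[t, j]
        total = score.unsqueeze(1) + transitions + emissions[t]
        score, best_prev = total.max(dim=0)
        backptrs.append(best_prev)
    # Walk the back-pointers from the best final tag to recover the path.
    best_tag = int(score.argmax())
    path = [best_tag]
    for best_prev in reversed(backptrs):
        best_tag = int(best_prev[best_tag])
        path.append(best_tag)
    return path[::-1]
```

Setting a transition score to a large negative value (for instance, for an I-PER tag directly after B-LOC) is how such a decoder rules out incoherent adjacent labels.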

@@ -81,4 +81,4 @@ CNNs inherently provide certain required features like local connectivity, weight sharing

Tasks like machine translation require preservation of sequential information and long-term dependencies. Structurally, therefore, CNNs are not well suited to such tasks, as they lack these abilities. Nevertheless, [Tu et al. (2015)](https://arxiv.org/abs/1503.02357) addressed this task by considering both the semantic similarity of the translation pair and their respective contexts. Although this method did not solve the sequence-preservation problem, it allowed them to obtain competitive results against other benchmarks.
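
A heavily simplified sketch of that idea, assumed by us for illustration rather than taken from Tu et al.'s architecture: encode the source sentence and a candidate translation with CNN encoders into fixed-size vectors, then score the pair by cosine similarity (the context modeling mentioned above is omitted).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """Collapses a sentence into one vector via convolution and max-pooling."""
    def __init__(self, vocab_size, emb_dim, hidden):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)

    def forward(self, ids):                          # (batch, seq_len) word ids
        x = self.emb(ids).transpose(1, 2)            # (batch, emb_dim, seq_len)
        return torch.tanh(self.conv(x)).max(dim=2).values  # (batch, hidden)

src_enc = SentenceCNN(vocab_size=8000, emb_dim=50, hidden=100)
tgt_enc = SentenceCNN(vocab_size=8000, emb_dim=50, hidden=100)
src = torch.randint(0, 8000, (1, 10))               # source-language word ids
tgt = torch.randint(0, 8000, (1, 9))                # candidate translation ids
score = F.cosine_similarity(src_enc(src), tgt_enc(tgt))  # higher = better pair
```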

Overall, CNNs are extremely effective at mining semantic clues in contextual windows. However, they are data-hungry models: their large number of trainable parameters demands substantial training data, which becomes a problem when data is scarce. Another persistent issue with CNNs is their inability to model long-distance contextual information and to preserve sequential order in their representations ([Kalchbrenner et al., 2014](http://www.aclweb.org/anthology/P14-1062); [Tu et al., 2015](https://arxiv.org/abs/1503.02357)). Other networks, such as the recursive models explained below, are better suited for such learning.
