diff --git a/_chapters/03.md b/_chapters/03.md
index 9340332..bfa6e78 100644
--- a/_chapters/03.md
+++ b/_chapters/03.md
@@ -44,7 +44,7 @@ The word embeddings can be initialized randomly or pre-trained on a large unlabe
 
 ### 2. Window Approach
 
-The above-mentioned architecture allows for modeling of complete sentences into sentence representations. However, many NLP tasks, such as NER, POS tagging, and SRL, require word-based predictions. To adapt CNNs for such tasks, a window approach is used, which assumes that the tag of a word primarily depends on its neighboring words. For each word, thus, a fixed-size window surrounding itself is assumed and the sub-sentence ranging within the window is considered. A standalone CNN is applied to this sub-sentence as explained earlier and predictions are attributed to the word in the center of the window. Following this approach, [Poria et al. (2016)](http://ww.w.sentic.net/aspect-extraction-for-opinion-mining.pdf) employed a multi-level deep CNN to tag each word in a sentence as a possible aspect or non-aspect. Coupled with a set of linguistic patterns, their ensemble classifier managed to perform well in aspect detection.
+The above-mentioned architecture allows complete sentences to be modeled as sentence representations. However, many NLP tasks, such as NER, POS tagging, and SRL, require word-level predictions. To adapt CNNs to such tasks, a window approach is used, which assumes that the tag of a word depends primarily on its neighboring words. Thus, for each word, a fixed-size window around it is taken and the sub-sentence falling within the window is considered. A standalone CNN is applied to this sub-sentence as explained earlier, and the prediction is attributed to the word in the center of the window. Following this approach, [Poria et al. (2016)](http://www.sentic.net/aspect-extraction-for-opinion-mining.pdf) employed a multi-level deep CNN to tag each word in a sentence as a possible aspect or non-aspect. Coupled with a set of linguistic patterns, their ensemble classifier performed well in aspect detection.
 
-The ultimate goal of word-level classification is generally to assign a sequence of labels to the entire sentence. In such cases, structured prediction techniques such as conditional random field (CRF) are sometimes employed to better capture dependencies between adjacent class labels and finally generate cohesive label sequence giving maximum score to the whole sentence ([Kirillov et al., 2015](https://arxiv.org/pdf/1511.05067v2.pdf)).
+The ultimate goal of word-level classification is generally to assign a sequence of labels to the entire sentence. In such cases, structured prediction techniques such as conditional random fields (CRF) are sometimes employed to better capture dependencies between adjacent class labels and to generate the cohesive label sequence that gives the maximum score to the whole sentence ([Kirillov et al., 2015](https://arxiv.org/pdf/1511.05067v2.pdf)).
 
@@ -81,4 +81,4 @@ CNNs inherently provide certain required features like local connectivity, weigh
 
-Tasks like machine translation require perseverance of sequential information and long-term dependency. Thus, structurally they are not well suited for CNN networks, which lack these features. Nevertheless, [Tu et al. (2015)](https://arxiv.org/abs/1503.02357) addressed this task by considering both the semantic similarity of the translation pair and their respective contexts. Although this method did not address the sequence perseverance problem, it allowed them to get competitive results amongst other benchmarks.
+Tasks like machine translation require the preservation of sequential information and long-term dependencies. Structurally, CNNs lack these features and are therefore not well suited to such tasks. Nevertheless, [Tu et al. (2015)](https://arxiv.org/abs/1503.02357) addressed machine translation by considering both the semantic similarity of the translation pair and their respective contexts. Although this method did not address the sequence-preservation problem, it achieved competitive results against other benchmarks.
 
-Overall, CNNs are extremely effective in mining semantic clues in contextual windows. However, they are very data heavy models. They include a large number of trainable parameters which require huge training data. This poses a problem when scarcity of data arises. Another persistent issue with CNNs is their inability to model long-distance contextual information and preserving sequential order in their representations ([Kalchbrenner et al., 2014](http://www.aclweb.org/anthology/P14-1062); [Tu et al., 2015](https://arxiv.org/abs/1503.02357)). Other networks like recursive models (explained below) reveal themselves as better suited for such learning.
\ No newline at end of file
+Overall, CNNs are extremely effective at mining semantic clues in contextual windows. However, they are data-hungry models: their large number of trainable parameters demands large training sets, which becomes a problem when data are scarce. Another persistent issue with CNNs is their inability to model long-distance contextual information and to preserve sequential order in their representations ([Kalchbrenner et al., 2014](http://www.aclweb.org/anthology/P14-1062); [Tu et al., 2015](https://arxiv.org/abs/1503.02357)). Other networks, such as recursive models (explained below), are better suited to such learning.
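
To make the window approach patched in the first hunk above more concrete, here is a minimal sketch in NumPy. The toy dimensions, random weights, and the `tag_word` helper are all illustrative assumptions, not the architecture of Poria et al. (2016); the point is only that each word is tagged from a fixed-size window centered on it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed): vocabulary, embedding width, window size,
# number of convolution filters, and tag set size.
VOCAB, EMB, WIN, FILTERS, TAGS = 100, 8, 5, 6, 3

E = rng.normal(size=(VOCAB, EMB))        # word embedding table
W = rng.normal(size=(FILTERS, 3 * EMB))  # filters of width 3 over the window
U = rng.normal(size=(TAGS, FILTERS))     # linear tag classifier

def tag_word(sentence_ids, i):
    """Score tags for word i using only a fixed-size window centered on it."""
    pad = WIN // 2
    padded = np.pad(sentence_ids, pad, constant_values=0)  # id 0 doubles as PAD here
    x = E[padded[i:i + WIN]]                               # (WIN, EMB) window embeddings
    # Width-3 convolution across window positions, then ReLU and max pooling.
    conv = np.stack([W @ x[j:j + 3].ravel() for j in range(WIN - 2)])
    h = np.maximum(conv, 0).max(axis=0)                    # (FILTERS,)
    return U @ h                                           # tag scores for the center word

sentence = rng.integers(1, VOCAB, size=9)  # a random toy "sentence"
tags = [int(tag_word(sentence, i).argmax()) for i in range(len(sentence))]
print(tags)
```

In practice the embedding table would be pre-trained or learned, and the filters and classifier trained end-to-end; the padding id and window size are hyperparameters.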
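
The CRF-style structured prediction mentioned in the same hunk can likewise be sketched: it amounts to choosing the label sequence that maximizes the combined per-word emission scores and tag-transition scores. The Viterbi routine below is a hedged illustration of that decoding step; the random score matrices are stand-ins, not trained parameters.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Return the tag sequence maximizing total emission + transition score.

    emissions:   (n_words, n_tags) per-word tag scores (e.g. window-CNN outputs)
    transitions: (n_tags, n_tags) score of moving from tag a to tag b
    """
    n_words, n_tags = emissions.shape
    score = emissions[0].copy()                  # best score of each tag at word 0
    back = np.zeros((n_words, n_tags), dtype=int)
    for i in range(1, n_words):
        # total[a, b]: best path ending in tag a at i-1, then tag b at word i
        total = score[:, None] + transitions + emissions[i][None, :]
        back[i] = total.argmax(axis=0)           # backpointers to the previous tag
        score = total.max(axis=0)
    path = [int(score.argmax())]                 # best final tag
    for i in range(n_words - 1, 0, -1):          # follow backpointers to the start
        path.append(int(back[i, path[-1]]))
    return path[::-1]

rng = np.random.default_rng(1)
print(viterbi(rng.normal(size=(6, 3)), rng.normal(size=(3, 3))))
```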