%!TEX root = thesis.tex
In this dissertation, we gave readers a thorough overview of neural reading comprehension: its foundations (\sys{Part I}) and its applications (\sys{Part II}), as well as how we have contributed to the development of this field since it emerged in late 2015.
In Chapter~\ref{chapter:rc-overview}, we walked through the history of reading comprehension, which dates back to the 1970s. Even then, researchers recognized its importance as a proper way of testing the language understanding abilities of computer programs. However, it was not until the 2010s that reading comprehension started to be formulated as a supervised learning problem, by collecting human-labeled training examples in the form of (passage, question, answer) triples. Since 2015, the field has been completely reshaped by the creation of large-scale supervised datasets and the development of neural reading comprehension models. Although only three years have passed, the field has been moving strikingly fast: innovations in building better datasets and more effective models have occurred alternately, and both have contributed to the development of the field. We also formally defined the task of reading comprehension, and described the four most common types of problems: \ti{cloze style}, \ti{multiple choice}, \ti{span prediction}, and \ti{free-form answers}, along with their evaluation metrics.
In Chapter~\ref{chapter:rc-models}, we covered all the elements of modern neural reading comprehension models. We introduced the \sys{Stanford Attentive Reader}, which we first proposed for the \sys{CNN/Daily Mail} cloze style task, and which is one of the earliest neural reading comprehension models in this field. Our model has been studied extensively on other cloze style and multiple choice tasks. We later adapted it to the \sys{SQuAD} dataset and achieved what was then state-of-the-art performance. Compared to conventional feature-based models, this model does not rely on any linguistic features produced by external NLP tools, and all of its parameters are jointly optimized. Through empirical experiments and a careful hand-analysis, we concluded that neural models are more powerful at recognizing lexical matches and paraphrases. We also discussed recent advances in developing neural reading comprehension models, including better \ti{word representations}, \ti{attention mechanisms}, \ti{alternatives to LSTMs}, and other advances such as training objectives and data augmentation.
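To make this concrete, we recall that the heart of the \sys{Stanford Attentive Reader} is a bilinear attention function between the question and the passage. The following is a minimal sketch (the notation here is ours and is simplified; see Chapter~\ref{chapter:rc-models} for the full model): let $\tilde{\mathbf{p}}_1, \ldots, \tilde{\mathbf{p}}_n$ denote the contextual encodings of the passage tokens and $\mathbf{q}$ the encoding of the question; then
\[
\alpha_i = \mathrm{softmax}_i\left(\mathbf{q}^{\top}\mathbf{W}\tilde{\mathbf{p}}_i\right), \qquad \mathbf{o} = \sum_{i=1}^{n} \alpha_i \tilde{\mathbf{p}}_i,
\]
where $\mathbf{W}$ is a learned weight matrix and the attention-weighted output $\mathbf{o}$ is used to predict the answer; for span prediction, separate bilinear terms score the start and end positions of the answer span.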
In Chapter~\ref{chapter:rc-future}, we discussed future work and open questions in this field. We examined error cases on \sys{SQuAD} (for both our model and the state-of-the-art model which surpasses human performance). We concluded that these models perform very sophisticated matching of text, but they still have difficulty understanding the inherent structure between entities and the events expressed in the text. We then discussed future work on both models and datasets. For models, we argued that besides \ti{accuracy}, there are other important aspects which have been overlooked and which we will need to work on in the future, including \ti{speed and scalability}, \ti{robustness}, and \ti{interpretability}. We also believe that future models will need more structures and modules to solve more difficult reading comprehension problems. For datasets, we discussed more recent datasets developed after \sys{SQuAD}: these datasets either require more complex reasoning across sentences or documents, need to handle longer documents, need to generate free-form answers instead of extracting a single span, or need to predict when there is no answer in the passage. Lastly, we examined several questions that we think are important to the future of neural reading comprehension.
In \sys{Part II}, the key questions we wanted to answer are: Is reading comprehension only a task for measuring language understanding? If we can build high-performing reading comprehension systems that can answer comprehension questions over a short passage of text, can they enable useful applications?
In Chapter~\ref{chapter:openqa}, we showed that we can combine information retrieval techniques and neural reading comprehension models to build an open-domain question answering system: answering general questions over a large encyclopedia or the Web. In particular, we implemented this idea in the \sys{DrQA} project, a large-scale factoid question answering system over English Wikipedia. We demonstrated the feasibility of this approach by evaluating the system on multiple question answering benchmarks. We also proposed a procedure to automatically create additional distantly supervised training examples from other question answering resources and demonstrated its effectiveness. We hope that our work takes the first step in this research direction, and that this new paradigm of combining information retrieval and neural reading comprehension will eventually lead to a new generation of open-domain question answering systems.
In Chapter~\ref{chapter:coqa}, we addressed the conversational question answering problem, in which a computer system needs to understand a text passage and answer a series of questions that appear in a conversation. To approach this, we built \sys{CoQA}: a Conversational Question Answering challenge for measuring the ability of machines to participate in a question-answering style conversation. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. We also built several competitive baselines for this new task, based on conversational and reading comprehension models. We believe that building such systems will play a crucial role in future conversational AI.
Altogether, we are truly excited about the progress that has been made in this field over the past three years, and we have been glad to contribute to it. At the same time, we also deeply believe that there is still a long way to go towards genuine human-level reading comprehension, and that we still face enormous challenges and many open questions that we will need to address in the future. One key challenge is that we still do not have good ways to approach deeper levels of reading comprehension: those questions which require understanding the reasoning and implications of the text. This often occurs with \ti{how} or \ti{why} questions, such as \ti{In the story, why is Cynthia upset with her mother?} or \ti{How does John attempt to make up for his original mistake?} In the future, we will have to address the underlying science of what is being discussed, rather than just answering by text matching, to achieve this level of reading comprehension.
We also hope to encourage more researchers to work on these applications, or to apply neural reading comprehension to new domains or tasks. We believe that this will lead us towards building better question answering and conversational agents, and we hope to see these ideas implemented and developed in industrial applications.