From b95e1fdf21f74e41ed9602dd8481256b2b60f2c6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=A8=8B=E6=B3=BD=E5=8D=8E?=
Date: Sun, 11 Aug 2019 20:22:56 +0800
Subject: [PATCH] Update Sparse Attention and RNN for LM.

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 161c966..8a5651a 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,8 @@ After each subdomain, we proposed several ideas that may inspire your work that
 ## Natual Language Processing
 
 - [Long and Short-Term Memory](https://www.mitpressjournals.org/doi/10.1162/neco.1997.9.8.1735) : An original idea for long sentences processing, inspired by human neural information processing mechanism.
-- [GRU](https://arxiv.org/pdf/1412.3555.pdf)
+- [Recurrent neural network based language model](https://www.isca-speech.org/archive/archive_papers/interspeech_2010/i10_1045.pdf) : The original idea of introducing a recurrent (RNN-like) structure into the language model (LM).
+- [GRU](https://arxiv.org/pdf/1412.3555.pdf) : A simple yet effective gated recurrent architecture; a large number of effective, high-accuracy models are built on it.
 - [Connectionist temporal classification](ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf) : Inspired by dynamic processing and dynamic time warping(DTW) when dealing with time-warped sequences like audio data.
 - [Learning Longer Memory in RNN](https://arxiv.org/pdf/1413.7753.pdf) : Formulated Recursive Neural Network which can be applied on sequences recursively by only using a single compact model.
 - [Learning phrase representations using RNN encoder-decoder for statistical machine translation](https://arxiv.org/pdf/1406.1078.pdf) : "Cho Model" for NMT.
@@ -36,6 +37,7 @@ After each subdomain, we proposed several ideas that may inspire your work that
 - [Transformer-XL](https://arxiv.org/abs/1901.02860): Introduced relative positional encoding. State reuse resolved the problem may caused by excessive long sentence.
 - [Focused Attention Networks](https://arxiv.org/pdf/1905.11498.pdf)
 - [XLNet](https://arxiv.org/pdf/1906.08237.pdf) : Combined AR and AE models. Introduced DAG while learning AR parameters in sentence segments.
+- [Generating Long Sequences with Sparse Transformers](https://arxiv.org/pdf/1904.10509.pdf) : A simplified, sparse-attention take on the autoregressive (AR) part of XLNet, and a step toward BERT-style modeling for CV (addresses our #3 in [what is NEXT]).
 
 So what is NEXT?
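For context on the GRU entry added above: below is a minimal NumPy sketch of one common GRU-cell formulation (update and reset gates plus a candidate state). All parameter names, shapes, and the toy usage are illustrative assumptions, not code from this repository, and some papers swap the roles of z and 1 - z in the final interpolation.

```python
# Minimal sketch of a single GRU time step (one common formulation).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU time step; returns the new hidden state h_t."""
    z = sigmoid(x_t @ p["Wz"] + h_prev @ p["Uz"])               # update gate
    r = sigmoid(x_t @ p["Wr"] + h_prev @ p["Ur"])               # reset gate
    h_tilde = np.tanh(x_t @ p["Wh"] + (r * h_prev) @ p["Uh"])   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                     # mix old and new state

# Toy usage (illustrative): a 3-step sequence, input dim 8, hidden dim 16.
rng = np.random.default_rng(0)
d_in, d_h = 8, 16
p = {k: rng.normal(scale=0.1, size=((d_in if k.startswith("W") else d_h), d_h))
     for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = np.zeros(d_h)
for x_t in rng.normal(size=(3, d_in)):
    h = gru_step(x_t, h, p)
print(h.shape)  # -> (16,)
```

The gates let the cell interpolate between keeping its previous state and writing a new candidate at each step, which is why this single compact architecture handles long sequences well.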