From b95e1fdf21f74e41ed9602dd8481256b2b60f2c6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=A8=8B=E6=B3=BD=E5=8D=8E?=
Date: Sun, 11 Aug 2019 20:22:56 +0800
Subject: [PATCH] Update Sparse Attention and RNN for LM.

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 161c966..8a5651a 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,8 @@ After each subdomain, we proposed several ideas that may inspire your work that
 ## Natual Language Processing
 
 - [Long and Short-Term Memory](https://www.mitpressjournals.org/doi/10.1162/neco.1997.9.8.1735) : An original idea for long sentences processing, inspired by human neural information processing mechanism.
-- [GRU](https://arxiv.org/pdf/1412.3555.pdf)
+- [Recurrent neural network based language model](https://www.isca-speech.org/archive/archive_papers/interspeech_2010/i10_1045.pdf) : The original idea of introducing a recurrent (RNN-like) structure into the language model (LM).
+- [GRU](https://arxiv.org/pdf/1412.3555.pdf) : A simple yet effective gated recurrent architecture; a large number of effective, high-accuracy models are built on it.
 - [Connectionist temporal classification](ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf) : Inspired by dynamic processing and dynamic time warping(DTW) when dealing with time-warped sequences like audio data.
 - [Learning Longer Memory in RNN](https://arxiv.org/pdf/1413.7753.pdf) : Formulated Recursive Neural Network which can be applied on sequences recursively by only using a single compact model.
 - [Learning phrase representations using RNN encoder-decoder for statistical machine translation](https://arxiv.org/pdf/1406.1078.pdf) : "Cho Model" for NMT.
@@ -36,6 +37,7 @@ After each subdomain, we proposed several ideas that may inspire your work that
 - [Transformer-XL](https://arxiv.org/abs/1901.02860): Introduced relative positional encoding. State reuse resolved the problem may caused by excessive long sentence.
 - [Focused Attention Networks](https://arxiv.org/pdf/1905.11498.pdf)
 - [XLNet](https://arxiv.org/pdf/1906.08237.pdf) : Combined AR and AE models. Introduced DAG while learning AR parameters in sentence segments.
+- [Generating Long Sequences with Sparse Transformers](https://arxiv.org/pdf/1904.10509.pdf) : A simplified, sparse-attention take on the autoregressive (AR) part of XLNet, and a step toward BERT-style modeling for CV (addresses our #3 in [what is NEXT]).
 
 So what is NEXT?
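For context on the GRU entry added above: below is a minimal NumPy sketch of one common GRU-cell formulation (update and reset gates plus a candidate state). All parameter names, shapes, and the toy usage are illustrative assumptions, not code from this repository, and some papers swap the roles of z and 1 - z in the final interpolation.

```python
# Minimal sketch of a single GRU time step (one common formulation).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU time step; returns the new hidden state h_t."""
    z = sigmoid(x_t @ p["Wz"] + h_prev @ p["Uz"])               # update gate
    r = sigmoid(x_t @ p["Wr"] + h_prev @ p["Ur"])               # reset gate
    h_tilde = np.tanh(x_t @ p["Wh"] + (r * h_prev) @ p["Uh"])   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                     # mix old and new state

# Toy usage (illustrative): a 3-step sequence, input dim 8, hidden dim 16.
rng = np.random.default_rng(0)
d_in, d_h = 8, 16
p = {k: rng.normal(scale=0.1, size=((d_in if k.startswith("W") else d_h), d_h))
     for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = np.zeros(d_h)
for x_t in rng.normal(size=(3, d_in)):
    h = gru_step(x_t, h, p)
print(h.shape)  # -> (16,)
```

The gates let the cell interpolate between keeping its previous state and writing a new candidate at each step, which is why this single compact architecture handles long sequences well.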