- Attention Is All You Need - https://arxiv.org/pdf/1706.03762
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - https://arxiv.org/pdf/1810.04805
- Layer Normalization - https://arxiv.org/pdf/1607.06450
- Finetuned Language Models Are Zero-Shot Learners - https://arxiv.org/pdf/2109.01652
- Training language models to follow instructions with human feedback - https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf
- REALM: Retrieval-Augmented Language Model Pre-Training - https://arxiv.org/abs/2002.08909
- LoRA: Low-Rank Adaptation of Large Language Models - https://arxiv.org/abs/2106.09685
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis - https://arxiv.org/abs/2003.08934
- DQN: Playing Atari with Deep Reinforcement Learning - https://arxiv.org/pdf/1312.5602
- PPO: Proximal Policy Optimization Algorithms - https://arxiv.org/pdf/1707.06347