- Difficulty Controllable Question Generation for Reading Comprehension [notes] [link]
- Universal Transformers [notes] [link]
- Layer Normalization [notes] [link]
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [notes] [link]
- Bilateral Multi-Perspective Matching for Natural Language Sentences [notes] [link]
- Deep Contextualized Word Representations [notes]
- One Model To Learn Them All [notes]
- Neural Architecture Search with Reinforcement Learning [notes]
- Asynchronous Methods for Deep Reinforcement Learning [notes]
- Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting [notes]
- Improving Abstraction in Text Summarization [notes]
- Learning Unsupervised Learning Rules [notes]
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context [notes]
- XLNet: Generalized Autoregressive Pretraining for Language Understanding [notes]
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge [notes]
- AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning [notes]
- Experience Grounds Language [notes] [link]
- Proximal Policy Optimization Algorithms [notes] [link]
- Fine-Tuning Language Models from Human Preferences [notes] [link]
- Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers [notes]
- Distributional Reinforcement Learning for Energy-Based Sequential Models [notes]
- Calibration of Pre-trained Transformers [notes] [link]
- REALM: Retrieval-Augmented Language Model Pre-Training [notes] [link]
- Byte Pair Encoding is Suboptimal for Language Model Pretraining [notes] [link]
- Energy-Based Models for Text [notes] [link]
- Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling [notes] [link]
- Do You Have the Right Scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods [notes] [link]