This repo records papers I find interesting. I'll update it sporadically.
- Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing
- Low-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling
- Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing
- Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?
- Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
- NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
- Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation
- Competition-Level Code Generation with AlphaCode
- UNITER: UNiversal Image-TExt Representation Learning
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
- Training Vision Transformers for Image Retrieval
- In Defense of Grid Features for Visual Question Answering
- Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
- Learning Transferable Visual Models From Natural Language Supervision
- Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
- Context-Aware Attention Network for Image-Text Retrieval
- Using Text to Teach Image Retrieval
- What BERT Sees: Cross-Modal Transfer for Visual Question Generation
- A Fast and Accurate One-Stage Approach to Visual Grounding
- Large-Scale Adversarial Training for Vision-and-Language Representation Learning
- e-SNLI-VE-2.0: Corrected Visual-Textual Entailment with Natural Language Explanations
- CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
- Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
- Video Re-localization
- Use What You Have: Video Retrieval Using Representations From Collaborative Experts
- HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
- Multi-modal Transformer for Video Retrieval
- Deep & Cross Network for Ad Click Predictions
- COLD: Towards the Next Generation of Pre-Ranking System
- Passage Re-ranking with BERT
- Document Ranking with a Pretrained Sequence-to-Sequence Model
- A User-Centered Concept Mining System for Query and Document Understanding at Tencent
- Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature
- AliCoCo: Alibaba E-commerce Cognitive Concept Net
- Do We Need Zero Training Loss After Achieving Zero Training Error?
- Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
- On the Stability of Fine-Tuning BERT: Misconceptions, Explanations, and Strong Baselines
- Subword Pooling Makes a Difference
- PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. RepL4NLP 2019 [pdf]
- Training Tips for the Transformer Model
- X-SQL: reinforce schema representation with context
- A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization
- Web Table Extraction, Retrieval and Augmentation: A Survey
- Query Understanding for Natural Language Enterprise Search
- Sequence-to-Nuggets: Nested Entity Mention Detection via Anchor-Region Networks
- Generalizing Natural Language Analysis through Span-relation Representations
- A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy
- Imposing Label-Relational Inductive Bias for Extremely Fine-Grained Entity Typing
- Knowledge-Augmented Language Model and Its Application to Unsupervised Named-Entity Recognition
- Boundary Enhanced Neural Span Classification for Nested Named Entity Recognition
- Named Entity Recognition in the Style of Object Detection
- Logical Natural Language Generation from Open-Domain Tables
- Auxiliary Tuning and its Application to Conditional Text Generation
- SenseBERT: Driving Some Sense into BERT
- PMI-Masking: Principled Masking of Correlated Spans
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
- Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval
- Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives: A Survey
- Pre-training Tasks for Embedding-based Large-scale Retrieval
- Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering
- A Review on Deep Learning Techniques Applied to Answer Selection
- End-to-End Training of Neural Retrievers for Open-Domain Question Answering
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
- REALM: Retrieval-Augmented Language Model Pre-Training
- Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases
- End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems
- Machine Learning for Dialog State Tracking: A Review. 2015
- The Dialog State Tracking Challenge Series: A Review. 2016
- An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking. ACL 2018
- Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems
- Semi-supervised Sequence Learning. Andrew M. Dai, Quoc V. Le. NIPS 2015. [pdf]
- context2vec: Learning Generic Context Embedding with Bidirectional LSTM. Oren Melamud, Jacob Goldberger, Ido Dagan. CoNLL 2016. [pdf] [project] (context2vec)
- Unsupervised Pretraining for Sequence to Sequence Learning. Prajit Ramachandran, Peter J. Liu, Quoc V. Le. EMNLP 2017. [pdf] (Pre-trained seq2seq)
- Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. NAACL 2018. [pdf] [project] (ELMo)
- Universal Language Model Fine-tuning for Text Classification. Jeremy Howard and Sebastian Ruder. ACL 2018. [pdf] [project] (ULMFiT)
- Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. Preprint. [pdf] [project] (GPT)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. NAACL 2019. [pdf] [code & model]
- Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Preprint. [pdf] [code] (GPT-2)
- ERNIE: Enhanced Language Representation with Informative Entities. Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun and Qun Liu. ACL 2019. [pdf] [code & model] (ERNIE (Tsinghua) )
- ERNIE: Enhanced Representation through Knowledge Integration. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian and Hua Wu. Preprint. [pdf] [code] (ERNIE (Baidu) )
- Defending Against Neural Fake News. Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. NeurIPS 2019. [pdf] [project] (Grover)
- Cross-lingual Language Model Pretraining. Guillaume Lample, Alexis Conneau. NeurIPS 2019. [pdf] [code & model] (XLM)
- Multi-Task Deep Neural Networks for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. ACL 2019. [pdf] [code & model] (MT-DNN)
- MASS: Masked Sequence to Sequence Pre-training for Language Generation. Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. ICML 2019. [pdf] [code & model]
- Unified Language Model Pre-training for Natural Language Understanding and Generation. Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon. Preprint. [pdf] (UniLM)
- XLNet: Generalized Autoregressive Pretraining for Language Understanding. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. NeurIPS 2019. [pdf] [code & model]
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. Preprint. [pdf] [code & model]
- SpanBERT: Improving Pre-training by Representing and Predicting Spans. Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy. Preprint. [pdf] [code & model]
- Knowledge Enhanced Contextual Word Representations. Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith. EMNLP 2019. [pdf] (KnowBert)
- VisualBERT: A Simple and Performant Baseline for Vision and Language. Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. Preprint. [pdf] [code & model]
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. NeurIPS 2019. [pdf] [code & model]
- VideoBERT: A Joint Model for Video and Language Representation Learning. Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid. ICCV 2019. [pdf]
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers. Hao Tan, Mohit Bansal. EMNLP 2019. [pdf] [code & model]
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. Preprint. [pdf]
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training. Gen Li, Nan Duan, Yuejian Fang, Ming Gong, Daxin Jiang, Ming Zhou. Preprint. [pdf]
- K-BERT: Enabling Language Representation with Knowledge Graph. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang. Preprint. [pdf]
- Fusion of Detected Objects in Text for Visual Question Answering. Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter. EMNLP 2019. [pdf] (B2T2)
- Contrastive Bidirectional Transformer for Temporal Representation Learning. Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid. Preprint. [pdf] (CBT)
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang. Preprint. [pdf] [code]
- 75 Languages, 1 Model: Parsing Universal Dependencies Universally. Dan Kondratyuk, Milan Straka. EMNLP 2019. [pdf] [code & model] (UDify)
- Pre-Training with Whole Word Masking for Chinese BERT. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu. Preprint. [pdf] [code & model] (Chinese-BERT-wwm)
- UNITER: Learning UNiversal Image-TExt Representations. Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu. Preprint. [pdf]
- MultiFiT: Efficient Multi-lingual Language Model Fine-tuning. Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger, Jeremy Howard. EMNLP 2019. [pdf] [code & model]
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Preprint. [pdf] [code & model] (T5)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. Preprint. [pdf]
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. ICLR 2020. [pdf]
- A Mutual Information Maximization Perspective of Language Representation Learning. Lingpeng Kong, Cyprien de Masson d'Autume, Lei Yu, Wang Ling, Zihang Dai, Dani Yogatama. ICLR 2020. [pdf]
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, Luo Si. ICLR 2020. [pdf]
- Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring. Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston. ICLR 2020. [pdf]
- FreeLB: Enhanced Adversarial Training for Language Understanding. Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Thomas Goldstein, Jingjing Liu. ICLR 2020. [pdf]
- Multilingual Alignment of Contextual Word Representations. Steven Cao, Nikita Kitaev, Dan Klein. ICLR 2020. [pdf]
- TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. Preprint. [pdf] [code & model]
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin. Preprint. [pdf]
- Patient Knowledge Distillation for BERT Model Compression. Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. EMNLP 2019. [pdf] [code]
- Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System. Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. Preprint. [pdf]
- PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation. Wei Zhu, Xiaofeng Zhou, Keqiang Wang, Xun Luo, Xiepeng Li, Yuan Ni, Guotong Xie. The 18th BioNLP workshop. [pdf]
- Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. Preprint. [pdf] [code & model]
- Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Preprint. [pdf]
- Small and Practical BERT Models for Sequence Labeling. Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer. EMNLP 2019. [pdf]
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. Preprint. [pdf]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. ICLR 2020. [pdf]
- Extreme Language Model Compression with Optimal Subwords and Shared Projections. Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou. Preprint. [pdf]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. Preprint. [pdf]
- Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. ICLR 2020. [pdf]
- Thieves on Sesame Street! Model Extraction of BERT-based APIs. Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, Mohit Iyyer. ICLR 2020. [pdf]
- Revealing the Dark Secrets of BERT. Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. EMNLP 2019. [pdf]
- How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations. Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers. CIKM 2019. [pdf]
- Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. Preprint. [pdf] [code]
- Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. Preprint. [pdf] [code]
- BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. Alex Wang, Kyunghyun Cho. NeuralGen 2019. [pdf] [code]
- Linguistic Knowledge and Transferability of Contextual Representations. Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith. NAACL 2019. [pdf]
- What Does BERT Look At? An Analysis of BERT's Attention. Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. BlackBoxNLP 2019. [pdf] [code]
- Open Sesame: Getting Inside BERT's Linguistic Knowledge. Yongjie Lin, Yi Chern Tan, Robert Frank. BlackBoxNLP 2019. [pdf] [code]
- Analyzing the Structure of Attention in a Transformer Language Model. Jesse Vig, Yonatan Belinkov. BlackBoxNLP 2019. [pdf]
- Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains. Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema. BlackBoxNLP 2019. [pdf]
- BERT Rediscovers the Classical NLP Pipeline. Ian Tenney, Dipanjan Das, Ellie Pavlick. ACL 2019. [pdf]
- How multilingual is Multilingual BERT? Telmo Pires, Eva Schlinger, Dan Garrette. ACL 2019. [pdf]
- What Does BERT Learn about the Structure of Language? Ganesh Jawahar, Benoît Sagot, Djamé Seddah. ACL 2019. [pdf]
- Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT. Shijie Wu, Mark Dredze. EMNLP 2019. [pdf]
- How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. Kawin Ethayarajh. EMNLP 2019. [pdf]
- Probing Neural Network Comprehension of Natural Language Arguments. Timothy Niven, Hung-Yu Kao. ACL 2019. [pdf] [code]
- Universal Adversarial Triggers for Attacking and Analyzing NLP. Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. EMNLP 2019. [pdf] [code]
- The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives. Elena Voita, Rico Sennrich, Ivan Titov. EMNLP 2019. [pdf]
- Do NLP Models Know Numbers? Probing Numeracy in Embeddings. Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner. EMNLP 2019. [pdf]
- Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs. Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, Shikha Bordia, Haokun Liu, Alicia Parrish, Sheng-Fu Wang, Jason Phang, Anhad Mohananey, Phu Mon Htut, Paloma Jeretič, Samuel R. Bowman. EMNLP 2019. [pdf] [code]
- Visualizing and Understanding the Effectiveness of BERT. Yaru Hao, Li Dong, Furu Wei, Ke Xu. EMNLP 2019. [pdf]
- Visualizing and Measuring the Geometry of BERT. Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg. NeurIPS 2019. [pdf]
- On the Validity of Self-Attention as Explanation in Transformer Models. Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, Roger Wattenhofer. Preprint. [pdf]
- Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel. Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov. EMNLP 2019. [pdf]
- Language Models as Knowledge Bases? Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. EMNLP 2019. [pdf] [code]
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. Matthew E. Peters, Sebastian Ruder, Noah A. Smith. RepL4NLP 2019. [pdf]
- On the Cross-lingual Transferability of Monolingual Representations. Mikel Artetxe, Sebastian Ruder, Dani Yogatama. Preprint. [pdf] [dataset]
- A Structural Probe for Finding Syntax in Word Representations. John Hewitt, Christopher D. Manning. NAACL 2019. [pdf]
- Assessing BERT’s Syntactic Abilities. Yoav Goldberg. Technical Report. [pdf]
- What do you learn from context? Probing for sentence structure in contextualized word representations. Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick. ICLR 2019. [pdf]
- Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling. Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman. ACL 2019. [pdf]
- BERT is Not an Interlingua and the Bias of Tokenization. Jasdeep Singh, Bryan McCann, Richard Socher, and Caiming Xiong. DeepLo 2019. [pdf] [dataset]
- What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Allyson Ettinger. Preprint. [pdf] [code]
- How Language-Neutral is Multilingual BERT? Jindřich Libovický, Rudolf Rosa, and Alexander Fraser. Preprint. [pdf]
- Cross-Lingual Ability of Multilingual BERT: An Empirical Study. Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth. ICLR 2020. [pdf]
- Transfer Learning in Natural Language Processing. Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, Thomas Wolf. NAACL 2019. [slides]
- Transformers: State-of-the-art Natural Language Processing. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Jamie Brew. Preprint. [pdf] [code]