Today's lecture covers simple generative models, maximum likelihood estimation from both complete and incomplete data, and latent variable word alignment models.
The Expectation-Maximization (EM) algorithm is a general method for maximum likelihood estimation when some variables are unobserved. It can also be viewed as a form of variational inference.
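Since the lecture uses word alignment as its running latent-variable example, here is a minimal sketch of EM for an IBM Model 1-style lexical translation table. The toy sentence pairs and variable names below are purely illustrative (this is not the seminar notebook's code): the alignment of each source word is the latent variable, the E-step computes its posterior under the current translation table, and the M-step re-estimates the table from the resulting expected counts.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical sentence pairs, for illustration only)
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

# Initialise t(f|e) uniformly over the source-side vocabulary
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):
    # E-step: expected (f, e) co-occurrence counts under the current t,
    # marginalising over the latent alignment of each source word
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)  # normaliser of the alignment posterior
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) from the expected counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# After a few iterations, t concentrates on the plausible word translations
print(sorted(t.items(), key=lambda kv: -kv[1])[:5])
```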
- (Slides 1) Generative models, MLE and EM
- (Slides 2) Word alignment models
- (Notes) Detailed notes
Videos:
- our lecture and seminar (in English!)
- alternative lecture on EM (outside NLP)

The seminar will use this notebook.
In preparation for next week's class on Machine Translation, you should form groups of five or six students, pick one of the following questions, and be prepared to give a short presentation during the lecture.
Each person should read at least one paper, and your group should meet in advance of the class to finalize your presentation.
As well as explaining the main ideas in the papers, please also pay attention to any problems with the experimental setup of each paper and comment on whether its conclusions are well supported by its results.
- What are the main computational and statistical bottlenecks in NMT? How can we reduce them?
- Neural Machine Translation of Rare Words with Subword Units
- Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies
- Vocabulary Manipulation for Neural Machine Translation
- Fully Character-Level Neural Machine Translation without Explicit Segmentation
- Using the Output Embedding to Improve Language Models
- What are the pros/cons of different Encoder-Decoder architectures? (RNNs, ConvS2S, Transformer, etc.)
- Google's Neural Machine Translation System
- Attention Is All You Need
- Convolutional Sequence to Sequence Learning
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- The Importance of Being Recurrent for Modeling Hierarchical Structure
- Colorless green recurrent networks dream hierarchically
- How can monolingual data be used to improve NMT?
- On Using Monolingual Corpora in Neural Machine Translation
- Improving Neural Machine Translation Models with Monolingual Data
- Iterative Back-Translation for Neural Machine Translation
- Understanding Back-Translation at Scale
- Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation
- How can we build NMT systems for language pairs with very little parallel data?
- Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
- Contextual Parameter Generation for Universal Neural Machine Translation
- Dual Learning for Machine Translation
- Zero-Shot Dual Machine Translation
- Phrase-Based & Neural Unsupervised Machine Translation
- Has NMT really bridged the gap between MT and human translation? What problems remain?