Reiteration of the famous learning curve experiment from Koehn and Knowles (2017)
Neural/deep learning models are known to perform poorly when training data is scarce, as demonstrated in Figure 1. In this work, newer neural MT approaches such as the Transformer are compared with non-neural and earlier neural predecessors to see how much improvement has been made.
Train NMT models at different training corpus sizes and track their performance on a test set (BLEU). Use the same data sets and splits as Koehn and Knowles (2017), and compare the results with theirs. A sketch of this procedure is given below.
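A minimal sketch of how such a learning curve could be produced, assuming hypothetical train_model and translate wrappers around whatever NMT toolkit is used; only sacrebleu is a real dependency, and the corpus sizes in the comment are illustrative, not the exact splits of Koehn and Knowles (2017).

```python
# Sketch of the learning-curve experiment: subsample the training corpus,
# train a model per size, and record BLEU on a fixed test set.
import random
import sacrebleu

def subsample(parallel_corpus, n, seed=1):
    """Draw n sentence pairs so each run trains on a smaller corpus."""
    random.seed(seed)
    return random.sample(parallel_corpus, n)

def learning_curve(parallel_corpus, test_src, test_ref, sizes):
    scores = {}
    for n in sizes:
        train_set = subsample(parallel_corpus, n)
        model = train_model(train_set)            # hypothetical: train the NMT model (Transformer or RNN)
        hypotheses = translate(model, test_src)   # hypothetical: decode the test set
        bleu = sacrebleu.corpus_bleu(hypotheses, [test_ref])
        scores[n] = bleu.score                    # BLEU as a function of corpus size
    return scores

# e.g. sizes = [400_000, 800_000, 1_600_000, ...]  # illustrative, doubling each step
```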
- Transformer NMT requires less training data than the RNN NMT used by Koehn and Knowles (2017); see Transformer base in Figure 2.
- While Transformer base is already consistently above the prior neural model, it can be improved further by tuning a few hyperparameters such as batch size and vocabulary size (Transformer varbatch in Figure 2); see the sketch below.
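A minimal sketch of what the varbatch adjustment could look like, assuming the batch size and BPE vocabulary size are scaled with the corpus size; the function name and all numbers below are illustrative assumptions, not the settings actually used for Figure 2.

```python
# Illustrative only: hypothetical scaling of vocabulary and batch size with
# corpus size. These values are assumptions, not the Figure 2 configuration.
def varbatch_config(num_sentence_pairs: int) -> dict:
    if num_sentence_pairs < 500_000:      # low-resource: smaller vocab, smaller batches
        return {"bpe_vocab_size": 8_000, "batch_tokens": 4_096}
    if num_sentence_pairs < 2_000_000:    # mid-resource
        return {"bpe_vocab_size": 16_000, "batch_tokens": 8_192}
    return {"bpe_vocab_size": 32_000, "batch_tokens": 16_384}  # high-resource
```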