I think the `teacher-big` task configuration used here and here can be optimized.
Regarding the speed of training (see the config sketch after this list):
- `beam-size` set to 4 should be enough for transformer-big models.
- `valid-mini-batch` set to 8 is a bit low; it could be set to 32 or 64.
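For reference, here is a minimal sketch of how those speed-related settings could look in the Marian training YAML. The key names are the standard Marian options; the surrounding file layout in the pipeline is not shown, and the values are simply the ones proposed above:

```yaml
# Speed-related settings for the teacher-big task (illustrative sketch only)
beam-size: 4          # 4 is usually enough when validating transformer-big teachers
valid-mini-batch: 32  # raise from 8; 64 should also be fine
```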
Regarding quality (a config sketch follows this list):
- `max-length` is set to 100, which is pretty low in my opinion (I typically use 400), especially if we are including sentences from certain EU corpora from OPUS that have very long lines, and segments from HPLT in backtranslation (remember that HPLT does not have its sentences split; they are in the corpus just as they appear on the original website). The `valid-max-length` is set to 300, which is OK, but `max-length` at 100 is causing all the training sentences over 100 tokens to be omitted, so the model is not learning from them (unless I'm missing a third configuration file in the pipeline).
- I've always used `swish` with no issues, but maybe there's no difference in using `relu`.
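For illustration, a hedged sketch of the quality-related options discussed above, again using the standard Marian key names. `max-length: 400` is my personal preference rather than a value taken from the existing config, and the activation line is only there to show where the `swish`/`relu` choice lives:

```yaml
# Quality-related settings (illustrative sketch only)
max-length: 400                    # raise from 100 so long OPUS/HPLT segments are not dropped
valid-max-length: 300              # current value; fine as-is
transformer-ffn-activation: swish  # relu may be equivalent; I have not seen a difference
# max-length-crop: true            # alternative: crop over-long sentences instead of dropping them
```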