Teacher training configuration improvements #987

Open
@ZJaume

Description

I think the teacher-big task configuration used here and here can be optimized.

Regarding the speed of training:

  • Setting beam-size to 4 should be enough for transformer-big models.
  • valid-mini-batch of 8 is a bit low; it could be raised to 32 or 64 (see the sketch after this list).
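A minimal sketch of how those two values could look in the training config, assuming the pipeline passes them straight through as Marian options (the exact file and section may differ):

```yaml
# Hypothetical excerpt of the teacher-big training config (Marian option names).
beam-size: 4          # beam used by the translation validator; 4 is enough for transformer-big
valid-mini-batch: 32  # up from 8; 64 should also work
```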

Regarding quality:

  • max-length is set to 100, which is pretty low in my opinion (I typically use 400), especially if we are including sentences from certain EU corpora on OPUS that have very long lines, and HPLT segments in the backtranslation data (remember that HPLT is not sentence-split; segments appear in the corpus just as they do on the original website). valid-max-length is set to 300, which is fine, but max-length of 100 causes all training sentences over 100 tokens to be dropped, so the model is not learning from them (unless I'm missing a third configuration file in the pipeline).
  • I've always used swish with no issues, but maybe there's no difference compared to relu. See the sketch after this list.
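Again a rough sketch using the Marian option names, assuming the pipeline forwards them unchanged; the values are just the suggestions above:

```yaml
# Hypothetical excerpt of the same teacher-big config.
max-length: 400                     # was 100; keeps long EU/HPLT segments instead of skipping them
valid-max-length: 300               # already reasonable
transformer-ffn-activation: swish   # relu may work just as well
```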
