Investigate using larger student models #174
I will try training the student Large or Base configurations from https://aclanthology.org/D19-5632.pdf. Our current configuration is Tiny.
I looked at the recommended configurations in https://github.com/browsermt/students/tree/master/train-student/models and at what the HPLT folks are training here https://github.com/hplt-project/bitextor-mt-models/tree/main, and it seems the recommended approach for training a larger model is to go with the Base configuration. The difference I see in code is basically these two model parameters:

Base:
- dim-emb: 512
- transformer-dim-ffn: 2048

Tiny:
- dim-emb: 256
- transformer-dim-ffn: 1536
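For reference, here is a minimal sketch of what the Base student's model settings could look like as a Marian YAML fragment. Only `dim-emb` and `transformer-dim-ffn` come from the comparison above; the remaining keys (encoder/decoder depth, SSRU decoder, tied embeddings) are assumptions based on the browsermt student recipes and should be checked against the configs linked above.

```yaml
# Sketch of a "Base" student model config (Marian options), not the exact
# config used in this repo. Only dim-emb and transformer-dim-ffn reflect
# the Tiny -> Base difference discussed above; everything else is assumed
# to stay the same as in the browsermt student recipes.
type: transformer
dim-emb: 512                       # Tiny: 256
transformer-dim-ffn: 2048          # Tiny: 1536
enc-depth: 6                       # assumed unchanged from Tiny
dec-depth: 2                       # assumed unchanged from Tiny
transformer-heads: 8               # assumed unchanged from Tiny
transformer-decoder-autoreg: rnn   # SSRU decoder, as in the browsermt students
dec-cell: ssru
tied-embeddings-all: true          # assumed: shared source/target/output embeddings
```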
It can give a bump of ~2 BLEU points at roughly twice the decoding time; see for example https://github.com/hplt-project/bitextor-mt-models/tree/main/swa-eng. I launched training for the en-ru student in the Base configuration. I also reduced early-stopping from 20 to 10. See #864.
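The early-stopping change is a one-line tweak in the student training YAML. A minimal sketch, assuming typical Marian validation options around it for context (the exact surrounding keys in the pipeline's config may differ):

```yaml
# Sketch of the relevant training-config fragment; only early-stopping
# reflects the change described above (20 -> 10 stalled validations).
early-stopping: 10
valid-freq: 3000        # assumption: validation frequency, shown for context only
valid-metrics:          # assumption: typical metrics, shown for context only
  - ce-mean-words
  - bleu-detok
```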
For clarity here, since I was confused for a bit, the

(I updated the summary comment for some clarity on the result)
Current models are around ~20 MB, which is very little for Desktop. We could try going with larger models and see what kind of quality improvements we manage to get.