Tune sentencepiece alphas #421

Merged 6 commits on Feb 6, 2024

Changes from 1 commit
Add docs
eu9ene committed Feb 5, 2024
commit 86934956a20797eb59abdf2c4a4c09c5cc049982
21 changes: 20 additions & 1 deletion docs/training-guide.md
@@ -146,6 +146,8 @@ for example [`teacher.train.yml`] and in the [`train.sh`] script.
[train.sh]: https://github.com/mozilla/firefox-translations-training/tree/main/pipeline/train/train.sh

### Model training

#### Early stopping
Early stopping can be increased to make sure that training converges.
However, the benefit depends on the language: it might not improve quality much while making training longer.
Start with `early-stopping: 20`, monitor the training, and increase the value if the model stops training too early.
@@ -158,9 +160,26 @@ marian-args:
early-stopping: 20
```

#### Optimizer delay
Make sure to set `optimizer-delay` so that `GPU devices * optimizer-delay = 8`.
This keeps the effective batch size constant and makes training more stable.
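For example, on a machine with 4 GPUs, set `optimizer-delay: 2` so that 4 * 2 = 8. A minimal sketch of the config, assuming the same `marian-args` structure as the early-stopping snippet above (the `training-teacher` sub-key is an assumption here):

```
marian-args:
  training-teacher:
    # assuming a 4-GPU setup: 4 GPUs * optimizer-delay 2 = 8
    optimizer-delay: 2
```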

#### Subword regularization
Praise: Thanks for the docs!

```
sentencepiece-alphas: 0.5
```
SentencePiece alphas control the alpha parameter of subword sampling for the unigram model.
It improves the robustness of the model, especially on unseen domains.

If not specified, Marian does not run SentencePiece sampling (this corresponds to `alpha=1`).
Lower values (`0.1`, `0.2`) increase randomization and might benefit lower-resource languages with less diverse datasets.
However, the model might not train at all if alpha is too low.
The recommended starting value is `0.5`.
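As an illustration of what sampling does, here is a minimal sketch using the SentencePiece Python API directly (the model path and sentence are placeholders, not from this pipeline):

```
import sentencepiece as spm

# Load a trained unigram model (illustrative path)
sp = spm.SentencePieceProcessor(model_file="vocab.spm")

# Deterministic segmentation: always the single best split
print(sp.encode("the quick brown fox", out_type=str))

# Sampled segmentation: nbest_size=-1 samples over all candidate splits
# and alpha smooths the distribution, so each call may segment differently
for _ in range(3):
    print(sp.encode("the quick brown fox", out_type=str,
                    enable_sampling=True, alpha=0.5, nbest_size=-1))
```

When `sentencepiece-alphas` is set, Marian performs this kind of sampling on the fly during training.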

More details:
- [SentencePiece README](https://github.com/google/sentencepiece?tab=readme-ov-file#subword-regularization-and-bpe-dropout)
- Paper [Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates](https://arxiv.org/pdf/1804.10959.pdf)

### Decoding (translation)

`mini-batch-words` can be set depending on available GPU memory and the number of teachers.
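A minimal sketch, assuming a `decoding-teacher` sub-key analogous to the training ones (both the key and the value here are assumptions to adapt to your hardware):

```
marian-args:
  decoding-teacher:
    # reduce this value if translation runs out of GPU memory
    mini-batch-words: 2000
```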
1 change: 1 addition & 0 deletions pipeline/train/configs/training/student.finetune.yml
@@ -21,4 +21,5 @@ quantize-bits: 8
save-freq: 500
valid-freq: 500
valid-mini-batch: 16
# subword regularization
sentencepiece-alphas: 0.5
1 change: 1 addition & 0 deletions pipeline/train/configs/training/student.train.yml
@@ -20,4 +20,5 @@ optimizer-params: [0.9, 0.98, 1e-09]
save-freq: 5000
valid-freq: 5000
valid-mini-batch: 64
# subword regularization
sentencepiece-alphas: 0.5
1 change: 1 addition & 0 deletions pipeline/train/configs/training/teacher.train.yml
@@ -8,4 +8,5 @@ valid-freq: 5000
valid-max-length: 300
valid-mini-batch: 8
early-stopping: 20
# subword regularization
sentencepiece-alphas: 0.5