Tune sentencepiece alphas #421

Merged 6 commits on Feb 6, 2024

Changes from 1 commit
Add docs
eu9ene committed Feb 5, 2024
commit 86934956a20797eb59abdf2c4a4c09c5cc049982
21 changes: 20 additions & 1 deletion docs/training-guide.md
@@ -146,6 +146,8 @@ for example [`teacher.train.yml`] and in the [`train.sh`] script.
[train.sh]: https://github.com/mozilla/firefox-translations-training/tree/main/pipeline/train/train.sh

### Model training

#### Early stopping
Early stopping can be increased to make sure that training converges.
However, the benefit depends on the language: it might not improve quality much while making training longer.
Start with `early-stopping: 20`, monitor the training, and increase the value if the model stops training too early.
@@ -158,9 +160,26 @@ marian-args:
early-stopping: 20
```

#### Optimizer delay
Make sure to set `optimizer-delay` so that `GPU devices * optimizer-delay = 8`.
This keeps the effective batch size constant and makes training more stable.
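For example, on a machine with 4 GPUs, set `optimizer-delay: 2` so that 4 * 2 = 8. A minimal sketch of the config, assuming the same `marian-args` structure as the early-stopping snippet above (the `training-teacher` sub-key is an assumption here):

```
marian-args:
  training-teacher:
    # assuming a 4-GPU setup: 4 GPUs * optimizer-delay 2 = 8
    optimizer-delay: 2
```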

#### Subword regularization
Praise: Thanks for the docs!

```
sentencepiece-alphas: 0.5
```
SentencePiece alphas control the alpha parameter of subword sampling for the unigram model.
It improves the robustness of the model, especially on unseen domains.

If not specified, Marian does not run SentencePiece sampling (this corresponds to `alpha=1`).
Lower values (`0.1`, `0.2`) increase randomization and might benefit lower-resource languages with less diverse datasets.
However, the model might not train at all if alpha is too low.
The recommended starting value is `0.5`.
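As an illustration of what sampling does, here is a minimal sketch using the SentencePiece Python API directly (the model path and sentence are placeholders, not from this pipeline):

```
import sentencepiece as spm

# Load a trained unigram model (illustrative path)
sp = spm.SentencePieceProcessor(model_file="vocab.spm")

# Deterministic segmentation: always the single best split
print(sp.encode("the quick brown fox", out_type=str))

# Sampled segmentation: nbest_size=-1 samples over all candidate splits
# and alpha smooths the distribution, so each call may segment differently
for _ in range(3):
    print(sp.encode("the quick brown fox", out_type=str,
                    enable_sampling=True, alpha=0.5, nbest_size=-1))
```

When `sentencepiece-alphas` is set, Marian performs this kind of sampling on the fly during training.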

More details:
- [SentencePiece README](https://github.com/google/sentencepiece?tab=readme-ov-file#subword-regularization-and-bpe-dropout)
- Paper [Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates](https://arxiv.org/pdf/1804.10959.pdf)

### Decoding (translation)

`mini-batch-words` can be set depending on available GPU memory and the number of teachers.
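A minimal sketch, assuming a `decoding-teacher` sub-key analogous to the training ones (both the key and the value here are assumptions to adapt to your hardware):

```
marian-args:
  decoding-teacher:
    # reduce this value if translation runs out of GPU memory
    mini-batch-words: 2000
```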
1 change: 1 addition & 0 deletions pipeline/train/configs/training/student.finetune.yml
@@ -21,4 +21,5 @@ quantize-bits: 8
save-freq: 500
valid-freq: 500
valid-mini-batch: 16
# subword regularization
sentencepiece-alphas: 0.5
1 change: 1 addition & 0 deletions pipeline/train/configs/training/student.train.yml
@@ -20,4 +20,5 @@ optimizer-params: [0.9, 0.98, 1e-09]
save-freq: 5000
valid-freq: 5000
valid-mini-batch: 64
# subword regularization
sentencepiece-alphas: 0.5
1 change: 1 addition & 0 deletions pipeline/train/configs/training/teacher.train.yml
@@ -8,4 +8,5 @@ valid-freq: 5000
valid-max-length: 300
valid-mini-batch: 8
early-stopping: 20
# subword regularization
sentencepiece-alphas: 0.5