Commit

latest models added
jorgtied committed Mar 7, 2022
1 parent d751115 commit 74011b4
Showing 31 changed files with 625 additions and 18 deletions.
20 changes: 20 additions & 0 deletions models/dan-ukr/README.md
@@ -0,0 +1,20 @@
# opusTCv20210807+pbt_transformer-align_2022-03-07.zip

* dataset: opusTCv20210807+pbt
* model: transformer-align
* source language(s): dan
* target language(s): ukr
* raw source language(s): dan
* raw target language(s): ukr
* model: transformer-align
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* download: [opusTCv20210807+pbt_transformer-align_2022-03-07.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/dan-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip)
* test set translations: [opusTCv20210807+pbt_transformer-align_2022-03-07.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/dan-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.test.txt)
* test set scores: [opusTCv20210807+pbt_transformer-align_2022-03-07.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/dan-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.eval.txt)

## Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|-------|-------|-------|--------|----|
| Tatoeba-test-v2021-08-07.dan-ukr | 52.6 | 0.71901 | 10 | 47 | 1.000 |
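The download, test-translation and test-score links above share one layout: a base URL, the language pair, then the release name with a `.zip`, `.test.txt` or `.eval.txt` suffix. The sketch below rebuilds them for any pair and release; the layout is inferred from the links in this commit, and `release_links` is a hypothetical helper, not part of any Tatoeba-MT tooling.

```python
# Hypothetical helper: the URL layout <base>/<pair>/<release>.<suffix> is
# inferred from the README links in this commit, not from a documented API.
BASE_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models"


def release_links(pair: str, release: str) -> dict[str, str]:
    """Return the download, test-translation and eval-score URLs for one release."""
    stem = f"{BASE_URL}/{pair}/{release}"
    return {
        "download": f"{stem}.zip",
        "test set translations": f"{stem}.test.txt",
        "test set scores": f"{stem}.eval.txt",
    }


if __name__ == "__main__":
    links = release_links("dan-ukr", "opusTCv20210807+pbt_transformer-align_2022-03-07")
    for label, url in links.items():
        print(f"{label}: {url}")
```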

@@ -0,0 +1,34 @@
release: dan-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip
release-date: 2022-03-07
dataset-name: opusTCv20210807+pbt
modeltype: transformer-align
vocabulary:
   source: opusTCv20210807+pbt.spm32k-spm32k.vocab.yml
   target: opusTCv20210807+pbt.spm32k-spm32k.vocab.yml
pre-processing: normalization + SentencePiece (spm32k,spm32k)
subwords:
   source: spm32k
   target: spm32k
subword-models:
   source: source.spm
   target: target.spm
source-languages:
   - dan
target-languages:
   - ukr
raw-source-languages:
   - dan
raw-target-languages:
   - ukr
training-data:
   dan-ukr: Tatoeba-train-v2021-08-07.dan-ukr.strict (5377143) Tatoeba-train.eng-dan-ukr.aa (1000000) Tatoeba-train.eng-dan-ukr.ab (1000000) Tatoeba-train.eng-dan-ukr.ac (1000000) Tatoeba-train.eng-dan-ukr.ad (1000000) Tatoeba-train.eng-dan-ukr.ae (1000000) Tatoeba-train.eng-dan-ukr.af (1000000) Tatoeba-train.eng-dan-ukr.ag (1000000) Tatoeba-train.eng-dan-ukr.ah (1000000) Tatoeba-train.eng-dan-ukr.ai (1000000) Tatoeba-train.eng-dan-ukr.aj (1000000) Tatoeba-train.eng-dan-ukr.ak (1000000) Tatoeba-train.eng-dan-ukr.al (1000000) Tatoeba-train.eng-dan-ukr.am (1000000) Tatoeba-train.eng-dan-ukr.an (1000000) Tatoeba-train.eng-dan-ukr.ao (1000000) Tatoeba-train.eng-dan-ukr.ap (1000000) Tatoeba-train.eng-dan-ukr.aq (1000000) Tatoeba-train.eng-dan-ukr.ar (1000000) Tatoeba-train.eng-dan-ukr.as (1000000) Tatoeba-train.eng-dan-ukr.at (1000000) Tatoeba-train.eng-dan-ukr.au (1000000) Tatoeba-train.eng-dan-ukr.av (1000000) Tatoeba-train.eng-dan-ukr.aw (1000000) Tatoeba-train.eng-dan-ukr.ax (1000000) Tatoeba-train.eng-dan-ukr.ay (1000000) Tatoeba-train.eng-dan-ukr.az (1000000) Tatoeba-train.eng-dan-ukr.ba (1000000) Tatoeba-train.eng-dan-ukr.bb (1000000) Tatoeba-train.eng-dan-ukr.bc (1000000) Tatoeba-train.eng-dan-ukr.bd (1000000) Tatoeba-train.eng-dan-ukr.be (1000000) Tatoeba-train.eng-dan-ukr.bf (1000000) Tatoeba-train.eng-dan-ukr.bg (1000000) Tatoeba-train.eng-dan-ukr.bh (220604)
validation-data:
   dan-ukr: Tatoeba-dev-v2021-08-07, 1001
   total-size-shuffled: 1001
   devset-selected: top 1001 lines of Tatoeba-dev-v2021-08-07.src.shuffled
test-data:
   Tatoeba-test-v2021-08-07.dan-ukr: 10/47
BLEU-scores:
   Tatoeba-test-v2021-08-07.dan-ukr: 52.6
chr-F-scores:
   Tatoeba-test-v2021-08-07.dan-ukr: 0.71901
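The `training-data` value packs every training file and its size into a single line as `name (count)`. A minimal sketch that splits such a value and totals the sizes, assuming the parenthesized numbers are line (sentence-pair) counts; `parse_training_data` and `total_lines` are hypothetical names.

```python
import re

# Matches one "<file name> (<count>)" entry; a leading "dan-ukr:" key is ignored
# because it is not followed by a parenthesized number.
ENTRY = re.compile(r"(\S+) \((\d+)\)")


def parse_training_data(value: str) -> list[tuple[str, int]]:
    """Split a training-data value into (file name, count) pairs."""
    return [(name, int(count)) for name, count in ENTRY.findall(value)]


def total_lines(value: str) -> int:
    """Sum the counts of all files listed in a training-data value."""
    return sum(count for _, count in parse_training_data(value))


if __name__ == "__main__":
    # Shortened excerpt of the dan-ukr entry above (strict set, first and last shard).
    excerpt = (
        "Tatoeba-train-v2021-08-07.dan-ukr.strict (5377143) "
        "Tatoeba-train.eng-dan-ukr.aa (1000000) "
        "Tatoeba-train.eng-dan-ukr.bh (220604)"
    )
    for name, count in parse_training_data(excerpt):
        print(f"{count:>10,}  {name}")
    print(f"{total_lines(excerpt):>10,}  total")
```

Applied to the full dan-ukr value, the same arithmetic gives 5,377,143 + 33 × 1,000,000 + 220,604 = 38,597,747.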
21 changes: 21 additions & 0 deletions models/deu-ukr/README.md
@@ -16,3 +16,24 @@
|---------|-------|-------|-------|--------|----|
| Tatoeba-test.deu-ukr | 40.2 | 0.612 | 10000 | 54213 | 0.984 |


# opusTCv20210807+pbt_transformer-align_2022-03-07.zip

* dataset: opusTCv20210807+pbt
* model: transformer-align
* source language(s): deu
* target language(s): ukr
* raw source language(s): deu
* raw target language(s): ukr
* model: transformer-align
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* download: [opusTCv20210807+pbt_transformer-align_2022-03-07.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/deu-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip)
* test set translations: [opusTCv20210807+pbt_transformer-align_2022-03-07.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/deu-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.test.txt)
* test set scores: [opusTCv20210807+pbt_transformer-align_2022-03-07.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/deu-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.eval.txt)

## Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|-------|-------|-------|--------|----|
| Tatoeba-test-v2021-08-07.deu-ukr | 39.1 | 0.61779 | 10000 | 54528 | 1.000 |

@@ -0,0 +1,34 @@
release: deu-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip
release-date: 2022-03-07
dataset-name: opusTCv20210807+pbt
modeltype: transformer-align
vocabulary:
   source: opusTCv20210807+pbt.spm32k-spm32k.vocab.yml
   target: opusTCv20210807+pbt.spm32k-spm32k.vocab.yml
pre-processing: normalization + SentencePiece (spm32k,spm32k)
subwords:
   source: spm32k
   target: spm32k
subword-models:
   source: source.spm
   target: target.spm
source-languages:
   - deu
target-languages:
   - ukr
raw-source-languages:
   - deu
raw-target-languages:
   - ukr
training-data:
   deu-ukr: Tatoeba-train-v2021-08-07.deu-ukr.strict (11133937) Tatoeba-train.eng-deu-ukr.aa (1000000) Tatoeba-train.eng-deu-ukr.ab (1000000) Tatoeba-train.eng-deu-ukr.ac (1000000) Tatoeba-train.eng-deu-ukr.ad (1000000) Tatoeba-train.eng-deu-ukr.ae (1000000) Tatoeba-train.eng-deu-ukr.af (1000000) Tatoeba-train.eng-deu-ukr.ag (1000000) Tatoeba-train.eng-deu-ukr.ah (1000000) Tatoeba-train.eng-deu-ukr.ai (1000000) Tatoeba-train.eng-deu-ukr.aj (1000000) Tatoeba-train.eng-deu-ukr.ak (1000000) Tatoeba-train.eng-deu-ukr.al (1000000) Tatoeba-train.eng-deu-ukr.am (1000000) Tatoeba-train.eng-deu-ukr.an (1000000) Tatoeba-train.eng-deu-ukr.ao (1000000) Tatoeba-train.eng-deu-ukr.ap (1000000) Tatoeba-train.eng-deu-ukr.aq (1000000) Tatoeba-train.eng-deu-ukr.ar (1000000) Tatoeba-train.eng-deu-ukr.as (1000000) Tatoeba-train.eng-deu-ukr.at (1000000) Tatoeba-train.eng-deu-ukr.au (1000000) Tatoeba-train.eng-deu-ukr.av (1000000) Tatoeba-train.eng-deu-ukr.aw (1000000) Tatoeba-train.eng-deu-ukr.ax (1000000) Tatoeba-train.eng-deu-ukr.ay (1000000) Tatoeba-train.eng-deu-ukr.az (1000000) Tatoeba-train.eng-deu-ukr.ba (1000000) Tatoeba-train.eng-deu-ukr.bb (1000000) Tatoeba-train.eng-deu-ukr.bc (1000000) Tatoeba-train.eng-deu-ukr.bd (1000000) Tatoeba-train.eng-deu-ukr.be (1000000) Tatoeba-train.eng-deu-ukr.bf (1000000) Tatoeba-train.eng-deu-ukr.bg (1000000) Tatoeba-train.eng-deu-ukr.bh (220604)
validation-data:
   deu-ukr: Tatoeba-dev-v2021-08-07, 11580
   total-size-shuffled: 11580
   devset-selected: top 5000 lines of Tatoeba-dev-v2021-08-07.src.shuffled
test-data:
   Tatoeba-test-v2021-08-07.deu-ukr: 10000/54528
BLEU-scores:
   Tatoeba-test-v2021-08-07.deu-ukr: 39.1
chr-F-scores:
   Tatoeba-test-v2021-08-07.deu-ukr: 0.61779
20 changes: 20 additions & 0 deletions models/fin-ukr/README.md
@@ -0,0 +1,20 @@
# opusTCv20210807+pbt_transformer-align_2022-03-07.zip

* dataset: opusTCv20210807+pbt
* model: transformer-align
* source language(s): fin
* target language(s): ukr
* raw source language(s): fin
* raw target language(s): ukr
* model: transformer-align
* pre-processing: normalization + SentencePiece (spm32k,spm32k)
* download: [opusTCv20210807+pbt_transformer-align_2022-03-07.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/fin-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip)
* test set translations: [opusTCv20210807+pbt_transformer-align_2022-03-07.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/fin-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.test.txt)
* test set scores: [opusTCv20210807+pbt_transformer-align_2022-03-07.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/fin-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.eval.txt)

## Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|-------|-------|-------|--------|----|
| Tatoeba-test-v2021-08-07.fin-ukr | 40.1 | 0.58239 | 33 | 218 | 0.953 |
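The `BP` column is BLEU's brevity penalty. Under the standard definition it is 1 when the total system output length c exceeds the total reference length r, and exp(1 - r/c) otherwise, so a value below 1.000, such as the 0.953 reported here, indicates output shorter than the references. A minimal sketch of that formula and its inverse; it is not the evaluation script that produced these scores, and the function names are hypothetical.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty for total hypothesis length c and reference length r.

    Returns 1.0 when c > r, otherwise exp(1 - r / c); an empty hypothesis gets 0.0.
    """
    if hyp_len <= 0:
        return 0.0
    if hyp_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def length_ratio_from_bp(bp: float) -> float:
    """Invert a BP below 1 to recover the hypothesis/reference length ratio c / r."""
    # bp = exp(1 - r/c)  =>  r/c = 1 - ln(bp)  =>  c/r = 1 / (1 - ln(bp))
    return 1.0 / (1.0 - math.log(bp))
```

For the fin-ukr row, `length_ratio_from_bp(0.953)` comes out at about 0.954, i.e. the translations were roughly 95% as long as the references.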

@@ -0,0 +1,34 @@
release: fin-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip
release-date: 2022-03-07
dataset-name: opusTCv20210807+pbt
modeltype: transformer-align
vocabulary:
   source: opusTCv20210807+pbt.spm32k-spm32k.vocab.yml
   target: opusTCv20210807+pbt.spm32k-spm32k.vocab.yml
pre-processing: normalization + SentencePiece (spm32k,spm32k)
subwords:
   source: spm32k
   target: spm32k
subword-models:
   source: source.spm
   target: target.spm
source-languages:
   - fin
target-languages:
   - ukr
raw-source-languages:
   - fin
raw-target-languages:
   - ukr
training-data:
   fin-ukr: Tatoeba-train-v2021-08-07.fin-ukr.strict (5400101) Tatoeba-train.eng-fin-ukr.aa (1000000) Tatoeba-train.eng-fin-ukr.ab (1000000) Tatoeba-train.eng-fin-ukr.ac (1000000) Tatoeba-train.eng-fin-ukr.ad (1000000) Tatoeba-train.eng-fin-ukr.ae (1000000) Tatoeba-train.eng-fin-ukr.af (1000000) Tatoeba-train.eng-fin-ukr.ag (1000000) Tatoeba-train.eng-fin-ukr.ah (1000000) Tatoeba-train.eng-fin-ukr.ai (1000000) Tatoeba-train.eng-fin-ukr.aj (1000000) Tatoeba-train.eng-fin-ukr.ak (1000000) Tatoeba-train.eng-fin-ukr.al (1000000) Tatoeba-train.eng-fin-ukr.am (1000000) Tatoeba-train.eng-fin-ukr.an (1000000) Tatoeba-train.eng-fin-ukr.ao (1000000) Tatoeba-train.eng-fin-ukr.ap (1000000) Tatoeba-train.eng-fin-ukr.aq (1000000) Tatoeba-train.eng-fin-ukr.ar (1000000) Tatoeba-train.eng-fin-ukr.as (1000000) Tatoeba-train.eng-fin-ukr.at (1000000) Tatoeba-train.eng-fin-ukr.au (1000000) Tatoeba-train.eng-fin-ukr.av (1000000) Tatoeba-train.eng-fin-ukr.aw (1000000) Tatoeba-train.eng-fin-ukr.ax (1000000) Tatoeba-train.eng-fin-ukr.ay (1000000) Tatoeba-train.eng-fin-ukr.az (1000000) Tatoeba-train.eng-fin-ukr.ba (1000000) Tatoeba-train.eng-fin-ukr.bb (1000000) Tatoeba-train.eng-fin-ukr.bc (1000000) Tatoeba-train.eng-fin-ukr.bd (1000000) Tatoeba-train.eng-fin-ukr.be (1000000) Tatoeba-train.eng-fin-ukr.bf (1000000) Tatoeba-train.eng-fin-ukr.bg (1000000) Tatoeba-train.eng-fin-ukr.bh (220604)
validation-data:
   fin-ukr: Tatoeba-dev-v2021-08-07, 1000
   total-size-shuffled: 1000
   devset-selected: top 1000 lines of Tatoeba-dev-v2021-08-07.src.shuffled
test-data:
   Tatoeba-test-v2021-08-07.fin-ukr: 33/218
BLEU-scores:
   Tatoeba-test-v2021-08-07.fin-ukr: 40.1
chr-F-scores:
   Tatoeba-test-v2021-08-07.fin-ukr: 0.58239
2 changes: 1 addition & 1 deletion models/released-model-results-all.json

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions models/released-model-results-all.txt
@@ -2139,6 +2139,7 @@ dan-swg 0.003 1.5 https://object.pouta.csc.fi/Tatoeba-MT-models/gem-gem/opus-202
dan-swg 0.003 1.5 https://object.pouta.csc.fi/Tatoeba-MT-models/gem-gem/opus-2020-10-04.zip 1 13
dan-swg 0.003 1.5 https://object.pouta.csc.fi/Tatoeba-MT-models/gem-gem/opus-2021-02-24.zip 1 15
dan-tur 0.662 40.5 https://object.pouta.csc.fi/Tatoeba-MT-models/dan-tur/opus-2021-02-18.zip 757 3429
dan-ukr 0.71901 52.6 https://object.pouta.csc.fi/Tatoeba-MT-models/dan-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip 10 47
dan-ukr 0.489 15.9 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-27.zip 4 18
dan-ukr 0.438 15.6 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-21.zip 4 18
dan-urd 0.544 21.7 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-21.zip 3 12
@@ -2699,6 +2700,7 @@ deu-tur 0.448 18.5 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lowest/
deu-uig 0.164 0.3 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lower/opus-2020-06-19.zip 153 574
deu-ukr 0.612 40.2 https://object.pouta.csc.fi/Tatoeba-MT-models/deu-ukr/opus-2021-02-18.zip 10000 54213
deu-ukr 0.606 39.5 https://object.pouta.csc.fi/Tatoeba-MT-models/gmw-zle/opus1m-2021-02-19.zip 10000 54052
deu-ukr 0.61779 39.1 https://object.pouta.csc.fi/Tatoeba-MT-models/deu-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip 10000 54528
deu-ukr 0.489 27.2 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-27.zip 10000 42440
deu-ukr 0.472 25.9 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-21.zip 10000 42440
deu-ukr 0.445 23.8 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lowest/opus-2020-06-15.zip 10000 42440
@@ -8345,6 +8347,7 @@ fin-tur 0.395 13.8 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-zero/op
fin-tur 0.382 13.8 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lower/opus-2020-06-19.zip 1796 8001
fin-ukr 0.617 46.8 https://object.pouta.csc.fi/Tatoeba-MT-models/fin-zle/opus4m+btTCv20210807-2022-01-19.zip 33 215
fin-ukr 0.616 41.0 https://object.pouta.csc.fi/Tatoeba-MT-models/fiu-sla/opus-2021-02-19.zip 32 212
fin-ukr 0.58239 40.1 https://object.pouta.csc.fi/Tatoeba-MT-models/fin-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip 33 218
fin-ukr 0.578 38.1 https://object.pouta.csc.fi/Tatoeba-MT-models/fiu-zle/opus-2021-02-18.zip 32 209
fin-ukr 0.455 32.2 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lower/opus-2020-06-19.zip 32 160
fin-ukr 0.496 31.3 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-zero/opus-2020-06-21.zip 32 160
@@ -16905,6 +16908,7 @@ swe-slv 0.116 19.0 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-20
swe-sme 0.168 3.5 https://object.pouta.csc.fi/Tatoeba-MT-models/gmq-fiu/opus-2021-02-18.zip 4 22
swe-spa 0.569 37.1 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-27.zip 1351 6783
swe-spa 0.556 35.9 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-21.zip 1351 6783
swe-ukr 0.70141 39.6 https://object.pouta.csc.fi/Tatoeba-MT-models/swe-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip 4 19
swe-yid 0.156 14.7 https://object.pouta.csc.fi/Tatoeba-MT-models/gem-gem/opus-2020-07-27.zip 11 50
swe-yid 0.165 14.5 https://object.pouta.csc.fi/Tatoeba-MT-models/gem-gem/opus-2020-07-06.zip 11 50
swe-yid 0.157 14.5 https://object.pouta.csc.fi/Tatoeba-MT-models/gem-gem/opus-2020-10-04.zip 11 50
@@ -17746,6 +17750,7 @@ ukr-ces 0.491 28.6 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lower/o
ukr-cmn_Hans 0.205 23.8 https://object.pouta.csc.fi/Tatoeba-MT-models/zle-zho/opus1m-2021-05-16.zip 853 7925
ukr-cmn_Hant 0.214 25.2 https://object.pouta.csc.fi/Tatoeba-MT-models/zle-zho/opus1m-2021-05-16.zip 530 4119
ukr-crh 0.220 6.2 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lower/opus-2020-06-19.zip 9 39
ukr-dan 0.92589 86.7 https://object.pouta.csc.fi/Tatoeba-MT-models/ukr-dan/opusTCv20210807+pft_transformer-align_2022-03-07.zip 10 54
ukr-dan 0.469 25.8 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-27.zip 4 23
ukr-dan 0.411 18.7 https://object.pouta.csc.fi/Tatoeba-MT-models/ine-ine/opus-2020-07-21.zip 4 23
ukr-deu 0.69211 52.2 https://object.pouta.csc.fi/Tatoeba-MT-models/ukr-deu/opusTCv20210807+pft_transformer-align_2022-03-06.zip 10000 62652
@@ -17970,6 +17975,7 @@ ukr-srp_Cyrl 0.683 51.4 https://object.pouta.csc.fi/Tatoeba-MT-models/sla-sla/op
ukr-srp_Cyrl 0.641 47.3 https://object.pouta.csc.fi/Tatoeba-MT-models/ukr-hbs/opus-2021-02-23.zip 204 1110
ukr-srp_Latn 0.671 43.5 https://object.pouta.csc.fi/Tatoeba-MT-models/sla-sla/opus-2021-02-18.zip 348 1716
ukr-srp_Latn 0.651 42.4 https://object.pouta.csc.fi/Tatoeba-MT-models/ukr-hbs/opus-2021-02-23.zip 348 1716
ukr-swe 0.65967 38.9 https://object.pouta.csc.fi/Tatoeba-MT-models/ukr-swe/opusTCv20210807+pft_transformer-align_2022-03-07.zip 4 20
ukr-toki 0.057 0.8 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-zero/opus-2020-06-21.zip 5 65
ukr-tur 0.655 39.3 https://object.pouta.csc.fi/Tatoeba-MT-models/ukr-tur/opus-2021-02-19.zip 2500 11844
ukr-tur 0.516 25.8 https://object.pouta.csc.fi/Tatoeba-MT-models/tatoeba-lower/opus-2020-06-19.zip 2500 9034
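Each row of `released-model-results-all.txt` appears to hold, in order: language pair, chr-F, BLEU, model download URL, number of test sentences and number of test words. This column order is inferred by matching the new dan-ukr row against its README benchmark (0.71901, 52.6, 10, 47). In the hunks shown, rows for a pair also appear ordered by BLEU, highest first. A minimal sketch that picks the highest-BLEU release per pair, assuming whitespace-separated columns; `Result`, `parse_line` and `best_by_bleu` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    """One row of released-model-results-all.txt (column order inferred)."""
    pair: str
    chrf: float
    bleu: float
    url: str
    n_sent: int
    n_words: int


def parse_line(line: str) -> Result:
    """Parse one whitespace-separated results row."""
    pair, chrf, bleu, url, n_sent, n_words = line.split()
    return Result(pair, float(chrf), float(bleu), url, int(n_sent), int(n_words))


def best_by_bleu(lines: Iterable[str]) -> dict[str, Result]:
    """Keep the highest-BLEU row per language pair; blank lines are skipped."""
    best: dict[str, Result] = {}
    for line in lines:
        if not line.strip():
            continue
        row = parse_line(line)
        # Strict ">" keeps the first row on ties.
        if row.pair not in best or row.bleu > best[row.pair].bleu:
            best[row.pair] = row
    return best


if __name__ == "__main__":
    rows = [
        "deu-ukr 0.612 40.2 https://object.pouta.csc.fi/Tatoeba-MT-models/deu-ukr/opus-2021-02-18.zip 10000 54213",
        "deu-ukr 0.61779 39.1 https://object.pouta.csc.fi/Tatoeba-MT-models/deu-ukr/opusTCv20210807+pbt_transformer-align_2022-03-07.zip 10000 54528",
    ]
    for pair, row in best_by_bleu(rows).items():
        print(pair, row.bleu, row.url)
```

Rows for the same pair are not always scored on the same test set: the deu-ukr README lists the older model under `Tatoeba-test` (54213 words) and the new one under `Tatoeba-test-v2021-08-07` (54528 words). Picking the top BLEU across such rows is therefore only a rough comparison; here the new deu-ukr model is lower on BLEU (39.1 vs 40.2) but higher on chr-F (0.61779 vs 0.612).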