This project contains translation data for Russian-Tuvan and Tuvan-Russian translation.
The data was collected via the www.tyvan.ru platform by linguists, scientists, journalists, volunteers, and others.
The 50K file is split into training, validation, and test data.
The validation and test sentences from that file appear at the end.
The dataset contains 306,615 translations.
It consists of Tyvan-Russian pairs.
Each data row has the following fields:
- tyv: str: text in Tuvan
- ru: str: text in Russian (translation)
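
As an illustration of the row layout, the two fields (`tyv` and `ru`) could be modeled as in the minimal sketch below. The names `TranslationPair` and `parse_row` are hypothetical helpers introduced here, not part of the dataset or of any library, and the sketch assumes each row arrives as a plain Python dict; the example strings are placeholders, not real dataset entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationPair:
    """One dataset row (hypothetical helper type)."""
    tyv: str  # text in Tuvan
    ru: str   # text in Russian (translation)


def parse_row(row: dict) -> TranslationPair:
    """Validate a raw row and convert it to a TranslationPair.

    Assumes the row is a dict with string values under 'tyv' and 'ru'.
    """
    missing = [key for key in ("tyv", "ru") if key not in row]
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    tyv, ru = row["tyv"], row["ru"]
    if not isinstance(tyv, str) or not isinstance(ru, str):
        raise TypeError("both 'tyv' and 'ru' must be strings")
    return TranslationPair(tyv=tyv, ru=ru)


# Placeholder values only; substitute a real row from the dataset.
pair = parse_row({"tyv": "<Tuvan sentence>", "ru": "<Russian translation>"})
print(pair.tyv, "->", pair.ru)
```

Checking the fields up front makes malformed rows fail early rather than inside a training loop.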
- Curated by: Ali Kuzhuget (tech and data), Ondar Choygan (data), and contributors
- Language(s) (NLP): Tyvan (Tuvan), Russian
- License: CC BY 4.0

Brief information about the languages is given below:

| Language | Language code on the website | ISO 639-3 | Glottolog |
|---|---|---|---|
| Tyvan | tyv | tyv | tuvi1240 |
| Russian | rus | rus | russ1263 |

The dataset has been downloaded from www.tyvan.ru.
The dataset is intended to help humans and machines learn the low-resourced Tyvan (Tuvan) and Russian languages.
The dataset was curated as a source of training data for machine translation and other NLP tools. It consists of donated and professional translations from books and websites. They were downloaded from the www.tyvan.ru website and refined by Ali Kuzhuget. No additional filtering or postprocessing has been applied.