- Pretrained ASR Models
- Finetuned ASR Models
- Language Models
- Punctuation Models
- TTS Models
- Gender Classification Model
- Language Identification Models
- Interspeech 2021 ASR Models
Pretrained Model | Description | Architecture | Hours |
---|---|---|---|
Vakyansh-Conformer-SSL | Pretrained with the NeMo toolkit on 34,000 hours of unlabeled audio in 39 Indian languages. This includes 15,000 hours of news recordings available on the internet and 10,000 hours of YouTube and other audio data. In addition, 9,000 hours of Indian English audio were taken from NPTEL lectures open sourced by AI4Bharat. This model was trained in collaboration with NVIDIA (NVIDIA Graphics Pvt Ltd). We thank NVIDIA for providing the compute resources to train this model. | Conformer-Large | 34,000 |
CLSRIL-23 | Cross Lingual Speech Representations for Indic Languages. Trained on 10,000 hours of data from 23 Indic languages. Citation: https://arxiv.org/abs/2107.07402 | wav2vec2-Base | 10,000 |
hindi_pretrained_4kh | Trained on 4,200 hours of Hindi data | wav2vec2-Base | 4,200 |
kannada_pretrained_1400h | Trained on 1400 hours of Kannada data | wav2vec2-XLSR | 1,400 |
Language | Pretrained Model | Finetuned Model | Finetuned Hours | Arch |
---|---|---|---|---|
Hindi | Vakyansh Conformer SSL | hindi_large_ssl_2500 | 2,500 h | Large |
Indian English | Vakyansh Conformer SSL | indian_en_large_ssl_700 | 700 h | Large |
Kannada | Vakyansh Conformer SSL | kannada_large_ssl_1000 | 1,000 h | Large |
Punjabi | Vakyansh Conformer SSL | punjabi_large_ssl_500 | 500 h | Large |
Tamil | Vakyansh Conformer SSL | tamil_large_ssl_900 | 900 h | Large |
Citation: https://arxiv.org/abs/2203.16512
Language | Pretrained Model | Finetuned Model | Dictionary | Single Model for Inference | Finetuned Hours | TS model |
---|---|---|---|---|---|---|
Hindi | CLSRIL-23 | him_4200 | dict | hindi_infer | 4,200 h | hindi_ts |
Indian English | CLSRIL-23 | enm_700 | dict | english_infer | 700 h | english_ts |
Kannada | CLSRIL-23 | knm_560 | dict | kannada_infer | 560 h | kannada_ts |
Tamil | CLSRIL-23 | tam_250 | dict | tamil_infer | 250 h | tamil_ts |
Bengali | CLSRIL-23 | bnm_200 | dict | bengali_infer | 200 h | bengali_ts |
Nepali | CLSRIL-23 | nem_130 | dict | nepali_infer | 130 h | nepali_ts |
Telugu | CLSRIL-23 | tem_100 | dict | telugu_infer | 100 h | telugu_ts |
Gujarati | CLSRIL-23 | gum_100 | dict | gujarati_infer | 100 h | gujarati_ts |
Marathi | CLSRIL-23 | mrm_100 | dict | marathi_infer | 100 h | marathi_ts |
Odia | CLSRIL-23 | orm_100 | dict | odia_infer | 100 h | odia_ts |
Sanskrit | CLSRIL-23 | sam_60 | dict | sanskrit_infer | 60 h | sanskrit_ts |
Maithili | CLSRIL-23 | maim_50 | dict | maithili_infer | 50 h | maithili_ts |
Urdu | CLSRIL-23 | urm_60h | dict | urdu_infer | 60 h | urdu_ts |
Punjabi | CLSRIL-23 | pam_10h | dict | punjabi_infer | 10 h | punjabi_ts |
Dogri | CLSRIL-23 | doi_55h | dict | dogri_infer | 55 h | dogri_ts |
Malayalam | CLSRIL-23 | mlm_8h | dict | malayalam_infer | 8 h | malayalam_ts |
Bhojpuri | CLSRIL-23 | bhom_60h | dict | bhojpuri_infer | 60 h | bhojpuri_ts |
Assamese | CLSRIL-23 | asm_8h | dict | assamese_infer | 8 h | assamese_ts |
The language models listed here are meant to be used together with the finetuned ASR models.
Dataset Credits: We thank AI4Bharat for open sourcing the Indic-Corp dataset. Link. We modified the original data by tokenizing and removing duplicates.
Language | Type | Domain | Lexicon | LM | Text Corpus |
---|---|---|---|---|---|
English | kenlm 5-gram | Biomedical | bio_lexicon | bio_lm | bio_lm_eng_text |
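
As an illustration of how an n-gram language model can be combined with an ASR model, the sketch below rescores candidate transcripts with a weighted sum of the acoustic score, the LM score, and a per-word bonus. This is a common fusion scheme, not a description of the decoder actually used with these models. The names `fused_score`, `rescore`, `lm_weight`, and `word_score`, the toy scorer, and all numbers are assumptions made up for the example; in practice a KenLM model such as `bio_lm` would take the place of the toy scorer.

```python
from typing import Callable, List, Tuple


def fused_score(acoustic_logp: float, lm_logp: float, n_words: int,
                lm_weight: float = 2.0, word_score: float = -1.0) -> float:
    """Combine acoustic and LM log-scores (hypothetical weights)."""
    return acoustic_logp + lm_weight * lm_logp + word_score * n_words


def rescore(candidates: List[Tuple[str, float]],
            lm_logp_fn: Callable[[str], float],
            lm_weight: float = 2.0, word_score: float = -1.0) -> str:
    """Return the candidate transcript with the highest fused score.

    `candidates` holds (transcript, acoustic log-score) pairs.
    """
    best = max(
        candidates,
        key=lambda c: fused_score(c[1], lm_logp_fn(c[0]), len(c[0].split()),
                                  lm_weight, word_score),
    )
    return best[0]


# Toy stand-in for an n-gram LM: a fixed table of sentence log-scores.
TOY_LM = {
    "the patient has a fever": -4.0,
    "the patient has a fevor": -12.0,
}


def toy_lm_logp(sentence: str) -> float:
    # Unknown sentences get a low default score.
    return TOY_LM.get(sentence, -20.0)


if __name__ == "__main__":
    # The acoustic model alone slightly prefers the misspelled candidate.
    candidates = [
        ("the patient has a fevor", -10.0),
        ("the patient has a fever", -10.5),
    ]
    print(rescore(candidates, toy_lm_logp))
```

With `lm_weight=0.0` the acoustic score alone decides and the misspelled candidate wins; with the default weight the language model tips the choice toward the correctly spelled transcript.
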
Language | Model | Data |
---|---|---|
Hindi | hi.zip | hindi_data |
Assamese | as.zip | assamese_data |
Bengali | bn.zip | bengali_data |
Gujarati | gu.zip | gujarati_data |
Kannada | kn.zip | kannada_data |
Malayalam | ml.zip | malayalam_data |
Marathi | mr.zip | marathi_data |
Odia | or.zip | odia_data |
Punjabi | pa.zip | punjabi_data |
Tamil | ta.zip | tamil_data |
Telugu | te.zip | telugu_data |
Dataset Credits: We thank AI4Bharat for open sourcing the Indic-Corp dataset. Link. We modified the original data by tokenizing and removing duplicates.
The models below were trained using a combination of Glow-TTS and HiFi-GAN.
Language | Language Code | Gender | glow ckpt | hifi-gan ckpt |
---|---|---|---|---|
Hindi | hi | Female | hi_0_glow | hi_0_hifi |
Hindi | hi | Male | hi_1_glow | hi_1_hifi |
Kannada | kn | Female | kn_0_glow | kn_0_1_hifi |
Kannada | kn | Male | kn_1_glow | kn_0_1_hifi |
Tamil | ta | Female | ta_0_glow | ta_0_1_hifi |
Tamil | ta | Male | ta_1_glow | ta_0_1_hifi |
Telugu | te | Female | te_0_glow | te_0_1_hifi |
Telugu | te | Male | te_1_glow | te_0_1_hifi |
Odia | or | Female | or_0_glow | or_0_1_hifi |
Odia | or | Male | or_1_glow | or_0_1_hifi |
Malayalam | ml | Female | ml_0_glow | ml_0_hifi |
Malayalam | ml | Male | ml_1_glow | ml_1_hifi |
Marathi | mr | Female | mr_0_glow | mr_1_hifi |
Gujarati | gu | Male | gu_0_glow | gu_0_hifi |
Bengali | bn | Female | bn_0_glow | bn_0_1_hifi |
Bengali | bn | Male | bn_1_glow | bn_0_1_hifi |
English | en | Female | en_0_glow | en_0_hifi |
English | en | Male | en_1_glow | en_1_hifi |
Dataset Credits: We thank IITM for open sourcing the Indic-TTS dataset. Link
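
Each TTS voice needs a matching pair of checkpoints, and the pairing is not uniform: some languages share one HiFi-GAN checkpoint across both voices, and Marathi and Gujarati list only one voice each. A minimal lookup sketch follows, with entries copied row by row from the TTS table. The names `TTS_CHECKPOINTS` and `tts_checkpoints` are hypothetical helpers, not part of any Vakyansh package.

```python
from typing import Dict, Tuple

# (language code, gender) -> (Glow-TTS checkpoint, HiFi-GAN checkpoint),
# copied row by row from the TTS table in this README.
TTS_CHECKPOINTS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("hi", "female"): ("hi_0_glow", "hi_0_hifi"),
    ("hi", "male"): ("hi_1_glow", "hi_1_hifi"),
    ("kn", "female"): ("kn_0_glow", "kn_0_1_hifi"),
    ("kn", "male"): ("kn_1_glow", "kn_0_1_hifi"),
    ("ta", "female"): ("ta_0_glow", "ta_0_1_hifi"),
    ("ta", "male"): ("ta_1_glow", "ta_0_1_hifi"),
    ("te", "female"): ("te_0_glow", "te_0_1_hifi"),
    ("te", "male"): ("te_1_glow", "te_0_1_hifi"),
    ("or", "female"): ("or_0_glow", "or_0_1_hifi"),
    ("or", "male"): ("or_1_glow", "or_0_1_hifi"),
    ("ml", "female"): ("ml_0_glow", "ml_0_hifi"),
    ("ml", "male"): ("ml_1_glow", "ml_1_hifi"),
    ("mr", "female"): ("mr_0_glow", "mr_1_hifi"),
    ("gu", "male"): ("gu_0_glow", "gu_0_hifi"),
    ("bn", "female"): ("bn_0_glow", "bn_0_1_hifi"),
    ("bn", "male"): ("bn_1_glow", "bn_0_1_hifi"),
    ("en", "female"): ("en_0_glow", "en_0_hifi"),
    ("en", "male"): ("en_1_glow", "en_1_hifi"),
}


def tts_checkpoints(lang_code: str, gender: str) -> Tuple[str, str]:
    """Return the (glow, hifi-gan) checkpoint names for a voice."""
    key = (lang_code.lower(), gender.lower())
    if key not in TTS_CHECKPOINTS:
        raise KeyError(f"no TTS voice listed for {lang_code!r} / {gender!r}")
    return TTS_CHECKPOINTS[key]


if __name__ == "__main__":
    glow, hifi = tts_checkpoints("kn", "Male")
    print(glow, hifi)
```

An explicit table is used instead of building names from a pattern such as `<lang>_<gender index>_glow`, because the listed names do not follow a single rule.
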
Type | Model Type | Model |
---|---|---|
Gender Classification | SVC | Model |
Type | Model |
---|---|
Hindi_vs_Others | Model |
Tamil_vs_Others | Model |
Language | Pretrained Model | Finetuned Model | Dictionary | Single Model for Inference |
---|---|---|---|---|
Telugu | CLSRIL-23 | te_40h_interspeech | dict | telugu_infer_interspeech |
Tamil | CLSRIL-23 | ta_40h_interspeech | dict | tamil_infer_interspeech |
Gujarati | CLSRIL-23 | gu_40h_interspeech | dict | gujarati_infer_interspeech |
Hinglish | CLSRIL-23 | hinglish_interspeech | dict | hinglish_infer_interspeech |
If you use any of our resources, please cite the following article:
@misc{chadha2022vakyansh,
title={Vakyansh: ASR Toolkit for Low Resource Indic languages},
author={Harveen Singh Chadha and Anirudh Gupta and Priyanshi Shah and Neeraj Chhimwal and Ankur Dhuriya and Rishabh Gaur and Vivek Raghavan},
year={2022},
eprint={2203.16512},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
If you use the pretrained model (CLSRIL-23) please cite the following article:
@misc{gupta2021clsril23,
title={CLSRIL-23: Cross Lingual Speech Representations for Indic Languages},
author={Anirudh Gupta and Harveen Singh Chadha and Priyanshi Shah and Neeraj Chimmwal and Ankur Dhuriya and Rishabh Gaur and Vivek Raghavan},
year={2021},
eprint={2107.07402},
archivePrefix={arXiv},
primaryClass={cs.CL}
}