Wikidepia/indonesian-tts

Indonesian TTS using Coqui TTS

Models are available in the Releases tab.

DO NOT USE FOR COMMERCIAL PURPOSES!

Model changelog

v1.2 (Aug 12, 2022)

Finetuned from v1.1 model on:

  • 4 hours of Audiobook dataset
  • 2,000 samples of Azure TTS
  • High-quality TTS data for Javanese & Sundanese

v1.1 (Aug 6, 2022)

Finetuned from LJSpeech model on:

  • 4 hours of Audiobook dataset
  • 2,000 samples of Azure TTS

v1.0 (Jun 23, 2022)

Trained from scratch on:

  • 4 hours of Audiobook dataset

Examples

Ardi (Azure):

ardi-azure.mp4

Gadis (Azure):

gadis-azure.mp4

Wibowo (Audiobook):

wibowo-audiobook.mp4

How to use

You need g2p-id to convert graphemes to phonemes. The --text argument in the command below is already phonemized.

Use the tts command from Coqui TTS to synthesize speech:

tts --text "saja səˈdanʔ ˈbərada di dʒaˈkarta." \
    --model_path checkpoint.pth \
    --config_path config.json \
    --speaker_idx wibowo \
    --out_path output.wav

To list all available speaker IDs, use --list_speaker_idxs:

tts --model_path checkpoint.pth \
    --config_path config.json \
    --list_speaker_idxs
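
If you want to call the synthesis command from a script, a minimal Python sketch could look like the following. It only assembles the same arguments shown above. The helper names build_tts_command and synthesize, and the default file names, are illustrative assumptions and not part of Coqui TTS or this repository.

```python
import subprocess


def build_tts_command(phonemes, speaker, out_path,
                      model_path="checkpoint.pth",
                      config_path="config.json"):
    """Return the argument list for one `tts` invocation.

    `phonemes` is assumed to be already-phonemized text
    (for example, produced by g2p-id beforehand).
    """
    return [
        "tts",
        "--text", phonemes,
        "--model_path", model_path,
        "--config_path", config_path,
        "--speaker_idx", speaker,
        "--out_path", out_path,
    ]


def synthesize(phonemes, speaker, out_path, **paths):
    """Run `tts` once; raises CalledProcessError if the command fails.

    Requires the `tts` executable from Coqui TTS to be on PATH.
    """
    subprocess.run(build_tts_command(phonemes, speaker, out_path, **paths),
                   check=True)
```

Calling synthesize("saja səˈdanʔ ˈbərada di dʒaˈkarta.", "wibowo", "output.wav") would then run the same command as the first example. Passing the arguments as a list avoids shell quoting problems with the IPA characters.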

Data

Citations

@misc{https://doi.org/10.48550/arxiv.2106.06103,
  doi = {10.48550/ARXIV.2106.06103}, 
  url = {https://arxiv.org/abs/2106.06103},
  author = {Kim, Jaehyeon and Kong, Jungil and Son, Juhee},
  keywords = {Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering},
  title = {Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
@inproceedings{kjartansson-etal-tts-sltu2018,
    title = {{A Step-by-Step Process for Building TTS Voices Using Open Source Data and Framework for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese}},
    author = {Keshan Sodimana and Knot Pipatsrisawat and Linne Ha and Martin Jansche and Oddur Kjartansson and Pasindu De Silva and Supheakmungkol Sarin},
    booktitle = {Proc. The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU)},
    year  = {2018},
    address = {Gurugram, India},
    month = aug,
    pages = {66--70},
    URL   = {http://dx.doi.org/10.21437/SLTU.2018-14}
}