Thai-TTS-Evaluation

This repository hosts a Thai Text-to-Speech (TTS) evaluation script, focusing on assessing speaker tone and pronunciation performance based on End-to-End Thai Text-to-Speech with Linguistic Unit.

Speaker Encoder Model

For the speaker tone objective, we utilized the Speaker Encoder Cosine Similarity (SECS) metric to assess the resemblance between the synthesized speech and the original speaker's speech. This method involves calculating the cosine similarity between the speaker embeddings derived from two speech samples, using a speaker encoder. We utilize the Coqui speaker encoder, trained on the comprehensive VoxCeleb1, VoxCeleb2, and all language CommonVoice datasets, ensuring broad generalizability in our evaluations.

Speech-to-Text Model

For pronunciation, we utilized a Thai speech-to-text model. The underlying assumption is that high-quality synthesized speech should yield similar speech-to-text results as the original speech. This method allows us to gauge the accuracy of pronunciation in the synthesized speech.

Requirements

python==3.7.13
TTS==0.8.0
webrtcvad==2.0.10
torch==1.13.0
torch-audiomentations==0.11.0
torch-complex==0.4.3
torch-pitch-shift==1.2.2
torchaudio==0.13.0
torchmetrics==0.8.0
numpy==1.21.6
huggingface-hub==0.14.1
transformers==4.25.1

Setup

Environment Setup: Ensure that you have a compatible Python environment. Using Conda or virtualenv is recommended to manage dependencies.
```
conda create --name voice_env python=3.7.13
conda activate voice_env
```
Install Dependencies:
```
pip install <requirement lists>
```
GPU Support: If you're planning to use a GPU, ensure that your PyTorch installation is compatible with your CUDA version.

Running

python evaluate_tts.py

Acknowledgements

We thank Coqui for their Speaker Embedding Model, available at: https://github.com/coqui-ai/TTS.git.
We are grateful to the Biomedical and Data Lab at Mahidol University for their contribution to the proposed Thai speech-to-text model.

Citations

@inproceedings{wisetpaitoon2024end,
  title={End-to-End Thai Text-to-Speech with Linguistic Unit},
  author={Wisetpaitoon, Kontawat and Singkul, Sattaya and Sakdejayont, Theerat and Chalothorn, Tawunrat},
  booktitle={Proceedings of the 2024 International Conference on Multimedia Retrieval},
  pages={951--959},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
LICENSE-CC-BY-NC.md		LICENSE-CC-BY-NC.md
README.md		README.md
Thai_TTS.webp		Thai_TTS.webp
boundary_problem.png		boundary_problem.png
evaluate_tts.py		evaluate_tts.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Thai-TTS-Evaluation

Speaker Encoder Model

Speech-to-Text Model

Requirements

Setup

Running

Acknowledgements

Citations

About

Releases

Packages

Languages

License

JoesSattes/Thai-TTS-Evaluation

Folders and files

Latest commit

History

Repository files navigation

Thai-TTS-Evaluation

Speaker Encoder Model

Speech-to-Text Model

Requirements

Setup

Running

Acknowledgements

Citations

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages