DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation

Overview ✨

This repository provides a full implementation of our research on unified dialect TTS. It includes:

🧠 A modular multi-dialect TTS framework built on F5-TTS.
🔤 A unified IPA-based dialect frontend for consistent cross-dialect phonetic representation.
🏋️ Training & inference scripts (CLI + config examples) for end-to-end reproduction.
🤗 Hugging Face checkpoints for easy access to pre-trained models.

Short Intro on DiaMoE-TTS:

Dialect speech embodies rich cultural and linguistic diversity, yet building text-to-speech (TTS) systems for dialects remains challenging due to scarce data, inconsistent orthographies, and complex phonetic variation. To address these issues, we present DiaMoE-TTS, a unified IPA-based framework that standardizes phonetic representations and resolves grapheme-to-phoneme ambiguities. Built upon the F5-TTS architecture, the system introduces a dialect-aware Mixture-of-Experts (MoE) to model phonological differences and employs parameter-efficient adaptation with Low-Rank Adaptors (LoRA) and Conditioning Adapters for rapid transfer to new dialects. Unlike approaches dependent on large-scale or proprietary resources, DiaMoE-TTS enables scalable, open-data-driven synthesis. Experiments demonstrate natural and expressive speech generation, achieving zero-shot performance on unseen dialects and specialized domains such as Peking Opera with only a few hours of data.
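
To make the two mechanisms named above more concrete, here is a minimal, dependency-free sketch of a softmax-gated mixture of experts and a LoRA-adapted linear layer. It is an illustration under stated assumptions, not the DiaMoE-TTS implementation: the function names, the dense (non-sparse) gating, and the toy dimensions are all hypothetical, and the real model may condition the gate on dialect information in a different way.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def moe_forward(
    x: Sequence[float],
    gate_weights: Matrix,
    experts: Sequence[Callable[[Sequence[float]], Vector]],
) -> Vector:
    """Dense mixture (an assumption): softmax gate, weighted sum of expert outputs."""
    gate = softmax(matvec(gate_weights, x))  # one weight per expert
    outputs = [expert(x) for expert in experts]
    width = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(width)]


def lora_forward(
    x: Sequence[float],
    base: Matrix,
    lora_a: Matrix,
    lora_b: Matrix,
    alpha: float,
) -> Vector:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    rank = len(lora_a)           # r = number of rows in A
    low = matvec(lora_a, x)      # project down to rank r
    delta = matvec(lora_b, low)  # project back up to the output size
    scale = alpha / rank
    return [b + scale * d for b, d in zip(matvec(base, x), delta)]
```

The LoRA path adds only `r * (d_in + d_out)` trainable values per adapted matrix, which is why adapting to a new dialect can be cheap compared with fine-tuning the full base model.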

The International Phonetic Alphabet (IPA) is the most widely used phonetic annotation system in the study of Chinese dialects. The vast majority of Chinese dialect corpora, including homophone tables, dictionaries, and texts, use IPA for phonetic transcription. This project's phonetic annotation system is therefore based on IPA: it builds a highly scalable phoneme inventory (currently 442 units) from a base of 100+ IPA phoneme symbols. The system is designed to support phonetic annotation of all known Chinese dialects and is extensible to European languages. It currently supports 11 dialects and Mandarin, and its validity has also been verified for English, French, German, and the Bildts dialect of Dutch.
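
As a rough illustration of how a fixed phoneme inventory can be applied to an IPA string, the sketch below splits a transcription into inventory units by greedy longest match. This is an assumption for illustration only: the README does not describe how the 442 units are composed or matched, the `tokenize_ipa` function is hypothetical, and the toy inventory is not the project's real one.

```python
from typing import Iterable, List


def tokenize_ipa(text: str, inventory: Iterable[str], unknown: str = "<unk>") -> List[str]:
    """Split an IPA string into inventory units, preferring the longest match."""
    units = set(inventory)
    max_len = max((len(u) for u in units), default=1)
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates syllables here
            continue
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in units:
                tokens.append(candidate)
                pos += size
                break
        else:
            # No unit matched at this position: emit a placeholder and move on.
            tokens.append(unknown)
            pos += 1
    return tokens


# Toy inventory for demonstration; NOT the project's 442-unit inventory.
TOY_INVENTORY = ["t", "s", "ts", "tsʰ", "a", "i", "ŋ", "˥", "˥˩"]

if __name__ == "__main__":
    print(tokenize_ipa("tsʰa˥˩ tiŋ˥", TOY_INVENTORY))
```

Longest-match-first matters because multi-character units such as an aspirated affricate or a tone contour share prefixes with shorter units; matching the short unit first would split them incorrectly.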

Regarding the construction details of the IPA dialect frontend system, please refer to:

News & Updates 🗞️

🚀[2025-09-21] Initial public release of codebase.
🔥[2025-09-25] Release checkpoints on 🤗 Hugging Face
📦[2025-09-25] Release training datasets
📄[2025-10-05] Update our paper on arXiv
🧠[2025-10-08] Release Gradio app for quick start!

Installation 🛠️

# clone code
git clone https://github.com/GiantAILab/DiaMoE-TTS.git
cd DiaMoE-TTS

# conda environment
conda create -n diamoetts python=3.10
conda activate diamoetts
cd diamoe_tts
pip install -e .

Quick Start 🚀

Training

cd diamoe_tts
accelerate launch --config_file default_config.yaml \
  src/f5_tts/train/train.py \
  --config-name diamoetts.yaml

Inference

bash ./src/f5_tts/infer/batch_infer.sh

See diamoe_tts for more details.

IPA Frontend

cd dialect_frontend
bash single_frontend.sh 1-6 <dialect_name> <input_file.txt>

See ipa_frontend for more details.
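
If you need to process several input files, one option is to assemble the same command from Python. The helper below is a hypothetical convenience wrapper, not part of the repository: it only rebuilds the argument order shown above, passes the `1-6` argument through verbatim (its meaning is not documented here), and uses `nanjing` purely as a placeholder dialect name.

```python
import shlex
from typing import Iterable, List


def build_frontend_command(dialect: str, input_file: str, stages: str = "1-6") -> List[str]:
    """Return the argv for one frontend run, mirroring the command shown above."""
    return ["bash", "single_frontend.sh", stages, dialect, input_file]


def build_batch(dialect: str, input_files: Iterable[str]) -> List[List[str]]:
    """Build one command per input file for the same dialect."""
    return [build_frontend_command(dialect, path) for path in input_files]


if __name__ == "__main__":
    # Placeholder dialect name and file names, for illustration only.
    for cmd in build_batch("nanjing", ["part1.txt", "part2.txt"]):
        print(shlex.join(cmd))  # shell-safe rendering of the command
```

Each returned list can be passed to `subprocess.run(cmd, cwd="dialect_frontend", check=True)` to execute it from the frontend directory.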

Datasets 📚

For training, we use the Common Voice Cantonese dataset, the Emilia Mandarin dataset, dialectal data
from the KeSpeech corpus, and an open-source Southern Min dataset.
We release the IPA frontend outputs for these open-source datasets: 🤗 open-source dataset IPA, 🔮 open-source dataset IPA.

Pretrained Models 🧪

Model	🤗 Hugging Face	👷 Status
🚀 MLPexpert_base_model		✅
🚀 yunbai(Peking Opera)_lora		✅
🚀 jingbai(Peking Opera)_lora		✅
🚀 nanjing_lora		✅
🛠️ our g2pw		✅

Development Roadmap & TODO 🗺️

release code for train/infer
release code for IPA frontend
release our checkpoints
release open-source training dataset IPA frontend
develop gradio app for DiaMoE-TTS

Acknowledgements 🙏

Thanks to all contributors and community members who helped improve this project.
This work builds upon F5-TTS and related research.
Our frontend uses PaddleSpeech to obtain Mandarin pinyin first.

License 📝

Our code is released under MIT License.

📚 Citation

If you find our model helpful, please consider citing our project 📝 and starring us ⭐️!

@article{chen2025diamoe,
  title={DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation},
  author={Chen, Ziqi and Chen, Gongyu and Wang, Yihua and Ding, Chaofan and Zhang, Wei-Qiang and others},
  journal={arXiv preprint arXiv:2509.22727},
  year={2025}
}
