Multi-purpose dataset maker for various TTS models.
- Tortoise TTS/XTTS
- StyleTTS 2 ~ Webui
- Higgs Audio ~ Base - My fork
- VibeVoice ~ Base - My fork
- IndexTTS 2 ~ My Trainer
Tortoise, StyleTTS2, XTTS - These models take in a simple text file where audio:text pairs are stored like:
path/to/audio/file | transcription
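If you are assembling such a file yourself, the pairing can be sketched as below. This is a minimal illustration, not part of this repo: the helper name and the assumption that each seg.wav sits next to a matching seg.txt transcript are mine.

```python
from pathlib import Path

# Illustrative sketch: build a Tortoise/StyleTTS2/XTTS-style train.txt
# from a folder where each seg.wav has a matching seg.txt transcript.
# The helper name and pairing convention are assumptions, not repo API.
def build_train_txt(dataset_dir: str) -> None:
    root = Path(dataset_dir)
    lines = []
    for wav in sorted(root.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            transcription = txt.read_text(encoding="utf-8").strip()
            lines.append(f"{wav.name} | {transcription}")
    # One "audio | transcription" pair per line
    (root / "train.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
```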
Folder Structure
Dataset_name
- train.txt
-- seg1.wav
-- seg2.wav

Higgs Audio has a main metadata.json that includes all of the information and instructions for how to train on the audio files, broken down into .txt and .wav files.
Folder Structure
Dataset_name
- metadata.json
- some_audio_1.txt
- some_audio_1.wav
- some_audio_2.txt
- some_audio_2.wav

VibeVoice has a main .jsonl file that contains individual JSON entries with text and audio keys. It always prepends "Speaker 0: " to each transcription, in accordance with what the trainer expects.
{"text": "Speaker 0: some transcription", "audio": "path/to/audio"}
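Writing these entries can be sketched as below. The function name and the (audio, text) input shape are illustrative assumptions; only the entry format and the "Speaker 0: " prefix come from the repo.

```python
import json

# Illustrative sketch: write a VibeVoice-style .jsonl file, prepending
# "Speaker 0: " to each transcription as the trainer expects.
# The function name and input shape are assumptions, not repo API.
def write_vibevoice_jsonl(pairs, out_path):
    """pairs: iterable of (audio_path, transcription) tuples."""
    with open(out_path, "w", encoding="utf-8") as f:
        for audio, text in pairs:
            entry = {"text": f"Speaker 0: {text}", "audio": str(audio)}
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```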
Folder Structure
Dataset_name
- <project_name>_train.jsonl
- vibevoice_000000.wav
- vibevoice_000001.wav

- Make sure you have Astral's uv installed on your PC
- Run the following:
git clone https://github.com/JarodMica/dataset-maker.git
cd dataset-maker
uv sync
- uv should handle the installation of all packages and their versions. Once it finishes running, launch the Gradio interface with:
uv run .\gradio_interface.py
The CUDAExecutionProvider may not be found even when using uv. The fix is to remove and then re-add optimum[onnxruntime-gpu] in the terminal:
uv remove optimum
uv add optimum[onnxruntime-gpu]
You can verify by checking the available providers. Before the fix:
uv run python
>>> import onnxruntime as ort
>>> print("Available providers:", ort.get_available_providers())
Available providers: ['AzureExecutionProvider', 'CPUExecutionProvider']
After the fix:
uv run python
>>> import onnxruntime as ort
>>> print("Available providers:", ort.get_available_providers())
Available providers: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']