Uzbek TTS (Text-to-Speech)

A high-quality text-to-speech system for the Uzbek language, based on a Conditional Flow Matching (CFM) architecture.

🎯 Overview

This project implements a neural text-to-speech system specifically designed for the Uzbek language. It uses a Conditional Flow Matching approach with a DiT (Diffusion Transformer) backbone to generate natural-sounding speech from Uzbek text.
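To make the flow matching idea concrete, here is a toy sketch of the sampling step: start from noise and follow a velocity field with Euler steps until t = 1. This is not the project's implementation. In a trained system the velocity would be predicted by the network (here, the DiT backbone) from the text and reference audio; the closed-form `conditional_velocity` below is a stand-in, and all function names are hypothetical.

```python
import random

def conditional_velocity(x, t, target):
    """Velocity of the straight-line path x_t = (1 - t) * x0 + t * target.

    Stand-in for a learned network; assumes t < 1.
    """
    return [(y - xi) / (1.0 - t) for xi, y in zip(x, target)]

def euler_sample(x0, target, steps=8):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # largest t is 1 - dt, so the division above is safe
        v = conditional_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]   # starting point x0
target = [0.5, -1.0, 2.0, 0.0]                       # toy "data" point x1
sample = euler_sample(noise, target)
```

With this particular velocity field the Euler path is a straight line, so `sample` ends at `target` up to floating-point error. A learned field is only an approximation, which is why the number of integration steps is typically a tunable generation parameter.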

✨ Features

🎵 High-quality voice synthesis for the Uzbek language
🎭 Voice cloning capabilities using reference audio
⚡ Configurable speech speed and generation parameters
🚀 GPU acceleration with automatic device detection
🎧 Support for multiple audio formats (WAV, OGG)
🔒 Thread-safe implementation with caching
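As an illustration of what thread-safe caching can look like, the sketch below guards a dictionary with a lock so that an expensive step (for example, preprocessing a reference audio file) runs only once per key even when several threads ask for it at the same time. The class `ThreadSafeCache` and the function `fake_preprocess` are hypothetical names used for illustration; the project's actual caching code may be organised differently.

```python
import threading

class ThreadSafeCache:
    """Minimal lock-protected cache: compute each key at most once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def get_or_compute(self, key, compute):
        # Holding the lock during compute() serialises callers, which keeps
        # the sketch simple at the cost of some concurrency.
        with self._lock:
            if key not in self._store:
                self._store[key] = compute(key)
            return self._store[key]

calls = []

def fake_preprocess(path):
    """Placeholder for an expensive step such as loading reference audio."""
    calls.append(path)
    return f"features:{path}"

cache = ThreadSafeCache()
threads = [
    threading.Thread(target=cache.get_or_compute, args=("ref.wav", fake_preprocess))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all eight threads finish, `calls` contains a single entry, because only the first thread to acquire the lock performs the computation.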

📁 Project Structure

Uzbek_TTS/
├── ckpts/                      # Model checkpoints directory
│   └── model.safetensors       # Pre-trained model file
├── src/                        # Source code
│   ├── models/                 # Model architectures
│   ├── utils/                  # Utility functions
│   └── inference.py            # Inference pipeline
├── examples/                   # Usage examples
├── requirements.txt            # Python dependencies
├── README.md                   # This file
└── setup.py                    # Installation script

🚀 Quick Start

Prerequisites

Python 3.8 or higher
PyTorch 2.0+
CUDA-compatible GPU (recommended)

Installation

Clone the repository:

```
git clone https://github.com/your-username/Uzbek_TTS.git
cd Uzbek_TTS
```

Install dependencies:
```
pip install -r requirements.txt
```

Download the pre-trained model from Google Drive and place it in the ckpts/ folder:

```
# Create checkpoints directory
mkdir -p ckpts

# Place the downloaded model.safetensors file in ckpts/
# The file structure should be: ckpts/model.safetensors
```
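Before running inference, it can help to confirm that the checkpoint landed where it is expected. The helper below is a small sketch using only the standard library; `find_checkpoint` is a hypothetical name, and the default directory and file name are taken from the layout described above.

```python
from pathlib import Path

def find_checkpoint(ckpt_dir="ckpts", filename="model.safetensors"):
    """Return the checkpoint path, or raise a clear error if it is missing."""
    path = Path(ckpt_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected checkpoint at {path}; download it and place it there."
        )
    return path
```

Calling `find_checkpoint()` from the repository root returns the path when the file is present and fails early with a readable message otherwise, which is easier to diagnose than an error raised deep inside model loading.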

Usage

Basic Usage

```
from omegaconf import OmegaConf
from hydra.utils import get_class
from tts import TTS

# Load configuration
model_cfg = OmegaConf.load('config/UZTTS_conf.yaml')

# Initialize TTS
tts = TTS(
    ref_audio_path="test_data/test_erkak.wav",
    ref_text="Jizzax kollejlarida infraqizil aniqlagichli turniketlar o'rnatilmoqda.",
    model_cfg=model_cfg,
    model_cls=get_class(f"uz_tts.model.{model_cfg.model.backbone}"),
    vocab='config/uz_vocab.txt',
    ckpt_path="ckpts/UZ.safetensors",
    device="auto",
    speed=1.0
)

# Generate speech
audio, sample_rate = tts.generate_speech("Assalomu alaykum! Bu Uzbek TTS tizimidir.")

# Save audio
tts.save_audio(audio, "output.wav")
```
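The `device="auto"` argument above suggests that the device is chosen at runtime. How the `TTS` class resolves it is not shown here, so the following is only a sketch of one common approach in PyTorch: prefer CUDA, then Apple's MPS backend, then fall back to the CPU. The function name `resolve_device` is hypothetical, and the fallback when PyTorch is not installed is added so the snippet runs anywhere.

```python
def resolve_device(device="auto"):
    """Map "auto" to a concrete device string; pass explicit choices through."""
    if device != "auto":
        return device
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report "cpu".
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists in recent PyTorch versions (Apple Silicon).
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

chosen = resolve_device("auto")
```

Passing an explicit value such as `"cpu"` or `"cuda"` skips detection entirely, which is useful for reproducing results on a specific device.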
