A Text-to-Speech (TTS) application using Rust and WASM with Parler-TTS model integration.
If you are looking for a pure-Rust TTS system that is production ready, you are in the wrong place. 😉
This project consists of:
- Backend: Axum-based Rust server that uses Parler-TTS model for speech generation
- Frontend: WASM-compiled Rust library for browser audio functionality
- Static Files: HTML frontend served from the backend's public directory
```
├── backend/   # Axum HTTP server with TTS API endpoints
├── frontend/  # WASM module compiled from Rust for browser audio functionality
├── scripts/   # Build and development scripts
└── public/    # Static frontend files served by the backend
```
- Text-to-speech generation using Parler-TTS large model
- Customizable voice descriptions
- Adjustable generation parameters:
  - Temperature: Controls randomness (0.0-2.0)
  - Seed: Random seed for reproducible generation
  - Top P: Nucleus sampling parameter (0.0-1.0)
- Hardware acceleration support (CUDA, Metal, MKL, Accelerate)
- Web-based interface for easy interaction
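The parameter ranges above (temperature 0.0-2.0, top-p 0.0-1.0) can be enforced with a small clamping helper. A minimal sketch, using hypothetical names rather than the project's actual API:

```rust
// Illustrative sketch: clamp generation parameters to the documented ranges.
// `GenParams` and `clamped` are hypothetical names, not the project's real API.
#[derive(Debug, PartialEq)]
struct GenParams {
    temperature: f64, // 0.0-2.0: controls randomness
    top_p: f64,       // 0.0-1.0: nucleus sampling cutoff
    seed: u64,        // fixed seed => reproducible generation
}

impl GenParams {
    fn clamped(temperature: f64, top_p: f64, seed: u64) -> Self {
        Self {
            temperature: temperature.clamp(0.0, 2.0),
            top_p: top_p.clamp(0.0, 1.0),
            seed,
        }
    }
}

fn main() {
    // An out-of-range temperature is pulled back into 0.0-2.0.
    let p = GenParams::clamped(3.5, 0.9, 42);
    assert_eq!(p.temperature, 2.0);
    println!("{:?}", p);
}
```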
- Rust (latest stable)
- wasm-pack
- Node.js (for frontend dependencies)
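One way to set these up, assuming `rustup` is already installed (the installation method below is a suggestion, not a project requirement):

```bash
# Hypothetical setup path: update the stable toolchain, then install wasm-pack.
rustup update stable
cargo install wasm-pack
# Node.js: install via your platform's package manager or from nodejs.org.
```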
- Build WASM frontend:

  ```bash
  cd scripts && ./build.sh
  ```

- Build backend:

  ```bash
  cd backend && cargo check && cargo build
  ```

- Build for release:

  ```bash
  cd backend && cargo build --release
  ```
The backend automatically detects and uses the best available acceleration.
Note: this project has only been tested with CUDA.
```bash
# Build with CUDA support (NVIDIA GPUs)
cd backend && cargo build --release --features cuda

# Build with Metal support (Apple Silicon)
cd backend && cargo build --release --features metal

# Build with MKL support (Intel CPU optimization)
cd backend && cargo build --release --features mkl

# Build with Accelerate support (macOS CPU optimization)
cd backend && cargo build --release --features accelerate
```

Start development server:

```bash
cd scripts && ./start.sh
```

Or manually:

```bash
cd scripts && ./build.sh && cd ../backend && cargo run
```

The server runs on `http://localhost:8039` (or the configured port) and serves:

- API endpoints under `/api/*`
- Static frontend files from `backend/public/`
- `--cpu`: Force CPU usage instead of GPU acceleration
- `--bind <ADDRESS>`: Set bind address (default: `0.0.0.0:8039`)
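Combining the options above, for example, to force CPU inference and bind only to localhost:

```bash
# Run the release build with CLI flags (arguments after `--` go to the binary).
cd backend && cargo run --release -- --cpu --bind 127.0.0.1:8039
```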
- `POST /api/tts` - Generate speech from text
  - Form parameters:
    - `text`: Text to convert to speech
    - `description`: Voice description
    - `temperature`: Generation temperature (optional)
    - `seed`: Random seed (optional)
    - `top_p`: Top-p sampling parameter (optional)
- `GET /api/health` - Health check
- `GET /api/debug` - Debug endpoint
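A request to the TTS endpoint can be sent with `curl` as a multipart form. A sketch, assuming the server is running locally on the default port (the output filename and the exact response format are assumptions):

```bash
# Hypothetical example request; the response body (audio) is saved to speech.wav.
curl -X POST http://localhost:8039/api/tts \
  -F "text=Hello from Parler-TTS" \
  -F "description=A female speaker with clear, animated speech" \
  -F "temperature=0.7" \
  -F "seed=42" \
  -F "top_p=0.9" \
  --output speech.wav
```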
- Open your browser to `http://localhost:8039`
- Enter text to convert to speech
- Provide a voice description (e.g., "A female speaker with clear, animated speech")
- Adjust generation parameters as needed:
  - Temperature: Higher values (1.0+) for more creative/varied output, lower values (0.0-0.5) for more consistent output
  - Seed: Set to a specific number for reproducible results
  - Top P: Controls diversity of word choice (0.9 is a good default)
- Click "Generate Speech" to create and play audio
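Why a fixed seed gives reproducible results: generation samples from a pseudo-random stream that is fully determined by the seed. A toy illustration with a simple linear congruential generator (not the model's actual sampler):

```rust
// Toy LCG: the same seed always yields the same sequence, which is why
// fixing the seed makes generation reproducible. Not the real sampler.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        // Constants from Knuth's MMIX LCG.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

fn main() {
    let (mut a, mut b) = (Lcg(42), Lcg(42));
    let xs: Vec<u64> = (0..5).map(|_| a.next()).collect();
    let ys: Vec<u64> = (0..5).map(|_| b.next()).collect();
    assert_eq!(xs, ys); // identical seeds -> identical sample stream
    println!("reproducible: {:?}", xs);
}
```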
- axum - HTTP web framework
- candle - ML framework for running Parler-TTS model
- tokio - Async runtime
- tower-http - HTTP middleware (CORS, static files)
- wasm-bindgen - Rust/WASM/JS interop
- web-sys - Web API bindings for audio recording
- js-sys - JavaScript type bindings
- CPU: Any modern CPU (Intel/AMD/ARM)
- GPU (optional): NVIDIA GPU with CUDA support or Apple Silicon for acceleration
- RAM: Minimum 4GB, 8GB+ recommended for better performance
- Storage: ~2GB for model files (downloaded automatically)
MIT or Apache-2.0
