Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
-
Updated
Sep 17, 2025 - Python
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Audio generation using diffusion models, in PyTorch.
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
A fundamental toolkit designed for music, song, and audio generation
A family of diffusion models for text-to-audio generation.
Official PyTorch implementation of BigVGAN (ICLR 2023)
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation et.al.
Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), support for SafeTensors/BF16, voice cloning, dialogue generation, and GPU/CPU execution.
Python library for designing and training your own Diffusion Models with PyTorch
Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization"
A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2 and Microsoft VibeVoice with unlimited text length, SRT timing, Character support, Audio Analyzer, Silent Speech Analyzer, audio edit and more
Pytorch implementation of BigVSAN
Self-host the ultra-lightweight Kitten TTS model with this enhanced API server with an intuitive Web UI, large text processing for audiobooks, and GPU acceleration.
The AI Podcast Studio: generate podcasts scripts and their audio version with a team of AI workers in a Podcast Studio 🎙️📜
Add a description, image, and links to the audio-generation topic page so that developers can more easily learn about it.
To associate your repository with the audio-generation topic, visit your repo's landing page and select "manage topics."