Stars
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
VoiceBench: Benchmarking LLM-Based Voice Assistants
YuE: Open Full-song Music Generation Foundation Model, similar to Suno.ai but open source
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs
eSpeak NG is an open-source speech synthesizer that supports more than one hundred languages and accents.
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Tensors and Dynamic neural networks in Python with strong GPU acceleration
🤖 Build voice-based LLM agents. Modular + open source.
Azure OpenAI code resources for using gpt-4o-realtime capabilities.
✨✨Latest Advances on Multimodal Large Language Models
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
React app for inspecting, building and debugging with the Realtime API
🧰 The AutoTokenizer that TikToken always needed -- Load any tokenizer with TikToken now! ✨
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Node.js + JavaScript reference client for the Realtime API (beta)
Open-source framework for voice and multimodal conversational AI
Awesome speech/audio LLMs, representation learning, and codec models
DSPy: The framework for programming—not prompting—language models
Domain Specific Language for the Abstraction and Reasoning Corpus