Stars
An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning)
Tensors and dynamic neural networks in Python with strong GPU acceleration
🤖 Build voice-based LLM agents. Modular + open source.
Azure OpenAI code resources for using gpt-4o-realtime capabilities.
✨✨ Latest Advances on Multimodal Large Language Models
✨✨ VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
React app for inspecting, building and debugging with the Realtime API
🧰 The AutoTokenizer that TikToken always needed -- Load any tokenizer with TikToken now! ✨
✨✨ Freeze-Omni: A Smart and Low-Latency Speech-to-Speech Dialogue Model with Frozen LLM
Node.js + JavaScript reference client for the Realtime API (beta)
Open source framework for voice and multimodal conversational AI
Awesome speech/audio LLMs, representation learning, and codec models
DSPy: The framework for programming—not prompting—language models
Domain Specific Language for the Abstraction and Reasoning Corpus
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Tools for working with the Abstraction & Reasoning Corpus
Code for 1st place solution to Kaggle's Abstraction and Reasoning Challenge
Inference and training library for high-quality TTS models.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production