Audio Large Language Models

Python 350 21 Updated Jan 15, 2025

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

649 38 Updated Aug 3, 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants

Python 107 6 Updated Feb 5, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 2,950 276 Updated Feb 5, 2025

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 317 18 Updated Jan 14, 2025

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 476 26 Updated Nov 19, 2024

Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊

260 7 Updated Jan 27, 2025
Python 19 1 Updated Jan 10, 2025

An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs

Python 28 Updated Jan 24, 2025

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.

C 4,579 948 Updated Jan 31, 2025

TTS with kokoro and onnx runtime

Python 1,432 127 Updated Feb 5, 2025

A Survey of Spoken Dialogue Models (60 pages)

257 16 Updated Nov 28, 2024

1 min of voice data can also be used to train a good TTS model! (few-shot voice cloning)

Python 39,841 4,470 Updated Jan 18, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 86,517 23,286 Updated Feb 6, 2025

🤖 Build voice-based LLM agents. Modular + open source.

Python 3,125 520 Updated Nov 15, 2024

A fast multimodal LLM for real-time voice

Python 3,396 228 Updated Jan 31, 2025

Azure OpenAI code resources for using gpt-4o-realtime capabilities.

TypeScript 748 146 Updated Jan 22, 2025

✨✨Latest Advances on Multimodal Large Language Models

13,740 885 Updated Jan 28, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,024 149 Updated Jan 21, 2025

React app for inspecting, building and debugging with the Realtime API

JavaScript 2,847 1,035 Updated Feb 1, 2025

🧰 The AutoTokenizer that TikToken always needed -- Load any tokenizer with TikToken now! ✨

Python 35 3 Updated Jan 3, 2025

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 265 16 Updated Jan 2, 2025

Node.js + JavaScript reference client for the Realtime API (beta)

JavaScript 851 243 Updated Nov 7, 2024

Open Source framework for voice and multimodal conversational AI

Python 4,544 504 Updated Feb 5, 2025
Python 7,348 582 Updated Feb 5, 2025

Awesome speech/audio LLMs, representation learning, and codec models

872 57 Updated Feb 5, 2025

DSPy: The framework for programming—not prompting—language models

Python 21,666 1,639 Updated Feb 5, 2025

Domain Specific Language for the Abstraction and Reasoning Corpus

Python 233 48 Updated Oct 11, 2024