LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Updated May 19, 2025 (Python)
A simple, high-quality voice conversion tool focused on ease of use and performance.
Real-time AI voice agents using state-of-the-art models such as OpenAI Realtime, Gemini Live, Grok, and ElevenLabs, running on Arduino ESP32 with secure WebSockets and Deno. Supports uninterrupted conversations of more than 15 minutes worldwide, for AI toys, AI companions, AI devices, and more.
A lightning-fast, cross-platform AI Assistant App built with React Native.
High-quality, streaming, full-duplex speech-to-speech interactive agent prototype implemented in a single file.
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Speech-to-speech AI assistant with natural conversation flow, mid-speech interruption, vision capabilities and AI-initiated follow-ups. Features low-latency audio streaming, dynamic visual feedback, and works with local LLM/TTS services via OpenAI-compatible endpoints.
A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M.
Your faithful, impartial partner for honest audio evaluation: know yourself, know your rivals.
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not limited to end-to-end speech interaction, end-to-end speech translation and speech recognition.
Samantha OS1 is a conversational AI assistant powered by the Realtime API from OpenAI
FreeSWITCH module that streams audio to a WebSocket and receives the response.
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-ready architecture.
Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.
MOSS-Speech is a true speech-to-speech large language model without text guidance.
🗣️ Real‑time, low‑latency voice, vision, and conversational‑memory AI assistant built on LiveKit and local LLMs ✨
Svelte component for using the OpenAI Realtime API.