[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
Updated Nov 1, 2024 - Python
PersonaPlex on Apple Silicon: an MLX port of NVIDIA’s full-duplex speech-to-speech model with realtime local/web modes and offline WAV inference.
An automated installation script for deploying Kyutai's Moshi STT server on macOS Apple Silicon.
SONATA (SOund and Narrative Advanced Transcription Assistant): An advanced ASR system that captures human expressions including emotive sounds and non-verbal cues.
Real-time, full-duplex AI voice bot integrating NVIDIA's PersonaPlex with Twilio Media Streams for natural speech-to-speech conversations.
Web-based application for managing library resources (reading materials such as books and articles).
Moshi: open-source speech-text foundation model for real-time full-duplex voice dialogue. Uses Mimi neural audio codec. PyTorch, MLX (Apple Silicon) and Rust backends. Moshika & Moshiko voices.
Voice to Image Prompts via OSC
🎤 Create real-time voice conversation bots with the PersonaPlex library and Twilio Media Streams for seamless, full-duplex communication.