A real-time, local voice AI system optimized to run on an Ubuntu laptop with 8Gb Ram and no GPU, achieving less than 1 second STT to TTS latency. It leverages a WebSocket client/server architecture, utilizes Gemma3:1b via Ollama for the language model, Vosk for offline speech-to-text, and Piper for text-to-speech. The system also employs JACK/PipeWire for low-latency I/O. Inspired by advancements from AssemblyAI, ElevenLabs, Fixie, and LiveKit, this project aims for full localization. The current architecture does not support interruption logic.
Screenshot: Server.py (left) and Client.py (right)
- Low Latency: Sub-second end-to-end processing for speech-to-text (STT) and text-to-speech (TTS).
- Fully Local: Runs entirely offline with no cloud dependencies.
- Configurable: Adjustable via
.config.jsonfor model, voice, prompts, and audio settings. - Streaming Response: Handles streaming LLM responses and TTS for natural conversation flow.
- Audio Fading: Smooth audio transitions with configurable fade durations.
- History Support: Optional chat history retention for context-aware responses.
This project requires several external tools and libraries. You are responsible for installing and configuring them:
- Python 3.11: The core runtime.
- Ollama: For local LLM inference. Install from ollama.com and pull the model:
ollama pull gemma3:1b. - Vosk: For offline STT. Install via pip (
pip install vosk) and download a model (e.g., vosk-model-en-us-0.22) to thevosk-modeldirectory. - Piper: For TTS. Download from rhasspy/piper and update
PIPER_PATHinserver.pyto point to your installation. Place voice models (e.g.,ryan-medium.onnx) in thevoices/directory. - JACK Audio Server: For low-latency audio I/O. Install via
sudo apt install jackd2(on Ubuntu). - PipeWire: For JACK compatibility and audio routing. Install via
sudo apt install pipewire pipewire-jackand ensure it's running. - SoX: For audio processing. Install via
sudo apt install sox. - Python Packages: Listed in
requirements.txt. Install withpip install -r requirements.txt.
Additional setup:
- Ensure JACK is configured and running (e.g., via
jackdor PipeWire's JACK emulation). - Download Vosk models from alphacephei.com/vosk/models.
- Download Piper voices from rhasspy/piper voices.
-
Clone the repository:
git clone https://github.com/yourusername/Faster-Local-Voice-AI.git cd Faster-Local-Voice-AI -
Install Python dependencies:
pip install -r requirements.txt -
Set up directories:
- Create
vosk-model/and place your Vosk model files there. - Create
voices/and place your Piper voice models (e.g.,.onnxand.jsonfiles) there.
- Create
-
Install and configure the external dependencies as listed above.
-
Update paths in the code if needed (e.g.,
PIPER_PATHinserver.py).
-
Start the server (handles STT, LLM inference, and TTS):
python server.py- The server listens on
ws://0.0.0.0:8765. - It warms up the Ollama model on startup.
- The server listens on
-
In a separate terminal, start the client (handles audio I/O via JACK):
pw-jack python client.py- The client connects to the WebSocket server and syncs configuration.
- Speak into your microphone to interact; the AI will respond via audio output.
-
Interact:
- Say something (e.g., "Hello, how are you?").
- The system processes your speech, generates a response via Ollama, and speaks it back.
- Press Ctrl+C to exit.
Note: Ensure JACK/PipeWire is running and ports are connected (the client attempts auto-connection). The pw-jack prefix is used to run the client under PipeWire's JACK compatibility layer for optimal low-latency performance.
Copy the config.json file to .config.json.
Edit .config.json to customize the system:
{
"volume": 65, // System volume (0-100)
"mic_name": "", // Microphone device name (leave empty for default)
"audio_output_device": "", // Output device name (leave empty for default)
"model_name": "gemma3:1b", // Ollama model name
"voice": "ryan-medium.onnx", // Piper voice model file (in voices/)
"mute_mic_during_playback": true, // Mute mic while AI is speaking
"fade_duration_ms": 70, // Fade in/out duration in ms for smooth audio
"history_length": 0, // Number of previous messages to keep in context (0 for none)
"system_prompt": "You are a helpful voice assistant. Always give very short responses." // LLM system prompt
}server.py: WebSocket server for STT (Vosk), LLM (Ollama), and TTS (Piper) processing.client.py: Audio client using JACK for mic input and playback output.utils.py: Helper functions for config loading, voice sample rates, and audio fading..config.json: Configuration file (copy and edit as needed).requirements.txt: Python package requirements.vosk-model/: Directory for Vosk STT models (not included; download separately).voices/: Directory for Piper TTS voice models (not included; download separately).
- Audio Issues: Verify JACK/PipeWire setup with
pw-link -lorjack_lsp. Ensure ports are connected. - Model Not Found: Check paths for Vosk and Piper models.
- Latency Problems: Adjust
FLUSH_INTERVAL,MAX_RESPONSE_LENGTH, etc., inserver.pyfor your hardware. - Errors: Review console output for debug info (e.g., STDERR from Piper/Ollama).
Copyright (c) 2025 m15.ai
This project is licensed under the MIT License - see the LICENSE file for details. (Note: Create a LICENSE file with the full MIT text if not already present.)
Contributions are welcome! Open issues or pull requests for improvements, bug fixes, or new features.
Inspired by open-source AI tools and communities around local inference.
