Flex on cloud paid-API approaches with a flexible, modular, easily upgradeable voice chat framework: swap the VAD, STT, LLM, and TTS stages of a fully local pipeline to build real-time voice assistants.

🔥 Fully local real-time voice chat via a modular WebRTC -> VAD -> STT -> LLM -> TTS pipeline, served through a Gradio interface:
```python
def process_pipeline(audio, conversation):
    # 1. Voice Activity Detection: audio array -> check_vad() -> bool (speech detected)
    if not check_vad(audio):
        return None
    # 2. Speech-to-Text: transcribe_with_whisper(audio) -> text (models loaded once via load_stt_models)
    text = transcribe_with_whisper(audio)
    # 3. LLM: text prompt -> send_to_llm() -> text response
    reply = send_to_llm(text)  # conversation history handling omitted here
    # 4. Text-to-Speech: text_to_speech(reply) -> (audio_array, sample_rate); initialize_tts() runs once
    return text_to_speech(reply)
```

```mermaid
flowchart TD
A(["`π€ Start: Receive Audio Input (WebRTC)`"]) --> B
B["`π Voice Activity Detection (VAD): Silero VAD (check_vad)`"] --> C{"`β Speech Detected?`"}
C -- Yes --> D["`π£οΈ Speech-to-Text (STT): Whisper (transcribe_with_whisper)`"]
D --> E["`π€ Language Model (LLM): OpenAI API (send_to_llm)`"]
E --> F["`π Text-to-Speech (TTS): Kokoro (text_to_speech)`"]
F --> G(["`π§ End: Output Audio Response (Gradio Interface)`"])
C -- No --> H(["`π« End: No Action`"])
class A,G,H output
class C decision
class B,D,E,F process
```
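The LLM stage only needs an OpenAI-compatible chat endpoint. Below is a minimal sketch of a `send_to_llm` implementation; the `base_url`, port, and `"model"` value are assumptions, so match them to your local server:

```python
# Minimal sketch of send_to_llm() against an OpenAI-compatible local server.
# base_url/port and the "model" value are assumptions; most local servers ignore "model".
import requests

def send_to_llm(prompt, base_url="http://127.0.0.1:5000/v1"):
    payload = {
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
    }
    r = requests.post(f"{base_url}/chat/completions", json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```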
- 🎤 Voice Activity Detection (Silero VAD)
- 🗣️ Speech-to-Text (insanely-fast-whisper)
- 🤖 Language Model (OpenAI API format, local server)
- 🔊 Text-to-Speech (Kokoro)
- 🤝 Contributing: easy to modify; share your implementations! (See the sketch below for swapping the VAD stage.)
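As an example of swapping a stage, here is a sketch of a `check_vad` replacement built on Silero VAD loaded from `torch.hub`. The 16 kHz sample rate and 0.5 threshold are illustrative assumptions, not the repo's settings:

```python
# Sketch of a drop-in check_vad() using Silero VAD from torch.hub.
# Sample rate and threshold below are assumptions, not repo defaults.
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps = utils[0]

def check_vad(audio_array, sample_rate=16000, threshold=0.5):
    """Return True if Silero VAD finds speech in a mono float32 array."""
    audio = torch.as_tensor(audio_array, dtype=torch.float32)
    speech = get_speech_timestamps(audio, model, sampling_rate=sample_rate, threshold=threshold)
    return len(speech) > 0
```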
# 📋 Requirements: Python 3.8+, CUDA 11.8+ (CUDA Toolkit for Whisper), and an OpenAI-API-compatible server (default: oobabooga text-generation-webui)
```bash
# 1. 🔧 Clone & install dependencies:
git clone https://github.com/Katehuuh/FlexVoice.git && cd FlexVoice
python -m venv venv && venv\Scripts\activate
pip install -r requirements.txt
# 2. Run! Opens http://localhost:7861
python app.py
```
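Similarly, the TTS stage could be rebuilt on the `kokoro` pip package. A hedged sketch of a `text_to_speech` replacement follows; the language code and voice name are assumptions, and Kokoro generates 24 kHz audio:

```python
# Sketch of a text_to_speech() replacement using the kokoro package.
# lang_code="a" (American English) and voice="af_heart" are assumptions.
import numpy as np
from kokoro import KPipeline

tts = KPipeline(lang_code="a")

def text_to_speech(text, voice="af_heart"):
    """Return (audio_array, sample_rate); Kokoro outputs 24 kHz audio."""
    chunks = [np.asarray(audio) for _, _, audio in tts(text, voice=voice)]
    return np.concatenate(chunks), 24000
```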