VoceVibe is a real-time speech-to-text (STT) application designed for generative art performance. It acts as a "cognitive bridge" that transforms spoken audio into structured visual prompts as you speak.
It utilizes Kyutai's Dedicated STT 1B model (running on PyTorch CPU for maximum stability on macOS), processes transcripts with a local Large Language Model (Mistral NeMo via Ollama), and sends engineered visual prompts via OSC (Open Sound Control) to external rendering engines like TouchDesigner, Stable Diffusion, or Flux.
⚠️ Hardware Requirement: This project is developed and optimized for macOS Apple Silicon (M1/M2/M3). While it uses the CPU for the STT model to ensure stability with specific PyTorch operators, the architecture is designed for the unified memory bandwidth of Mac chips.
- Real-Time Bilingual STT: Powered by `kyutai/stt-1b-en_fr` (a dedicated STT model) running on PyTorch. Handles switching between French and English fluidly.
- Hallucination-Free Architecture: Uses a dedicated STT model (not a conversational one) with deterministic decoding (`temp=0.0`) to prevent the AI from "inventing" dialogue.
- "Dual-Brain" Intelligence:
  - ⚡️ Fast Lane (BrainEngine): Generates instant, artistic visual prompts (SDXL-optimized) every few seconds based on immediate context.
  - 🐢 Slow Lane (SummaryEngine): Accumulates the full conversation history to generate structured diagrams, mind maps, or summaries every minute.
- Robust Audio Pipeline: Includes Automatic Gain Control (AGC) and strict noise gating to ensure only clear voice data reaches the model.
- OSC Integration: Sends raw strings to `/visual/prompt` (for generative art) and `/visual/summary` (for archives/structure); a minimal sender sketch follows this list.
- Cyberpunk UI: A dark-mode `customtkinter` interface providing real-time monitoring of audio levels, transcriptions, and generated prompts.
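For quick experiments outside the app, here is a minimal sender sketch of what those OSC messages look like. It assumes the `python-osc` package; the project's actual client lives in `src/osc_client.py` and may differ in detail.

```python
# Minimal OSC sender sketch (assumes python-osc; illustrative only,
# not the actual VoceVibe implementation in src/osc_client.py).
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 8000)  # default OSC target used by VoceVibe

# Fast lane: an SDXL-style visual prompt as a raw string
client.send_message("/visual/prompt", "neon jellyfish drifting through a data storm")

# Slow lane: a structured summary / diagram description
client.send_message("/visual/summary", "mind map: performance themes -> ocean, code, light")
```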
- macOS (Apple Silicon M1/M2/M3 recommended).
- Python 3.10+.
- Ollama installed and running. You must pull the required LLM model before starting:

  ```bash
  ollama pull mistral-nemo
  ```
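To confirm that Ollama is reachable and the model loads, you can use the standard Ollama CLI before launching VoceVibe:

```bash
ollama list                     # mistral-nemo should appear among installed models
ollama run mistral-nemo "ping"  # quick check that the model loads and replies
```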
- Clone the repository

  ```bash
  git clone https://github.com/Studio-Carlos/VoceVibe.git
  cd VoceVibe
  ```

- Create a virtual environment (Recommended)

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```

- Install dependencies. This project requires specific versions of PyTorch to maintain compatibility with the Moshi/Kyutai loader.

  ```bash
  pip install -r requirements.txt
  ```
- Download STT Models

  The application handles model downloading automatically via HuggingFace Hub upon the first launch. Ensure you have an internet connection for the first run (~2 GB download).
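If you prefer to prefetch the weights manually (for example on a slow or metered connection), a sketch like the following should work, assuming the standard `huggingface_hub` client; the application otherwise does this automatically on first launch.

```python
# Optional manual prefetch of the STT weights (assumes huggingface_hub is installed;
# VoceVibe normally handles this automatically on first launch).
from huggingface_hub import snapshot_download

# Downloads and caches ~2 GB under the default Hugging Face cache directory.
snapshot_download(repo_id="kyutai/stt-1b-en_fr")
```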
- Start the Application

  ```bash
  python main.py
  ```
- Configuration (In-App)
  - Audio Input: Select your microphone or virtual cable (e.g., BlackHole) from the dropdown.
  - OSC Target: Set the IP and port of your visualizer (default: `127.0.0.1:8000`). A minimal test receiver sketch is shown after these steps.
  - History Window: Adjust the slider to control how much context the "Fast Brain" takes into account.
- Perform
  - Click START.
  - Speak into the microphone.
  - Monitor the STT (blue), Fast Prompts (pink), and Summaries (orange) in the logs.
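If no visualizer (e.g., TouchDesigner) is running yet, a throwaway OSC receiver is handy for confirming that prompts actually arrive on the configured port. A minimal sketch, again assuming `python-osc` (not part of VoceVibe):

```python
# Throwaway OSC receiver for testing the default 127.0.0.1:8000 target
# (assumes python-osc; illustrative only, not part of VoceVibe).
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def print_message(address, *args):
    print(f"{address}: {args}")

dispatcher = Dispatcher()
dispatcher.map("/visual/prompt", print_message)   # fast-lane prompts
dispatcher.map("/visual/summary", print_message)  # slow-lane summaries

server = BlockingOSCUDPServer(("127.0.0.1", 8000), dispatcher)
print("Listening for OSC on 127.0.0.1:8000 ...")
server.serve_forever()
```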
The application runs on a multi-threaded architecture to ensure the UI never freezes:
- `src/audio_engine.py`: Handles audio capture (sounddevice) and transcription (PyTorch). Uses a Producer/Consumer pattern with a thread-safe queue.
- `src/brain_engine.py` (Fast Brain): Consumes transcripts, maintains a sliding window of context, and prompts Ollama for SDXL visual descriptions.
- `src/summary_engine.py` (Slow Brain): Accumulates the entire session transcript and triggers high-level summaries or diagram prompts at longer intervals.
- `src/osc_client.py`: Handles network communication.
- `src/config.py`: Centralized configuration and System Prompts.
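To illustrate the Producer/Consumer pattern used by the audio engine, here is a self-contained sketch with hypothetical names (not the actual VoceVibe classes): the sounddevice callback only enqueues raw audio blocks, while a separate worker thread consumes them, so neither audio capture nor the UI thread is ever blocked by the STT model.

```python
# Illustrative Producer/Consumer sketch (hypothetical names, not the real
# VoceVibe code): the audio callback produces blocks, a worker consumes them.
import queue
import threading
import sounddevice as sd

audio_queue = queue.Queue()  # thread-safe handoff between producer and consumer

def audio_callback(indata, frames, time_info, status):
    # Producer: runs on sounddevice's audio thread; copy the block and return fast.
    audio_queue.put(indata.copy())

def transcription_worker():
    # Consumer: pulls blocks and hands them to the STT model (stubbed out here).
    while True:
        block = audio_queue.get()
        # run_stt(block)  # placeholder for the PyTorch transcription step
        audio_queue.task_done()

threading.Thread(target=transcription_worker, daemon=True).start()

with sd.InputStream(samplerate=16000, channels=1, callback=audio_callback):
    sd.sleep(5000)  # capture for 5 seconds in this demo
```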
Contributions are welcome! Please see CONTRIBUTING.md for guidelines on how to propose features or fix bugs.
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2025 Studio Carlos
