This folder contains a complete bidirectional audio-to-audio communication system built on Google's Gemini Live API. It enables real-time voice conversations with an AI assistant from any web client, including smart glasses.
- Backend: Python WebSocket server (`server.py`) that connects to the Gemini Live API
- Frontend: JavaScript audio client (`audio-client.js`) with an HTML interface
- Communication: real-time bidirectional audio streaming over WebSocket
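
Conceptually, the backend acts as a full-duplex relay: audio from the browser flows upstream to Gemini while the model's audio flows back down, and neither direction blocks the other. The following is a minimal sketch of that pattern only, assuming `asyncio` queues as stand-ins for the two WebSocket connections; `pump`, `relay`, and the `END` sentinel are hypothetical names, not code from `server.py`.

```python
import asyncio

# Hypothetical end-of-stream sentinel used only in this sketch.
END = None


async def pump(source: asyncio.Queue, sink: asyncio.Queue) -> int:
    """Forward items from source to sink until END arrives; return how many were forwarded."""
    count = 0
    while True:
        item = await source.get()
        await sink.put(item)  # forward END as well so the consumer knows to stop
        if item is END:
            return count
        count += 1


async def relay(client_in, model_in, client_out, model_out):
    """Run both directions concurrently, like a browser <-> Gemini bridge.

    client_in:  audio chunks arriving from the browser
    model_out:  upstream destination (stand-in for the Gemini session)
    model_in:   audio chunks arriving from the model
    client_out: downstream destination (stand-in for the browser socket)
    """
    up, down = await asyncio.gather(
        pump(client_in, model_out),  # speak: browser -> model
        pump(model_in, client_out),  # listen: model -> browser
    )
    return up, down


async def demo():
    client_in, model_in = asyncio.Queue(), asyncio.Queue()
    client_out, model_out = asyncio.Queue(), asyncio.Queue()
    for chunk in (b"mic-1", b"mic-2", END):
        client_in.put_nowait(chunk)
    for chunk in (b"reply-1", END):
        model_in.put_nowait(chunk)
    return await relay(client_in, model_in, client_out, model_out)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (2, 1): two chunks up, one chunk down
```

Running both pumps under `asyncio.gather` is what makes the exchange bidirectional: the user can keep speaking while a reply is still streaming back. In a real server, each queue would be replaced by the corresponding WebSocket receive or send call.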

Files:

- `server.py`: main WebSocket server for the Scholar AI assistant
- `audio-client.js`: frontend audio client class for recording and playback
- `index_for_server.html`: web interface for testing
- `common.py`: shared utilities and base classes
- `requirements.txt`: Python dependencies (minimal, focused on `server.py`)
- `archive/`: alternative implementations and unused files

Features:

- Real-time audio recording and streaming
- Bidirectional communication (speak and listen)
- WebSocket-based architecture
- Browser-based client interface
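
Real-time streaming means audio is sent in small fixed-duration chunks rather than as one finished recording. As a rough sketch, assuming 16-bit mono PCM at 16 kHz (the input format described in the Gemini Live API documentation; check what `audio-client.js` actually captures) and a hypothetical JSON message with base64-encoded audio, chunking and framing could look like this. The field names `type` and `data` and the 100 ms chunk length are illustrative, not the protocol `server.py` uses.

```python
import base64
import json

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz input audio
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM, mono
CHUNK_MS = 100           # hypothetical chunk duration


def chunk_size_bytes(sample_rate=SAMPLE_RATE_HZ, chunk_ms=CHUNK_MS):
    """Bytes of mono 16-bit PCM in one chunk of chunk_ms milliseconds."""
    return sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000


def split_pcm(pcm: bytes, size: int):
    """Yield consecutive chunks of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def encode_message(chunk: bytes) -> str:
    """Wrap a PCM chunk in a JSON text frame (hypothetical message shape)."""
    return json.dumps({"type": "audio", "data": base64.b64encode(chunk).decode("ascii")})


def decode_message(text: str) -> bytes:
    """Inverse of encode_message: recover the raw PCM bytes."""
    return base64.b64decode(json.loads(text)["data"])


if __name__ == "__main__":
    size = chunk_size_bytes()                              # 16000 * 2 * 100 // 1000
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
    chunks = list(split_pcm(one_second, size))
    print(size, len(chunks))                               # 3200 10
```

Smaller chunks lower latency at the cost of more messages per second; base64 inflates the payload by about a third, which is why some clients send binary WebSocket frames instead.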

Requirements:

- Python 3.8+
- A Google API key with access to the Gemini Live API
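
Before running the server, a quick preflight check can catch the two most common setup problems. This is a hypothetical helper, not part of the repository; it checks only the Python version and that `GOOGLE_API_KEY` is non-empty, not that the key is valid.

```python
import os
import sys

MIN_PYTHON = (3, 8)


def preflight(env=None, version_info=None):
    """Return a list of problems that would keep server.py from starting.

    Hypothetical helper: checks only the two documented prerequisites.
    """
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    if not env.get("GOOGLE_API_KEY", "").strip():
        problems.append("GOOGLE_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```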

Setup:

1. Clone the repository and navigate to the project: `git clone <repository-url>`, then `cd audio-to-audio-architecture`
2. Create a virtual environment (recommended): `python3 -m venv venv`, then `source venv/bin/activate` (on Windows: `venv\Scripts\activate`).
   Note: virtual environments (`venv/`, `.venv/`) are excluded from git via `.gitignore` because they are platform-specific and easy to recreate.
3. Install dependencies: `pip install -r requirements.txt`
4. Set your API key: `export GOOGLE_API_KEY="your_api_key_here"`
5. Run the server: `python server.py`
6. Open `index_for_server.html` in your browser (for example with `google-chrome index_for_server.html`) and start recording.
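
To confirm the server is up before opening the page, a plain TCP connection attempt is enough. This sketch assumes the server listens on `localhost:8765`; that port is a placeholder, so substitute whatever host and port `server.py` actually binds. It checks only that the port accepts connections, not that the WebSocket handshake or the Gemini session succeeds.

```python
import socket


def server_is_listening(host="localhost", port=8765, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    The default port 8765 is a hypothetical placeholder.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    print(server_is_listening())
```

Run `python server.py` in one terminal and this check in another; `True` means the server is accepting connections.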

Quick start:

1. Set the `GOOGLE_API_KEY` environment variable.
2. Install dependencies: `pip install -r requirements.txt`
3. Run the server: `python server.py`
4. Open `index_for_server.html` in a browser.
5. Start recording and speak with Scholar.