RealtimeSwitch.com - Universal Voice API - v1
🚀 Universal switching between voice AI providers without breaking conversations
🔄 Seamless provider switching: OpenAI ↔ Gemini ↔ and more
🧠 Smart session persistence and conversation history preservation
⚡ Real-time WebSocket connections with HMAC authentication
📈 Performance monitoring and automatic failover switching
- Native API Compatibility: Out-of-the-box support for the existing OpenAI Realtime API and Gemini Live API specifications - use either API style over our universal connection
- Provider Agnostic: Switch between OpenAI Realtime API, Gemini Live API, and future providers
- Session Persistence: Maintain conversation context across provider switches
- Real-time WebSockets: Low-latency bidirectional communication
- Secure Authentication: HMAC-based authentication with Firebase integration
- Event Transformation: Seamless translation between provider-specific event formats
- Performance Monitoring: Built-in metrics and automatic failover
- Modular Architecture: Plugin-based system for easy extensibility
Universal WebSocket API: Users connect to RealtimeSwitch through a single WebSocket endpoint with powerful flexibility:

- API Style Selection: Choose your preferred client API specification via the `rs_api` parameter:
  - `OPENAI` - Use OpenAI Realtime API format for messages
  - `GEMINI` - Use Gemini Live API format for messages
- Provider Selection: Specify which voice AI provider handles your requests via the `rs_core` parameter:
  - `OPENAI` - Route to OpenAI's Realtime API
  - `GEMINI` - Route to Google's Gemini Live API
- Account-Based Authentication: Secure connections using account credentials:
  - Each client has an `accountId` and `secretKey`
  - Generate an HMAC-SHA256 token using your unique `sessionId`
  - Pass authentication via the `rs_auth` parameter
- Session Persistence: RealtimeSwitch preserves full context across:
  - Provider switches (OpenAI ↔ Gemini mid-conversation)
  - Network interruptions (automatic reconnection with state recovery)
  - Connection drops (resume conversations seamlessly)
For Local Development: Copy .env.example to .env, add your API keys, and start - no additional configuration needed!
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Client App │◄──►│ Orchestrator │◄──►│ Providers │
│ │ │ │ │ │
│ API Style: │ │ • Authentication │ │ • OpenAI │
│ • OpenAI Format │ │ • Session Mgmt │ │ • Gemini │
│ • Gemini Format │ │ • Event Transform│ │ • Future Provs │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌─────────────────┐
│ Core System │ │ Voice AI APIs │
│ • Event Pipeline │ │ • Real-time │
│ • Checkpointing │ │ • Streaming │
│ • State Mgmt │ │ • Function Call │
└──────────────────┘ └─────────────────┘
- `@realtime-switch/core` - Core types and event management
- `@realtime-switch/orchestrator` - Main server and WebSocket handling
- `@realtime-switch/providers-oai` - OpenAI Realtime API integration
- `@realtime-switch/providers-gemini` - Gemini Live API integration
- `@realtime-switch/transformers` - Event format transformation
- `@realtime-switch/extractors` - Event extraction and parsing
- `@realtime-switch/checkpoint` - Session persistence and recovery
- `@realtime-switch/switch` - Provider switching logic
- `@realtime-switch/accounts` - Account management and authentication
- Node.js v18+ and npm v8+
- API Keys from:
- OpenAI Platform (for OpenAI Realtime API)
- Google AI Studio (for Gemini Live API)
npm run install:all
npm run build

Create a .env file inside the orchestrator package:

cd orchestrator
cp .env.example .env

Edit .env and replace the placeholder values with your actual API keys:

# Required: Add your API keys
OPENAI_API_KEY=sk-proj-your_actual_openai_key_here
GEMINI_API_KEY=your_actual_gemini_key_here

# Default test account (works as-is for local testing)
ACCOUNTS=996-sdassds-86-asd=secretkey-996

Then start the server:

npm start

The server will be available at the configured HOST and PORT from your .env (default: http://localhost:3000).
Test the connection using the web-based test client:
# In a new terminal (make sure the RealtimeSwitch server is running first
# and that you are in the repo root before running the command below)
cd orchestrator
npm run test:client

This will:

- Start a test web server (reads your .env configuration)
- Automatically open your browser to the test client

Then click Start, allow camera and mic permissions, and start talking. Explore the index.html code inside the orchestrator/test package to see the realtime WS API usage.
Alternative CLI Test:
node orchestrator/test/auth-test-client.js

You should see: ✅ WebSocket connected successfully!
The .env file in the orchestrator package supports the following configuration:
# Server Configuration
PORT=3000
HOST=localhost
# Provider API Keys (Required)
OPENAI_API_KEY=your_openai_key_here # Used when voice provider is set to OpenAI Realtime API
GEMINI_API_KEY=your_gemini_key_here # Used when voice provider is set to Gemini Live API
# Account Authentication (for local development)
ACCOUNTS=your-account-id=your-secret-key # Default HMAC key used for WebSocket authentication
# Database Configuration (Optional)
ENABLE_DB=false # Enable database for dynamic account management (default: false)
# Note: Local development doesn't need this, but enable if you want dynamic account management using a database
# Provider Switching Configuration
SWITCH_LATENCY_THRESHOLD_MS=500
SWITCH_FAILURE_COUNT=3

- OPENAI_API_KEY: Used when a client connects with `rs_core=OPENAI` or when the system routes voice processing through OpenAI's Realtime API
- GEMINI_API_KEY: Used when a client connects with `rs_core=GEMINI` or when the system routes voice processing through Google's Gemini Live API
How WebSocket Authentication Works:
- Account Setup: Each account has an ID and secret key (stored in the `ACCOUNTS` env var)
- Client-side: Generate an HMAC-SHA256 hash using `HMAC(sessionId, secretKey)`
- Connection: Include the auth parameters in the WebSocket URL query string
- Server-side: Validates the HMAC signature before allowing the connection
ACCOUNTS Key (Local Development):
- Format: `accountId=secretKey` (comma-separated for multiple accounts)
- Example: `ACCOUNTS=996-sdassds-86-asd=secretkey-996`
- Used for HMAC-SHA256 signature validation of WebSocket connections
- This default key setup allows quick local testing without database setup
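A minimal sketch of how this variable can be parsed (an illustrative helper, not the actual RealtimeSwitch parser):

```javascript
// Parse ACCOUNTS ("accountId=secretKey", comma-separated) into a Map.
// Splits on the FIRST '=' so secret keys may themselves contain '='.
function parseAccounts(raw) {
  const accounts = new Map();
  for (const pair of raw.split(',')) {
    const idx = pair.indexOf('=');
    if (idx > 0) accounts.set(pair.slice(0, idx).trim(), pair.slice(idx + 1).trim());
  }
  return accounts;
}

const accounts = parseAccounts('996-sdassds-86-asd=secretkey-996');
// accounts.get('996-sdassds-86-asd') → 'secretkey-996'
```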
Required WebSocket URL Parameters:
- `rs_accid`: Account ID (e.g., `996-sdassds-86-asd`)
- `rs_u_sessid`: Session ID (e.g., `session-123`)
- `rs_auth`: HMAC-SHA256 hash of the sessionId using the account's secret key
- `rs_api`: Client API format (`OPENAI` or `GEMINI`)
- `rs_core`: Provider to use (`OPENAI` or `GEMINI`)
Step-by-Step HMAC Generation:
// Using the default account from .env
const crypto = require('crypto');
const accountId = '996-sdassds-86-asd';
const secretKey = 'secretkey-996'; // From ACCOUNTS env var
const sessionId = 'test-session-browser'; // Your session identifier
// Generate HMAC-SHA256 signature
const authHash = crypto.createHmac('sha256', secretKey).update(sessionId).digest('hex');
// Result: '905b80fa2656d8f34c00e077260fc8972502591c384c9413228dbcb19dc7e041'
// Connect with all required parameters
const ws = new WebSocket(`ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=OPENAI&rs_core=GEMINI`);

Test HMAC Generation: For testing and validation, use the included test clients:
# Web-based test client (recommended)
cd orchestrator && npm run test:client
# Or Node.js CLI test client
node orchestrator/test/auth-test-client.js

These generate the correct HMAC hash (905b80fa2656d8f34c00e077260fc8972502591c384c9413228dbcb19dc7e041) and test the connection with the default account.
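If you want to sanity-check the hash outside Node, the same HMAC can be computed with openssl (assuming openssl is installed; `printf` avoids appending a trailing newline to the message):

```shell
printf '%s' "test-session-browser" | openssl dgst -sha256 -hmac "secretkey-996"
# prints the hex digest to use as rs_auth
```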
ENABLE_DB Configuration:
- `ENABLE_DB=false` (default): Uses only `ACCOUNTS` from .env for authentication - perfect for local development and testing
- `ENABLE_DB=true`: Enables database integration for dynamic account management - ideal for production environments
When ENABLE_DB=true:
Firestore Integration:
- Provides dynamic account creation and management
- Stores account keys securely with Firebase authentication integration
- Requires `FIRESTORE_PROJECT_ID` and `GOOGLE_APPLICATION_CREDENTIALS` in your .env
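The additional .env entries might look like this (values are placeholders; the variable names are the ones listed above):

```shell
ENABLE_DB=true
FIRESTORE_PROJECT_ID=your-gcp-project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```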
Extensible Architecture:
- The `AccountManager` interface can be extended for any database
- Implement your own: `RedisAccountManager`, `PostgreSQLAccountManager`, etc.
- Simply swap the implementation in `Server.ts`: `const accountManager = new YourCustomAccountManager();`
Authentication Fallback System:
- First: Checks `ACCOUNTS` from the .env configuration
- Then: Falls back to a database lookup (if `ENABLE_DB=true`)
- Result: .env accounts work immediately; the database provides additional accounts when needed
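The fallback order can be sketched as follows (`dbLookup` stands in for whatever `AccountManager` implementation is active; this is illustrative, not the actual server code):

```javascript
// Env-first secret resolution with optional database fallback.
function resolveSecret(accountId, envAccounts, enableDb, dbLookup) {
  if (envAccounts.has(accountId)) return envAccounts.get(accountId); // .env wins
  if (enableDb && dbLookup) return dbLookup(accountId);              // DB fallback
  return null;                                                       // reject connection
}

const envAccounts = new Map([['996-sdassds-86-asd', 'secretkey-996']]);
resolveSecret('996-sdassds-86-asd', envAccounts, false, null); // → 'secretkey-996'
```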
// Using the default account from .env.example
const ws = new WebSocket('ws://localhost:3000?rs_accid=996-sdassds-86-asd&rs_u_sessid=test-session-browser&rs_auth=905b80fa2656d8f34c00e077260fc8972502591c384c9413228dbcb19dc7e041&rs_api=OPENAI&rs_core=GEMINI');

// ⚠️ SECURITY NOTICE: This is for local testing only
// In production, your server generates the authHash and sends it to the client
const crypto = require('crypto');
const accountId = '996-sdassds-86-asd';
const sessionId = 'test-session-browser';
const secretKey = 'secretkey-996'; // ⚠️ NEVER expose this in client code
const authHash = crypto.createHmac('sha256', secretKey).update(sessionId).digest('hex');
// authHash will be: '905b80fa2656d8f34c00e077260fc8972502591c384c9413228dbcb19dc7e041'
// Connect to RealtimeSwitch with OpenAI API style and Gemini provider
const ws = new WebSocket(`ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=OPENAI&rs_core=GEMINI`);
ws.on('open', () => {
console.log('Connected to RealtimeSwitch');
// Send voice events in OpenAI format - they'll be transformed to Gemini automatically
});
ws.on('message', (data) => {
// Receive responses in OpenAI format - transformed from Gemini responses
console.log('Received:', JSON.parse(data.toString()));
});

Production Implementation:
// Secure client implementation - get authHash from your backend
const accountId = 'your-account-id';
const sessionId = 'unique-session-id';
// Call your backend API to get the HMAC hash
const response = await fetch('/api/auth/websocket-token', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ accountId, sessionId })
});
const { authHash } = await response.json();
// Use the server-generated hash to connect
const ws = new WebSocket(`ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=OPENAI&rs_core=GEMINI`);

Use the native Gemini Live API format for both sending and receiving messages:
// Connect with Gemini API style and Gemini provider
const ws = new WebSocket(`ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=GEMINI&rs_core=GEMINI`);
ws.on('open', () => {
// Start session using Gemini Live API format
ws.send(JSON.stringify({
setup: {
model: "models/gemini-2.0-flash-exp",
generation_config: {
response_modalities: ["AUDIO"],
speech_config: {
voice_config: { prebuilt_voice_config: { voice_name: "Aoede" } }
}
}
}
}));
});
// Send audio using Gemini format
const sendAudio = (audioData) => {
ws.send(JSON.stringify({
realtime_input: {
media_chunks: [{
mime_type: "audio/pcm",
data: audioData // Base64 encoded audio
}]
}
}));
};
// Receive responses in Gemini format
ws.on('message', (data) => {
const response = JSON.parse(data.toString());
if (response.serverContent?.modelTurn?.parts) {
console.log('Gemini response:', response.serverContent.modelTurn.parts);
}
if (response.serverContent?.outputTranscription) {
console.log('Gemini transcription:', response.serverContent.outputTranscription.text);
}
});

Use the native OpenAI Realtime API format for both sending and receiving messages:
// Connect with OpenAI API style and OpenAI provider
const ws = new WebSocket(`ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=OPENAI&rs_core=OPENAI`);
ws.on('open', () => {
// Configure session using OpenAI Realtime API format
ws.send(JSON.stringify({
type: 'session.update',
session: {
modalities: ['text', 'audio'],
voice: 'alloy',
input_audio_format: 'pcm16',
output_audio_format: 'pcm16',
turn_detection: { type: 'server_vad' }
}
}));
});
// Send audio using OpenAI format
const sendAudio = (audioData) => {
ws.send(JSON.stringify({
type: 'input_audio_buffer.append',
audio: audioData // Base64 encoded PCM16 audio
}));
};
// Receive responses in OpenAI format
ws.on('message', (data) => {
const event = JSON.parse(data.toString());
// Handle different OpenAI event types
switch (event.type) {
case 'session.created':
case 'session.updated':
console.log('Session:', event.session);
break;
case 'response.audio.delta':
console.log('Audio chunk:', event.delta);
break;
case 'response.audio_transcript.delta':
console.log('Transcript:', event.delta);
break;
case 'conversation.item.input_audio_transcription.completed':
console.log('Input transcribed:', event.transcript);
break;
case 'response.function_call_arguments.delta':
console.log('Function args:', event.delta);
break;
case 'response.done':
console.log('Response completed:', event.response.status);
break;
case 'input_audio_buffer.speech_started':
console.log('Speech started');
break;
case 'input_audio_buffer.speech_stopped':
console.log('Speech stopped');
break;
}
});

Use the OpenAI API format with a Gemini provider backend - RealtimeSwitch handles the translation:
// Connect with OpenAI API style but Gemini provider backend
const ws = new WebSocket(`ws://localhost:3000?rs_accid=${accountId}&rs_u_sessid=${sessionId}&rs_auth=${authHash}&rs_api=OPENAI&rs_core=GEMINI`);
ws.on('open', () => {
// Send OpenAI format - will be translated to Gemini internally
ws.send(JSON.stringify({
type: 'session.update',
session: {
modalities: ['text', 'audio'],
voice: 'alloy', // Mapped to Gemini voice
input_audio_format: 'pcm16',
output_audio_format: 'pcm16'
}
}));
});
// Send in OpenAI format, processed by Gemini
const sendAudio = (audioData) => {
ws.send(JSON.stringify({
type: 'input_audio_buffer.append',
audio: audioData
}));
};
// Receive OpenAI format responses (translated from Gemini)
ws.on('message', (data) => {
const event = JSON.parse(data.toString());
// Responses come back in OpenAI format even though Gemini processes them
if (event.type === 'response.audio.delta') {
console.log('Translated audio from Gemini:', event.delta);
}
if (event.type === 'response.audio_transcript.delta') {
console.log('Translated transcript from Gemini:', event.delta);
}
});

For Development (with hot reload):
npm run dev

We are looking for early adopters and developers who can help us test this API. If interested, please join our early adopters program at realtimeswitch.com.
MIT License - see LICENSE for details.
- Apurav Chauhan
- Indu Chauhan
