A production-ready conversational AI telephony system built on Vocode, integrating Deepgram, OpenAI, and ElevenLabs to deliver natural voice interactions for inbound and outbound phone calls.
This AI-powered telephony solution enables businesses to deploy intelligent voice agents for various use cases including customer support, appointment scheduling, lead qualification, and information collection. The system provides seamless integration with existing telephony infrastructure through Twilio's robust platform.
- Real-time Voice Processing: Advanced speech-to-text and text-to-speech capabilities
- Intelligent Conversation Management: OpenAI-powered natural language understanding
- Scalable Architecture: Docker-based deployment with Kubernetes support
- Comprehensive Monitoring: Built-in Prometheus metrics and observability
- Flexible Configuration: Customizable agents, transcribers, and synthesizers
The system leverages the following technology stack:
- Voice Processing: Deepgram for speech transcription, ElevenLabs for synthesis
- AI Engine: OpenAI for conversational intelligence
- Telephony: Twilio for call management and routing
- Infrastructure: FastAPI, Redis, Docker, Kubernetes
- Monitoring: Prometheus metrics collection
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ AI TELEPHONY VOICE BOT WORKFLOW │
└─────────────────────────────────────────────────────────────────────────────────────────┘
👤 USER (Phone Call)
│
▼
┌──────────────────────────────────────────────────────────────────────────────────────────┐
│ 🌐 TWILIO TELEPHONY SERVICE │
│ ┌─────────────────┬─────────────────┐ │
│ │ Inbound Call │ Outbound Call │ │
│ │ Webhook │ Initiation │ │
└──────────────────────────┼─────────────────┼─────────────────┼──────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ 📱 VOICE BOT APPLICATION CORE │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ 📞 INBOUND HANDLER │ │ 📱 OUTBOUND HANDLER │ │
│ │ (inbound.py) │ │ (outbound_call.py) │ │
│ └──────────┬──────────┘ └──────────┬──────────┘ │
│ │ │ │
│ └──────────────────────┐ ┌────────────────────┘ │
│ ▼ ▼ │
│ ┌─────────────────────────────┐ │
│ │ 🎯 VOICE BOT EVENTS MANAGER │ │
│ │ (VoiceBotEventsManager) │ │
│ │ │ │
│ │ Event Types Handled: │ │
│ │ • TRANSCRIPT │ │
│ │ • TRANSCRIPT_COMPLETE │ │
│ │ • PHONE_CALL_CONNECTED │ │
│ │ • PHONE_CALL_ENDED │ │
│ │ • RECORDING │ │
│ │ • ACTION │ │
│ └─────────────┬───────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ 🏭 AGENT FACTORY & CONFIGURATION │ │
│ │ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ 📝 PROMPT │ │ 🏭 AGENT │ │ 🗄️ REDIS CONFIG │ │ │
│ │ │ HANDLER │ │ FACTORY │ │ MANAGER │ │ │
│ │ │ (system_prompt) │ │ (SpellerAgent │ │ (Session Store) │ │ │
│ │ └─────────────────┘ │ Factory) │ └─────────────────┘ │ │
│ │ └─────────┬───────┘ │ │
│ │ │ │ │
│ │ ┌──────────────┼──────────────┐ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ 🤖 CHATGPT │ │ 🔤 SPELLER │ │ │
│ │ │ AGENT │ │ AGENT │ │ │
│ │ │ (OpenAI) │ │ (Character │ │ │
│ │ │ │ │ Spelling) │ │ │
│ │ └─────────────────┘ └─────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ 🔄 REAL-TIME PROCESSING PIPELINE │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ 🎤 │ │ 📝 │ │ 🤖 │ │ 🔊 │ │
│ │ DEEPGRAM │───▶│ EVENTS │───▶│ AGENT │───▶│ ELEVENLABS │ │
│ │ (Speech-to- │ │ MANAGER │ │ PROCESSING │ │(Text-to- │ │
│ │ Text) │ │(Transcript) │ │(AI Response)│ │ Speech) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ▲ │ │ │
│ │ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────────────────────────┐ │
│ │ 🌐 TWILIO │◀───────────────────│ 🎯 EVENTS MANAGER │ │
│ │ (Audio │ │ (Response Routing) │ │
│ │ Streaming) │ └─────────────────────────────────┘ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ 📊 MONITORING & ANALYTICS LAYER │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ 📈 PROMETHEUS │ │ 📋 EVENT LOGGING │ │
│ │ METRICS │ │ & ANALYTICS │ │
│ │ │ │ │ │
│ │ • Session Counter │ │ • Call Duration │ │
│ │ • Active Sessions │ │ • Transcript Logs │ │
│ │ • Call Analytics │ │ • Error Tracking │ │
│ │ │ │ • Performance Data │ │
│ └─────────────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ 🔧 EXTERNAL SERVICES │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ 🌐 TWILIO │ │ 🎤 DEEPGRAM │ │ 🧠 OPENAI │ │ 🔊 ELEVEN │ │
│ │ Telephony │ │ Speech-to- │ │ Language │ │ LABS │ │
│ │ Platform │ │ Text API │ │ Model API │ │ Text-to- │ │
│ │ │ │ │ │ │ │ Speech API │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
📊 DATA FLOW LEGEND
═══════════════════════════
───▶ Audio/Voice Stream
═══▶ Text/Data Processing
┄┄▶ Configuration/Control
▲▼ Bidirectional Communication
User ──(Phone Call)──▶ Twilio ──(Webhook/API)──▶ Voice Bot Application
- Inbound: User dials Twilio number → Webhook triggers
inbound.py
- Outbound: Application calls
outbound_call.py
→ Twilio connects to user
Call Handler ──▶ VoiceBotEventsManager ──▶ Agent Factory ──▶ Selected Agent
- Events Manager handles all call lifecycle events
- Session metrics tracked via Prometheus
- Agent Factory selects appropriate AI agent based on configuration
User Audio ──▶ Deepgram ──▶ Text ──▶ AI Agent ──▶ Response ──▶ ElevenLabs ──▶ Audio ──▶ User
- Step 1: Deepgram converts speech to text in real-time
- Step 2: Selected agent (ChatGPT/Speller) processes the text
- Step 3: ElevenLabs synthesizes AI response back to audio
- Step 4: Twilio streams audio response to user
- 🤖 ChatGPT Agent: Uses OpenAI API for intelligent conversations
- 🔤 Speller Agent: Simple character-by-character spelling functionality
- 📝 Prompt Handler: Loads system prompts from
system_prompt.md
TRANSCRIPT
: Real-time speech transcription updatesPHONE_CALL_CONNECTED
: Session initialization and metrics incrementPHONE_CALL_ENDED
: Session cleanup and metrics decrementRECORDING
: Call recording management and storageACTION
: Custom action execution and logging
- 📈 Prometheus Metrics: Session counters, active call tracking
- 📋 Event Logging: Comprehensive call analytics and debugging
- 🔍 Real-time Monitoring: Performance tracking and error detection
- Docker and Docker Compose
- Python 3.8+ with Poetry (for development)
- Deepgram: Speech transcription services
- OpenAI: Language model and conversation management
- ElevenLabs: Text-to-speech synthesis
- Twilio: Telephony services and phone number management
- Ngrok or Cloudflare Tunnels for local testing
- Kubernetes cluster for production deployment
- Helm 3.0+ for Kubernetes deployments
cp .env.template .env
Configure your .env
file with the required API keys:
# Telephony Configuration
BASE_URI=your-domain.com
TWILIO_ACCOUNT_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_token
FROM_PHONE=+1234567890
TO_PHONE=+1234567890
# AI Services
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL=https://api.openai.com/v1
ELEVEN_LABS_API_KEY=your_eleven_labs_key
ELEVEN_LABS_VOICE_ID=your_voice_id
For local testing, expose your development server:
ngrok http 6000
Update your .env
with the ngrok URL (without https://
):
BASE_URI=abc123.ngrok.app
docker-compose up --build
The application will be available at http://localhost:6000
-
Acquire Phone Number
- Log into your Twilio Console
- Navigate to Phone Numbers → Manage → Buy a number
- Purchase a phone number for your region
-
Configure Webhook
- Go to Phone Numbers → Manage → Active Numbers
- Select your purchased number
- Set Webhook URL to:
https://your-domain.com/inbound_call
- Set HTTP method to
POST
- Save configuration
Ensure your environment variables are configured, then execute:
poetry install
poetry run python outbound_call.py
The system exposes Prometheus metrics on port 8000:
voicebot_session_count
: Total number of sessions initiatedvoicebot_active_sessions
: Current active session count
Access metrics at: http://localhost:8000/metrics
- Kubernetes cluster (EKS, GKE, AKS, or self-managed)
- Helm 3.0+
- kubectl configured for your cluster
- Clone Repository
git clone https://github.com/danieladdisonorg/AI-Agent-for-Telephony-voice-bot
cd AI-Agent-for-Telephony-voice-bot
- Create Configuration
Create production-values.yaml
:
env:
BASE_URI: "your-production-domain.com"
DEEPGRAM_API_KEY: "your-deepgram-key"
OPENAI_API_KEY: "your-openai-key"
OPENAI_BASE_URL: "https://api.openai.com/v1"
ELEVEN_LABS_API_KEY: "your-eleven-labs-key"
ELEVEN_LABS_VOICE_ID: "your-voice-id"
TWILIO_ACCOUNT_SID: "your-twilio-sid"
TWILIO_AUTH_TOKEN: "your-twilio-token"
FROM_PHONE: "your-twilio-number"
TO_PHONE: "your-target-number"
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 500m
memory: 512Mi
replicas: 3
- Deploy Application
helm install voice-bot ./kubernetes/voice-bot-helm/voice-bot -f production-values.yaml
- Verify Deployment
kubectl get pods -l app=voice-bot
kubectl get services voice-bot
Update Deployment:
helm upgrade voice-bot ./kubernetes/voice-bot-helm/voice-bot -f production-values.yaml
Remove Deployment:
helm uninstall voice-bot
View Logs:
kubectl logs -f deployment/voice-bot
Customize conversation behavior through AgentConfig
parameters in your deployment configuration.
ChatGPTAgentConfig(
initial_message=BaseMessage(text="Hello! How can I help you today?"),
prompt_preamble="You are a helpful customer service representative...",
generate_responses=True,
model_name="gpt-3.5-turbo", # or "gpt-4"
temperature=0.7,
max_tokens=150
)
SpellerAgentConfig(
# Simple configuration for character spelling functionality
# Automatically spells out each character with spaces
)
Create or modify app/system_prompt.md
to customize your AI agent's behavior:
# Example System Prompt
You are a professional customer service representative for [Your Company].
## Your Role:
- Provide helpful and accurate information
- Maintain a friendly and professional tone
- Ask clarifying questions when needed
- Keep responses concise and clear
## Guidelines:
- Always greet customers warmly
- Listen actively to their concerns
- Provide step-by-step solutions
- End calls with next steps or follow-up information
## Restrictions:
- Do not provide personal information
- Escalate complex technical issues
- Stay within your knowledge domain
- Default: Deepgram (recommended for production)
- High accuracy speech recognition
- Real-time processing
- Multiple language support
- Alternative: Configure custom transcription services through Vocode
- Default: ElevenLabs (high-quality voice synthesis)
- Natural-sounding voices
- Emotional expression
- Multiple voice options
- Alternative: Configure alternative TTS providers
# Core Application
BASE_URI=your-domain.com
PORT=6000
# Twilio Configuration
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
FROM_PHONE=+1234567890
TO_PHONE=+1234567890
# AI Services
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1
ELEVEN_LABS_API_KEY=your_eleven_labs_api_key
ELEVEN_LABS_VOICE_ID=your_preferred_voice_id
# Redis Configuration (Optional)
REDIS_URL=redis://localhost:6379
REDIS_PASSWORD=your_redis_password
# Monitoring
PROMETHEUS_PORT=8000
LOG_LEVEL=INFO
# Audio Configuration
AUDIO_ENCODING=mulaw
SAMPLING_RATE=8000
CHUNK_SIZE=8192
For development and testing:
version: '3.8'
services:
voice-bot:
build: .
ports:
- "6000:6000"
- "8000:8000"
environment:
- BASE_URI=${BASE_URI}
- TWILIO_ACCOUNT_SID=${TWILIO_ACCOUNT_SID}
- TWILIO_AUTH_TOKEN=${TWILIO_AUTH_TOKEN}
- DEEPGRAM_API_KEY=${DEEPGRAM_API_KEY}
- OPENAI_API_KEY=${OPENAI_API_KEY}
- ELEVEN_LABS_API_KEY=${ELEVEN_LABS_API_KEY}
- ELEVEN_LABS_VOICE_ID=${ELEVEN_LABS_VOICE_ID}
volumes:
- ./app:/app
- ./logs:/logs
depends_on:
- redis
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis_data:/data
volumes:
redis_data:
- Symptom: Calls not connecting or dropping
- Solutions:
- Verify all API keys are valid and have sufficient credits
- Check network connectivity and firewall rules
- Ensure webhook URLs are publicly accessible
- Test with ngrok for local development
- Symptom: Poor audio quality or delays
- Solutions:
- Verify Deepgram and ElevenLabs configurations
- Check network latency and bandwidth
- Review audio codec settings (mulaw recommended)
- Monitor CPU and memory usage
- Symptom: AI not responding appropriately
- Solutions:
- Check OpenAI API key and quota
- Review system prompt configuration
- Verify agent factory configuration
- Check event manager logs
# Check pod status
kubectl describe pod <pod-name>
# View detailed logs
kubectl logs <pod-name> --previous
# Check service connectivity
kubectl port-forward service/voice-bot 6000:3000
# Debug configuration
kubectl get configmap voice-bot-config -o yaml
# View application logs
docker-compose logs -f voice-bot
# Check Redis connection
docker-compose exec redis redis-cli ping
# Monitor metrics
curl http://localhost:8000/metrics
# Test webhook endpoint
curl -X POST http://localhost:6000/inbound_call \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "From=+1234567890&To=+1234567890"
# Monitor active sessions
kubectl exec -it deployment/voice-bot -- python -c "
from app.metrics import SESSION_GAUGE
print(f'Active sessions: {SESSION_GAUGE._value.get()}')
"
# Check system resources
kubectl top pods -l app=voice-bot
# View recent events
kubectl get events --sort-by=.metadata.creationTimestamp
- Low Traffic (< 10 concurrent calls): 1-2 replicas, 512Mi memory
- Medium Traffic (10-50 concurrent calls): 3-5 replicas, 1Gi memory
- High Traffic (50+ concurrent calls): 5+ replicas, 2Gi memory, dedicated nodes
# Monitor resource usage
kubectl top pods -l app=voice-bot --containers
# Check HPA status (if configured)
kubectl get hpa voice-bot
# View resource quotas
kubectl describe quota
POST /inbound_call
Content-Type: application/x-www-form-urlencoded
Parameters:
- From: Caller's phone number
- To: Called phone number
- CallSid: Twilio call identifier
GET /health
Response: {"status": "healthy", "timestamp": "2024-01-01T00:00:00Z"}
GET /metrics
Content-Type: text/plain
Response: Prometheus metrics format
The system handles the following event types through VoiceBotEventsManager
:
Event Type | Description | Metrics Impact |
---|---|---|
TRANSCRIPT |
Real-time speech transcription | Logging only |
TRANSCRIPT_COMPLETE |
Complete utterance transcribed | Logging only |
PHONE_CALL_CONNECTED |
Call successfully connected | Increments session counters |
PHONE_CALL_ENDED |
Call terminated | Decrements active sessions |
PHONE_CALL_DID_NOT_CONNECT |
Call failed to connect | Error logging |
RECORDING |
Call recording available | Logging with URL |
ACTION |
Custom action executed | Action logging |
- Documentation: Check this README and inline code comments
- Issues: Report bugs via GitHub Issues
- Discussions: Join conversations in GitHub Discussions
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature
) - Commit your changes (
git commit -m 'Add amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
# Clone repository
git clone https://github.com/danieladdisonorg/AI-Agent-for-Telephony-voice-bot
cd AI-Agent-for-Telephony-voice-bot
# Install dependencies
poetry install
# Set up pre-commit hooks
pre-commit install
# Run tests
poetry run pytest
# Run linting
poetry run flake8 app/
poetry run black app/
This project is licensed under the MIT License. See the LICENSE file for details.
- Initial release with Vocode integration
- Support for Deepgram, OpenAI, and ElevenLabs
- Docker and Kubernetes deployment support
- Prometheus metrics and monitoring
- Inbound and outbound call handling
- Multiple agent types (ChatGPT, Speller)
- Support for additional TTS providers
- Advanced conversation analytics
- Multi-language support
- Call recording and playback features
- Advanced routing and IVR capabilities
- Integration with CRM systems