AI Agent for Telephony - Voice Bot

A production-ready conversational AI telephony system built on Vocode, integrating Deepgram, OpenAI, and ElevenLabs to deliver natural voice interactions for inbound and outbound phone calls.

Overview

This AI-powered telephony solution enables businesses to deploy intelligent voice agents for various use cases including customer support, appointment scheduling, lead qualification, and information collection. The system provides seamless integration with existing telephony infrastructure through Twilio's robust platform.

Key Features

Real-time Voice Processing: Advanced speech-to-text and text-to-speech capabilities
Intelligent Conversation Management: OpenAI-powered natural language understanding
Scalable Architecture: Docker-based deployment with Kubernetes support
Comprehensive Monitoring: Built-in Prometheus metrics and observability
Flexible Configuration: Customizable agents, transcribers, and synthesizers

Architecture

The system leverages the following technology stack:

Voice Processing: Deepgram for speech transcription, ElevenLabs for synthesis
AI Engine: OpenAI for conversational intelligence
Telephony: Twilio for call management and routing
Infrastructure: FastAPI, Redis, Docker, Kubernetes
Monitoring: Prometheus metrics collection

System Workflow Diagram

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                 AI TELEPHONY VOICE BOT WORKFLOW                         │
└─────────────────────────────────────────────────────────────────────────────────────────┘

                                    👤 USER (Phone Call)
                                           │
                                           ▼
┌──────────────────────────────────────────────────────────────────────────────────────────┐
│                              🌐 TWILIO TELEPHONY SERVICE                                │
│                          ┌─────────────────┬─────────────────┐                          │
│                          │   Inbound Call  │  Outbound Call  │                          │
│                          │    Webhook      │   Initiation    │                          │
└──────────────────────────┼─────────────────┼─────────────────┼──────────────────────────┘
                           │                 │                 │
                           ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                           📱 VOICE BOT APPLICATION CORE                                │
│                                                                                         │
│  ┌─────────────────────┐                                    ┌─────────────────────┐    │
│  │  📞 INBOUND HANDLER │                                    │ 📱 OUTBOUND HANDLER │    │
│  │   (inbound.py)      │                                    │ (outbound_call.py)  │    │
│  └──────────┬──────────┘                                    └──────────┬──────────┘    │
│             │                                                          │               │
│             └──────────────────────┐              ┌────────────────────┘               │
│                                    ▼              ▼                                    │
│                          ┌─────────────────────────────┐                              │
│                          │  🎯 VOICE BOT EVENTS MANAGER │                              │
│                          │  (VoiceBotEventsManager)     │                              │
│                          │                              │                              │
│                          │  Event Types Handled:        │                              │
│                          │  • TRANSCRIPT               │                              │
│                          │  • TRANSCRIPT_COMPLETE      │                              │
│                          │  • PHONE_CALL_CONNECTED     │                              │
│                          │  • PHONE_CALL_ENDED         │                              │
│                          │  • RECORDING                │                              │
│                          │  • ACTION                   │                              │
│                          └─────────────┬───────────────┘                              │
│                                        │                                              │
│                                        ▼                                              │
│  ┌─────────────────────────────────────────────────────────────────────────────────┐  │
│  │                    🏭 AGENT FACTORY & CONFIGURATION                            │  │
│  │                                                                                 │  │
│  │  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐           │  │
│  │  │ 📝 PROMPT       │    │ 🏭 AGENT        │    │ 🗄️ REDIS CONFIG │           │  │
│  │  │ HANDLER         │    │ FACTORY         │    │ MANAGER         │           │  │
│  │  │ (system_prompt) │    │ (SpellerAgent   │    │ (Session Store) │           │  │
│  │  └─────────────────┘    │ Factory)        │    └─────────────────┘           │  │
│  │                         └─────────┬───────┘                                  │  │
│  │                                   │                                          │  │
│  │                    ┌──────────────┼──────────────┐                          │  │
│  │                    ▼              ▼              ▼                          │  │
│  │         ┌─────────────────┐              ┌─────────────────┐                │  │
│  │         │ 🤖 CHATGPT      │              │ 🔤 SPELLER      │                │  │
│  │         │ AGENT           │              │ AGENT           │                │  │
│  │         │ (OpenAI)        │              │ (Character      │                │  │
│  │         │                 │              │ Spelling)       │                │  │
│  │         └─────────────────┘              └─────────────────┘                │  │
│  └─────────────────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                        │
                                        ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                            🔄 REAL-TIME PROCESSING PIPELINE                            │
│                                                                                         │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐            │
│  │    🎤       │    │    📝       │    │    🤖       │    │    🔊       │            │
│  │  DEEPGRAM   │───▶│   EVENTS    │───▶│   AGENT     │───▶│ ELEVENLABS  │            │
│  │ (Speech-to- │    │  MANAGER    │    │ PROCESSING  │    │(Text-to-    │            │
│  │   Text)     │    │(Transcript) │    │(AI Response)│    │ Speech)     │            │
│  └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘            │
│         ▲                                      │                   │                 │
│         │                                      ▼                   ▼                 │
│  ┌─────────────┐                    ┌─────────────────────────────────┐              │
│  │ 🌐 TWILIO   │◀───────────────────│      🎯 EVENTS MANAGER          │              │
│  │ (Audio      │                    │     (Response Routing)          │              │
│  │ Streaming)  │                    └─────────────────────────────────┘              │
│  └─────────────┘                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                        │
                                        ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                          📊 MONITORING & ANALYTICS LAYER                              │
│                                                                                         │
│  ┌─────────────────────┐              ┌─────────────────────┐                         │
│  │ 📈 PROMETHEUS       │              │ 📋 EVENT LOGGING    │                         │
│  │ METRICS             │              │ & ANALYTICS         │                         │
│  │                     │              │                     │                         │
│  │ • Session Counter   │              │ • Call Duration     │                         │
│  │ • Active Sessions   │              │ • Transcript Logs   │                         │
│  │ • Call Analytics    │              │ • Error Tracking    │                         │
│  │                     │              │ • Performance Data  │                         │
│  └─────────────────────┘              └─────────────────────┘                         │
└─────────────────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                              🔧 EXTERNAL SERVICES                                      │
│                                                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │
│  │ 🌐 TWILIO   │  │ 🎤 DEEPGRAM │  │ 🧠 OPENAI   │  │ 🔊 ELEVEN   │                  │
│  │ Telephony   │  │ Speech-to-  │  │ Language    │  │ LABS        │                  │
│  │ Platform    │  │ Text API    │  │ Model API   │  │ Text-to-    │                  │
│  │             │  │             │  │             │  │ Speech API  │                  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘                  │
└─────────────────────────────────────────────────────────────────────────────────────────┘

                                  📊 DATA FLOW LEGEND
                              ═══════════════════════════
                              ───▶  Audio/Voice Stream
                              ═══▶  Text/Data Processing  
                              ┄┄▶   Configuration/Control
                              ▲▼    Bidirectional Communication

Workflow Description

1. Call Initiation Phase

User ──(Phone Call)──▶ Twilio ──(Webhook/API)──▶ Voice Bot Application

Inbound: User dials Twilio number → Webhook triggers inbound.py
Outbound: Application calls outbound_call.py → Twilio connects to user

2. Session Management & Event Processing

Call Handler ──▶ VoiceBotEventsManager ──▶ Agent Factory ──▶ Selected Agent

Events Manager handles all call lifecycle events
Session metrics tracked via Prometheus
Agent Factory selects appropriate AI agent based on configuration

3. Real-Time Voice Processing Pipeline

User Audio ──▶ Deepgram ──▶ Text ──▶ AI Agent ──▶ Response ──▶ ElevenLabs ──▶ Audio ──▶ User

Step 1: Deepgram converts speech to text in real-time
Step 2: Selected agent (ChatGPT/Speller) processes the text
Step 3: ElevenLabs synthesizes AI response back to audio
Step 4: Twilio streams audio response to user

4. Agent Types & Processing

🤖 ChatGPT Agent: Uses OpenAI API for intelligent conversations
🔤 Speller Agent: Simple character-by-character spelling functionality
📝 Prompt Handler: Loads system prompts from system_prompt.md

5. Event Types Handled

TRANSCRIPT: Real-time speech transcription updates
PHONE_CALL_CONNECTED: Session initialization and metrics increment
PHONE_CALL_ENDED: Session cleanup and metrics decrement
RECORDING: Call recording management and storage
ACTION: Custom action execution and logging

6. Monitoring & Analytics

📈 Prometheus Metrics: Session counters, active call tracking
📋 Event Logging: Comprehensive call analytics and debugging
🔍 Real-time Monitoring: Performance tracking and error detection

Prerequisites

Required Dependencies

Docker and Docker Compose
Python 3.8+ with Poetry (for development)

API Keys Required

Deepgram: Speech transcription services
OpenAI: Language model and conversation management
ElevenLabs: Text-to-speech synthesis
Twilio: Telephony services and phone number management

Development Tools (Optional)

Ngrok or Cloudflare Tunnels for local testing
Kubernetes cluster for production deployment
Helm 3.0+ for Kubernetes deployments

Quick Start

1. Environment Configuration

cp .env.template .env

Configure your .env file with the required API keys:

# Telephony Configuration
BASE_URI=your-domain.com
TWILIO_ACCOUNT_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_token
FROM_PHONE=+1234567890
TO_PHONE=+1234567890

# AI Services
DEEPGRAM_API_KEY=your_deepgram_key
OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL=https://api.openai.com/v1
ELEVEN_LABS_API_KEY=your_eleven_labs_key
ELEVEN_LABS_VOICE_ID=your_voice_id

2. Local Development Setup

For local testing, expose your development server:

ngrok http 6000

Update your .env with the ngrok URL (without https://):

BASE_URI=abc123.ngrok.app

3. Start the Application

docker-compose up --build

The application will be available at http://localhost:6000

Telephony Configuration

Inbound Call Setup

Acquire Phone Number
- Log into your Twilio Console
- Navigate to Phone Numbers → Manage → Buy a number
- Purchase a phone number for your region
Configure Webhook
- Go to Phone Numbers → Manage → Active Numbers
- Select your purchased number
- Set Webhook URL to: https://your-domain.com/inbound_call
- Set HTTP method to POST
- Save configuration

Outbound Call Execution

Ensure your environment variables are configured, then execute:

poetry install
poetry run python outbound_call.py

Monitoring and Observability

The system exposes Prometheus metrics on port 8000:

voicebot_session_count: Total number of sessions initiated
voicebot_active_sessions: Current active session count

Access metrics at: http://localhost:8000/metrics

Production Deployment

Kubernetes with Helm

Prerequisites

Kubernetes cluster (EKS, GKE, AKS, or self-managed)
Helm 3.0+
kubectl configured for your cluster

Deployment Process

Clone Repository

git clone https://github.com/danieladdisonorg/AI-Agent-for-Telephony-voice-bot
cd AI-Agent-for-Telephony-voice-bot

Create Configuration

Create production-values.yaml:

env:
  BASE_URI: "your-production-domain.com"
  DEEPGRAM_API_KEY: "your-deepgram-key"
  OPENAI_API_KEY: "your-openai-key"
  OPENAI_BASE_URL: "https://api.openai.com/v1"
  ELEVEN_LABS_API_KEY: "your-eleven-labs-key"
  ELEVEN_LABS_VOICE_ID: "your-voice-id"
  TWILIO_ACCOUNT_SID: "your-twilio-sid"
  TWILIO_AUTH_TOKEN: "your-twilio-token"
  FROM_PHONE: "your-twilio-number"
  TO_PHONE: "your-target-number"

resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi

replicas: 3

Deploy Application

helm install voice-bot ./kubernetes/voice-bot-helm/voice-bot -f production-values.yaml

Verify Deployment

kubectl get pods -l app=voice-bot
kubectl get services voice-bot

Management Commands

Update Deployment:

helm upgrade voice-bot ./kubernetes/voice-bot-helm/voice-bot -f production-values.yaml

Remove Deployment:

helm uninstall voice-bot

View Logs:

kubectl logs -f deployment/voice-bot

Configuration Options

Agent Configuration

Customize conversation behavior through AgentConfig parameters in your deployment configuration.

ChatGPT Agent Configuration

ChatGPTAgentConfig(
    initial_message=BaseMessage(text="Hello! How can I help you today?"),
    prompt_preamble="You are a helpful customer service representative...",
    generate_responses=True,
    model_name="gpt-3.5-turbo",  # or "gpt-4"
    temperature=0.7,
    max_tokens=150
)

Speller Agent Configuration

SpellerAgentConfig(
    # Simple configuration for character spelling functionality
    # Automatically spells out each character with spaces
)

System Prompt Customization

Create or modify app/system_prompt.md to customize your AI agent's behavior:

# Example System Prompt

You are a professional customer service representative for [Your Company]. 

## Your Role:
- Provide helpful and accurate information
- Maintain a friendly and professional tone
- Ask clarifying questions when needed
- Keep responses concise and clear

## Guidelines:
- Always greet customers warmly
- Listen actively to their concerns
- Provide step-by-step solutions
- End calls with next steps or follow-up information

## Restrictions:
- Do not provide personal information
- Escalate complex technical issues
- Stay within your knowledge domain

Transcriber Options

Default: Deepgram (recommended for production)
- High accuracy speech recognition
- Real-time processing
- Multiple language support
Alternative: Configure custom transcription services through Vocode

Synthesizer Options

Default: ElevenLabs (high-quality voice synthesis)
- Natural-sounding voices
- Emotional expression
- Multiple voice options
Alternative: Configure alternative TTS providers

Advanced Configuration

Environment Variables Reference

# Core Application
BASE_URI=your-domain.com
PORT=6000

# Twilio Configuration
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
FROM_PHONE=+1234567890
TO_PHONE=+1234567890

# AI Services
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1
ELEVEN_LABS_API_KEY=your_eleven_labs_api_key
ELEVEN_LABS_VOICE_ID=your_preferred_voice_id

# Redis Configuration (Optional)
REDIS_URL=redis://localhost:6379
REDIS_PASSWORD=your_redis_password

# Monitoring
PROMETHEUS_PORT=8000
LOG_LEVEL=INFO

# Audio Configuration
AUDIO_ENCODING=mulaw
SAMPLING_RATE=8000
CHUNK_SIZE=8192

Docker Compose Configuration

For development and testing:

version: '3.8'
services:
  voice-bot:
    build: .
    ports:
      - "6000:6000"
      - "8000:8000"
    environment:
      - BASE_URI=${BASE_URI}
      - TWILIO_ACCOUNT_SID=${TWILIO_ACCOUNT_SID}
      - TWILIO_AUTH_TOKEN=${TWILIO_AUTH_TOKEN}
      - DEEPGRAM_API_KEY=${DEEPGRAM_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ELEVEN_LABS_API_KEY=${ELEVEN_LABS_API_KEY}
      - ELEVEN_LABS_VOICE_ID=${ELEVEN_LABS_VOICE_ID}
    volumes:
      - ./app:/app
      - ./logs:/logs
    depends_on:
      - redis

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

volumes:
  redis_data:

Troubleshooting

Common Issues

Connection Problems

Symptom: Calls not connecting or dropping
Solutions:
- Verify all API keys are valid and have sufficient credits
- Check network connectivity and firewall rules
- Ensure webhook URLs are publicly accessible
- Test with ngrok for local development

Audio Quality Issues

Symptom: Poor audio quality or delays
Solutions:
- Verify Deepgram and ElevenLabs configurations
- Check network latency and bandwidth
- Review audio codec settings (mulaw recommended)
- Monitor CPU and memory usage

Agent Response Issues

Symptom: AI not responding appropriately
Solutions:
- Check OpenAI API key and quota
- Review system prompt configuration
- Verify agent factory configuration
- Check event manager logs

Deployment Issues

# Check pod status
kubectl describe pod <pod-name>

# View detailed logs
kubectl logs <pod-name> --previous

# Check service connectivity
kubectl port-forward service/voice-bot 6000:3000

# Debug configuration
kubectl get configmap voice-bot-config -o yaml

Debugging Commands

Local Development

# View application logs
docker-compose logs -f voice-bot

# Check Redis connection
docker-compose exec redis redis-cli ping

# Monitor metrics
curl http://localhost:8000/metrics

# Test webhook endpoint
curl -X POST http://localhost:6000/inbound_call \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "From=+1234567890&To=+1234567890"

Production Monitoring

# Monitor active sessions
kubectl exec -it deployment/voice-bot -- python -c "
from app.metrics import SESSION_GAUGE
print(f'Active sessions: {SESSION_GAUGE._value.get()}')
"

# Check system resources
kubectl top pods -l app=voice-bot

# View recent events
kubectl get events --sort-by=.metadata.creationTimestamp

Performance Optimization

Scaling Recommendations

Low Traffic (< 10 concurrent calls): 1-2 replicas, 512Mi memory
Medium Traffic (10-50 concurrent calls): 3-5 replicas, 1Gi memory
High Traffic (50+ concurrent calls): 5+ replicas, 2Gi memory, dedicated nodes

Resource Monitoring

# Monitor resource usage
kubectl top pods -l app=voice-bot --containers

# Check HPA status (if configured)
kubectl get hpa voice-bot

# View resource quotas
kubectl describe quota

API Reference

Webhook Endpoints

Inbound Call Webhook

POST /inbound_call
Content-Type: application/x-www-form-urlencoded

Parameters:
- From: Caller's phone number
- To: Called phone number
- CallSid: Twilio call identifier

Health Check

GET /health
Response: {"status": "healthy", "timestamp": "2024-01-01T00:00:00Z"}

Metrics Endpoint

GET /metrics
Content-Type: text/plain
Response: Prometheus metrics format

Event Types

The system handles the following event types through VoiceBotEventsManager:

Event Type	Description	Metrics Impact
`TRANSCRIPT`	Real-time speech transcription	Logging only
`TRANSCRIPT_COMPLETE`	Complete utterance transcribed	Logging only
`PHONE_CALL_CONNECTED`	Call successfully connected	Increments session counters
`PHONE_CALL_ENDED`	Call terminated	Decrements active sessions
`PHONE_CALL_DID_NOT_CONNECT`	Call failed to connect	Error logging
`RECORDING`	Call recording available	Logging with URL
`ACTION`	Custom action executed	Action logging

Support and Contributing

Getting Help

Documentation: Check this README and inline code comments
Issues: Report bugs via GitHub Issues
Discussions: Join conversations in GitHub Discussions

Contributing Guidelines

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Setup

# Clone repository
git clone https://github.com/danieladdisonorg/AI-Agent-for-Telephony-voice-bot
cd AI-Agent-for-Telephony-voice-bot

# Install dependencies
poetry install

# Set up pre-commit hooks
pre-commit install

# Run tests
poetry run pytest

# Run linting
poetry run flake8 app/
poetry run black app/

License

This project is licensed under the MIT License. See the LICENSE file for details.

Changelog

v1.0.0 (Current)

Initial release with Vocode integration
Support for Deepgram, OpenAI, and ElevenLabs
Docker and Kubernetes deployment support
Prometheus metrics and monitoring
Inbound and outbound call handling
Multiple agent types (ChatGPT, Speller)

Roadmap

Support for additional TTS providers
Advanced conversation analytics
Multi-language support
Call recording and playback features
Advanced routing and IVR capabilities
Integration with CRM systems

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
app		app
kubernetes/voice-bot-helm/voice-bot		kubernetes/voice-bot-helm/voice-bot
observe		observe
.env.template		.env.template
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
k8s-port-forward.bat		k8s-port-forward.bat
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

danieladdisonorg/AI-Agent-for-Telephony-voice-bot

Folders and files

Latest commit

History

Repository files navigation

AI Agent for Telephony - Voice Bot

Overview

Key Features

Architecture

System Workflow Diagram

Workflow Description

1. Call Initiation Phase

2. Session Management & Event Processing

3. Real-Time Voice Processing Pipeline

4. Agent Types & Processing

5. Event Types Handled

6. Monitoring & Analytics

Prerequisites

Required Dependencies

API Keys Required

Development Tools (Optional)

Quick Start

1. Environment Configuration

2. Local Development Setup

3. Start the Application

Telephony Configuration

Inbound Call Setup

Outbound Call Execution

Monitoring and Observability

Production Deployment

Kubernetes with Helm

Prerequisites

Deployment Process

Management Commands

Configuration Options

Agent Configuration

ChatGPT Agent Configuration

Speller Agent Configuration

System Prompt Customization

Transcriber Options

Synthesizer Options

Advanced Configuration

Environment Variables Reference

Docker Compose Configuration

Troubleshooting

Common Issues

Connection Problems

Audio Quality Issues

Agent Response Issues

Deployment Issues

Debugging Commands

Local Development

Production Monitoring

Performance Optimization

Scaling Recommendations

Resource Monitoring

API Reference

Webhook Endpoints

Inbound Call Webhook

Health Check

Metrics Endpoint

Event Types

Support and Contributing

Getting Help

Contributing Guidelines

Development Setup

License

Changelog

v1.0.0 (Current)

Roadmap

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages