Real-Time Call Service Agent Assist Demo
Overview
This project demonstrates a real-time, AI-powered agent assist workflow for customer support calls. It captures audio from a loopback device (BlackHole) or microphone, transcribes in near real-time via AssemblyAI, and runs lightweight agents (powered by Cerebras) to label speakers, match the caller to CRM data, and generate actionable guidance for the human agent.
Key Features
- BlackHole audio loopback for clean capture (recommended)
- Live microphone capture (PyAudio)
- Near real-time transcription (AssemblyAI Universal-Streaming)
- Speaker role identification (STAFF vs CUSTOMER)
- Caller-to-CRM matching using files in `crm/`
- Agent recommendations using `knowledge_base/` + the current transcript
- Continuous logging to `logs/`
Architecture
- `transcription.py` — streams audio (file or live) to AssemblyAI and appends formatted turns to `logs/assemblyai.log`
- `agents.py` — tails `assemblyai.log` (see the sketch after this list) and maintains:
  - `transcripts.log` (speaker role labels)
  - `customer.log` (best-match CRM record)
  - `recommendations.log` (actionable guidance for STAFF)
- `rtaa.py` — convenience launcher to run both the transcriber and agents together
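At a high level, `agents.py` simply follows `assemblyai.log` and re-runs its agents whenever a new turn arrives. A minimal sketch of that tail-and-dispatch loop is shown below; it is illustrative only, and `handle_turn` is a hypothetical placeholder for the real agent calls.

```python
# Illustrative tail-and-dispatch loop; not the actual agents.py implementation.
import time
from pathlib import Path

LOG = Path("logs/assemblyai.log")

def follow(path: Path):
    """Yield lines appended to the log, similar to `tail -f`."""
    with path.open("r") as f:
        f.seek(0, 2)                 # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.2)      # nothing new yet; poll again shortly
                continue
            yield line.rstrip("\n")

def handle_turn(turn: str) -> None:
    # Hypothetical placeholder: label the speaker, refresh the CRM match,
    # and regenerate recommendations for this turn.
    print("new turn:", turn)

if __name__ == "__main__":
    for turn in follow(LOG):
        handle_turn(turn)
```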
Mock Data
- `knowledge_base/`: fictitious policies, product details, and technical guides
- `crm/`: fictitious customer profiles
- `sample_call.wav`: a stereo sample call aligned to the mock data
Prerequisites
- Python 3.9+ (tested with 3.13 via venv)
- macOS or Linux (tested on macOS)
- For loopback capture: BlackHole installed and selected as input
Install
- Create and activate a virtual environment
    python -m venv venv
    source venv/bin/activate   # Windows: venv\Scripts\activate

- Install dependencies

    pip install -r requirements.txt

Configure
Create a `.env` in the project root with your API keys (or copy from the sample):

    cp .env.sample .env

Then edit `.env`:

    ASSEMBLYAI_API_KEY=your_assemblyai_key
    CEREBRAS_API_KEY=your_cerebras_key
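If the keys do not seem to be picked up, it can help to see how they are typically read. The sketch below assumes `python-dotenv` is used to load `.env` into the environment; the project's actual loading code may differ.

```python
# Sketch of loading and validating the keys at startup.
# Assumes python-dotenv; the project may load .env differently.
import os
from dotenv import load_dotenv

load_dotenv()  # loads variables from .env into os.environ

ASSEMBLYAI_API_KEY = os.environ.get("ASSEMBLYAI_API_KEY")
CEREBRAS_API_KEY = os.environ.get("CEREBRAS_API_KEY")

if not ASSEMBLYAI_API_KEY or not CEREBRAS_API_KEY:
    raise SystemExit("Set ASSEMBLYAI_API_KEY and CEREBRAS_API_KEY in .env")
```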
Run
- File mode (default): streams both channels of `sample_call.wav`

    python rtaa.py --mode file -f ./sample_call.wav

- Live mode: stream from BlackHole or the microphone

    # BlackHole input (preferred if you want to play audio locally and capture it)
    python rtaa.py --mode live -i blackhole

    # Microphone input
    python rtaa.py --mode live -i microphone

Flags
- `--mode {file,live}`: file streams a stereo WAV; live streams mono at 16 kHz
- `-f, --filepath`: WAV path for file mode (defaults to `./sample_call.wav`)
- `-i, --input {blackhole,microphone}`: input device for live mode
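These flags map onto a straightforward `argparse` setup. The sketch below is an equivalent parser, not a copy of `rtaa.py`'s; in particular, the default for `-i/--input` and the help text are assumptions.

```python
# Sketch of a parser equivalent to the flags above (defaults and help text
# are assumptions, not copied from rtaa.py).
import argparse

parser = argparse.ArgumentParser(description="Real-time agent assist demo")
parser.add_argument("--mode", choices=["file", "live"], default="file",
                    help="file streams a stereo WAV; live streams 16 kHz mono")
parser.add_argument("-f", "--filepath", default="./sample_call.wav",
                    help="WAV path for file mode")
parser.add_argument("-i", "--input", choices=["blackhole", "microphone"],
                    default="blackhole",  # assumed default for illustration
                    help="input device for live mode")
args = parser.parse_args()
print(args)
```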
Expected Console Output (abridged)
✅ Log reset: ./logs/assemblyai.log
🔊 Playing audio: ./sample_call.wav
🚀 Realtime agent started. Press Ctrl+C to stop.
[Speaker 1] 🎬 Session Begin: id=...
[agents] 🧹 Reset transcripts.log, customer.log, recommendations.log
[agents] 📡 Monitoring assemblyai.log ... (Ctrl+C to stop)
Speaker 1: Thank you for calling...
Speaker 2: Hi, this is John...
What Gets Written
- `logs/assemblyai.log`: raw, formatted turns from AssemblyAI
- `logs/transcripts.log`: speaker roles labeled per line (STAFF/CUSTOMER)
- `logs/customer.log`: the currently selected CRM record (overwritten on change)
- `logs/recommendations.log`: short list of prioritized actions for STAFF
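For a sense of how the speaker labels in `transcripts.log` could be produced, here is a hedged sketch of a single labeling call through the Cerebras SDK. The prompt, model name, and function are illustrative and not taken from `agents.py`.

```python
# Illustrative speaker-labeling call; prompt and model choice are assumptions.
import os
from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

def label_speaker(turn: str) -> str:
    """Return 'STAFF' or 'CUSTOMER' for one transcript turn."""
    resp = client.chat.completions.create(
        model="llama3.1-8b",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Label this call-center turn as STAFF or CUSTOMER. Reply with one word."},
            {"role": "user", "content": turn},
        ],
    )
    return resp.choices[0].message.content.strip()

print(label_speaker("Thank you for calling, how can I help you today?"))
```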
Notes on Audio Sources
- File mode expects a stereo WAV. Each channel is streamed separately to improve speaker separation.
- Live mode uses 16 kHz mono streaming. If `-i blackhole` is chosen, the app also attempts local pass-through audio.
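To see what the per-channel streaming in file mode amounts to, here is a small sketch that splits a 16-bit stereo WAV into two mono buffers with `wave` and `numpy`; `transcription.py` may implement this differently.

```python
# Split a 16-bit stereo WAV into per-channel mono buffers (illustrative only).
import wave
import numpy as np

with wave.open("./sample_call.wav", "rb") as wf:
    assert wf.getnchannels() == 2, "file mode expects a stereo WAV"
    assert wf.getsampwidth() == 2, "this sketch assumes 16-bit PCM"
    frames = wf.readframes(wf.getnframes())

samples = np.frombuffer(frames, dtype=np.int16).reshape(-1, 2)
left = samples[:, 0].tobytes()   # one side of the call
right = samples[:, 1].tobytes()  # the other side of the call
```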
Troubleshooting
- Missing `ASSEMBLYAI_API_KEY`: ensure your `.env` is loaded or the environment variable is set.
- No audio captured in live mode:
  - macOS: System Settings › Privacy & Security › Microphone → allow Terminal/IDE
  - BlackHole not found: install and select BlackHole as the input device; the app falls back to the default mic if not found
- Port or device busy: if your driver conflicts, disable local playback by setting `ENABLE_LOCAL_PLAYBACK = False` in `transcription.py`
- `cerebras_cloud_sdk` import errors: verify the installation from `requirements.txt` and the `CEREBRAS_API_KEY` in `.env`
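When live mode silently falls back to the default microphone, listing the available input devices usually explains why. The snippet below uses PyAudio to look for a BlackHole input; the name check is a simple heuristic.

```python
# List input devices and flag anything that looks like BlackHole.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        marker = "  <-- BlackHole" if "blackhole" in info["name"].lower() else ""
        print(f"{i}: {info['name']} ({int(info['maxInputChannels'])} in){marker}")
pa.terminate()
```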
Development Tips
- Run `python transcription.py --mode file -f ./sample_call.wav` to test transcription alone
- Run `python agents.py` separately to tail `assemblyai.log` and regenerate outputs
- Logs are plain text; delete or edit them freely between runs
Limitations
- Speaker role classification and CRM matching rely on LLM outputs and can be imperfect
- The demo uses small local text files as its “CRM” and “KB”; it is not connected to a real backend