AssemblyAI-Community/simple-rtaa-demo
Real-Time Call Service Agent Assist Demo

Overview

This project demonstrates a real-time, AI-powered agent assist workflow for customer support calls. It captures audio from a loopback device (BlackHole) or microphone, transcribes in near real-time via AssemblyAI, and runs lightweight agents (powered by Cerebras) to label speakers, match the caller to CRM data, and generate actionable guidance for the human agent.

Key Features

  • BlackHole audio loopback for clean capture (recommended)
  • Live microphone capture (PyAudio)
  • Near real-time transcription (AssemblyAI Universal-Streaming)
  • Speaker role identification (STAFF vs CUSTOMER)
  • Caller-to-CRM matching using files in crm/
  • Agent recommendations using knowledge_base/ + current transcript
  • Continuous logging to logs/

Architecture

  • transcription.py — streams audio (file or live) to AssemblyAI and appends formatted turns to logs/assemblyai.log
  • agents.py — tails assemblyai.log and maintains:
    • transcripts.log (speaker role labels)
    • customer.log (best-match CRM record)
    • recommendations.log (actionable guidance for STAFF)
  • rtaa.py — convenience launcher to run both the transcriber and agents together
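
The role-labeling step in agents.py works on formatted transcript turns. The actual classification is LLM-driven, but the per-line parse that must happen first can be sketched with the stdlib alone. A minimal sketch, assuming turns are written in the "Speaker N: text" layout seen in the console output (parse_turn is a hypothetical helper, not part of the repo):

```python
from typing import Optional, Tuple

def parse_turn(line: str) -> Optional[Tuple[str, str]]:
    """Split a formatted transcript turn into (speaker, text).

    Assumes the "Speaker N: text" layout from the console output; returns
    None for non-turn lines (session markers, agent status messages).
    """
    head, sep, text = line.partition(": ")
    if not sep or not head.startswith("Speaker "):
        return None
    return head, text.strip()
```

Lines that pass this filter would then be handed to the LLM for STAFF/CUSTOMER labeling.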

Mock Data

  • knowledge_base/: fictitious policies, product details, and technical guides
  • crm/: fictitious customer profiles
  • sample_call.wav: a stereo sample call aligned to the mock data
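
The caller-to-CRM matching in agents.py is LLM-powered (Cerebras), but the idea can be illustrated without an LLM. A hypothetical fuzzy-matching baseline using difflib, assuming each CRM record exposes a "name" field (the real record layout in crm/ may differ):

```python
import difflib
from typing import Optional

def match_customer(caller_name: str, crm_records: list) -> Optional[dict]:
    """Pick the CRM record whose 'name' best matches the caller, or None."""
    names = [rec["name"] for rec in crm_records]
    hits = difflib.get_close_matches(caller_name, names, n=1, cutoff=0.6)
    if not hits:
        return None
    return next(rec for rec in crm_records if rec["name"] == hits[0])
```

This tolerates transcription slips ("Jon" vs "John") the same way a prompt-based matcher is meant to, just with a cruder similarity measure.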

Prerequisites

  • Python 3.9+ (tested with 3.13 via venv)
  • macOS or Linux (tested on macOS)
  • For loopback capture: BlackHole installed and selected as input

Install

  1. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
  2. Install dependencies
pip install -r requirements.txt

Configure

Create a .env in the project root with your API keys (or copy from sample):

cp .env.sample .env

Then edit .env:

ASSEMBLYAI_API_KEY=your_assemblyai_key
CEREBRAS_API_KEY=your_cerebras_key
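
Both scripts need these keys at startup (the project presumably loads .env via python-dotenv). A minimal stdlib sketch of a fail-fast lookup; require_key is a hypothetical helper, not part of the repo:

```python
import os

def require_key(name: str) -> str:
    """Fetch a required API key from the environment, failing fast if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing {name}; set it in .env or the shell environment")
    return value
```

Failing fast here gives a clearer error than a mid-call authentication failure from the streaming API.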

Run

  1. File mode (default): streams both channels of sample_call.wav
python rtaa.py --mode file -f ./sample_call.wav
  2. Live mode: stream from BlackHole or microphone
# BlackHole input (preferred if you want to play audio locally and capture it)
python rtaa.py --mode live -i blackhole

# Microphone input
python rtaa.py --mode live -i microphone

Flags

  • --mode {file,live}: file streams stereo WAV; live streams mono at 16 kHz
  • -f, --filepath: WAV path for file mode (defaults to ./sample_call.wav)
  • -i, --input {blackhole,microphone}: input device for live mode
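
The flags above map naturally onto argparse. A sketch of how rtaa.py's interface could be declared (build_parser is hypothetical; the real script's parser may differ in details):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI described in the Flags section above."""
    p = argparse.ArgumentParser(description="Real-time agent assist launcher")
    p.add_argument("--mode", choices=["file", "live"], default="file",
                   help="file streams a stereo WAV; live streams mono at 16 kHz")
    p.add_argument("-f", "--filepath", default="./sample_call.wav",
                   help="WAV path for file mode")
    p.add_argument("-i", "--input", choices=["blackhole", "microphone"],
                   help="input device for live mode")
    return p
```

`choices` makes argparse reject unknown modes and devices with a usage message instead of a runtime failure deeper in the audio code.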

Expected Console Output (abridged)

✅ Log reset: ./logs/assemblyai.log
🔊 Playing audio: ./sample_call.wav
🚀 Realtime agent started. Press Ctrl+C to stop.
[Speaker 1] 🎬 Session Begin: id=...
[agents] 🧹 Reset transcripts.log, customer.log, recommendations.log
[agents] 📡 Monitoring assemblyai.log ... (Ctrl+C to stop)
Speaker 1: Thank you for calling...
Speaker 2: Hi, this is John...

What Gets Written

  • logs/assemblyai.log: raw, formatted turns from AssemblyAI
  • logs/transcripts.log: speaker roles labeled per line (STAFF/CUSTOMER)
  • logs/customer.log: the currently selected CRM record (overwrites on change)
  • logs/recommendations.log: short list of prioritized actions for STAFF
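
agents.py tails assemblyai.log continuously; one simple way to follow an append-only text log is to remember a byte offset between polls. A minimal sketch (read_new_lines is a hypothetical helper, not the repo's actual implementation):

```python
from typing import List, Tuple

def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return lines appended since `offset`, plus the offset to resume from."""
    with open(path, "r", encoding="utf-8") as fh:
        fh.seek(offset)
        chunk = fh.read()
        return chunk.splitlines(), fh.tell()
```

A polling loop would call this every second or so, feed any new lines to the agents, and carry the returned offset forward.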

Notes on Audio Sources

  • File mode expects a stereo WAV. Each channel is streamed separately to improve speaker separation.
  • Live mode uses 16 kHz mono streaming. If -i blackhole is chosen, the app also attempts local audio pass-through so you can hear the call while it is being captured.
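
File mode's per-channel streaming can be done with the stdlib alone. A sketch of splitting a 16-bit PCM stereo WAV into two mono byte streams (split_stereo is a hypothetical helper; the repo may implement this differently, e.g. with numpy):

```python
import wave

def split_stereo(path: str):
    """Split a 16-bit stereo WAV into (left, right) mono PCM byte streams."""
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 2 and wf.getsampwidth() == 2
        frames = wf.readframes(wf.getnframes())
    left, right = bytearray(), bytearray()
    # Each stereo frame is 4 bytes: 2 bytes left sample, 2 bytes right sample.
    for i in range(0, len(frames), 4):
        left += frames[i:i + 2]
        right += frames[i + 2:i + 4]
    return bytes(left), bytes(right)
```

Sending each channel to its own streaming session is what lets the demo treat "Speaker 1" and "Speaker 2" as cleanly separated voices.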

Troubleshooting

  • Missing ASSEMBLYAI_API_KEY: ensure your .env is loaded or environment variable is set.
  • No audio captured in live mode:
    • macOS: System Settings › Privacy & Security › Microphone → allow Terminal/IDE
    • BlackHole not found: install and select BlackHole as the input device; the app falls back to the default mic if not found
  • Port or device busy: disable local playback in transcription.py by setting ENABLE_LOCAL_PLAYBACK = False if your driver conflicts
  • cerebras_cloud_sdk import errors: verify installation from requirements.txt and the CEREBRAS_API_KEY in .env
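
To verify whether BlackHole is visible at all, enumerating PyAudio's input devices is a quick check. A hedged sketch (list_input_devices is a hypothetical diagnostic, not part of the repo; it degrades to an empty list when PyAudio isn't installed):

```python
def list_input_devices() -> list:
    """Return names of available input devices, or [] if PyAudio is missing."""
    try:
        import pyaudio
    except ImportError:
        return []
    pa = pyaudio.PyAudio()
    names = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info.get("maxInputChannels", 0) > 0:
            names.append(info["name"])
    pa.terminate()
    return names
```

If "BlackHole" does not appear in the returned names, the fallback to the default mic described above is what you will get.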

Development Tips

  • Run python transcription.py --mode file -f ./sample_call.wav to test transcription alone
  • Run python agents.py separately to tail assemblyai.log and regenerate outputs
  • Logs are plain text; delete or edit them freely between runs

Limitations

  • Speaker role classification and CRM matching rely on LLM outputs and can be imperfect
  • The demo uses small local text files as its “CRM” and “KB”; it is not connected to a real backend
