
Build realtime AI voice agents using FastRTC for low-latency streaming, Superlinked for vector search, Twilio for live phone calls, and Runpod for scalable GPU deployment.


bigdatasciencegroup/realtime-phone-agents-course

 
 


☎️ Phone Calling Agents Course ☎️

How to build an Agent Call Center using FastRTC, Superlinked, Twilio, Opik & RunPod


Architecture


Course Overview

This isn't your typical plug-and-play tutorial where you spin up a demo in five minutes and call it a day.

Instead, we're building a real estate company, but with a twist … the employees will be realtime voice agents!

By the end of this course, you'll have a system that can:

  • ☎️ Receive inbound calls with Twilio
  • 📞 Make outbound calls through Twilio
  • 🏠 Search live property data using Superlinked
  • ⚡ Run realtime conversations powered by FastRTC
  • 🗣️ Transcribe speech instantly with Moonshine + Fast Whisper
  • 🎙️ Generate lifelike voices using Kokoro + Orpheus 3B
  • 🚀 Deploy open-source models on Runpod for GPU acceleration

Excited? Let's get started!


The Neural Maze Logo

📬 Stay Updated

Join The Neural Maze and learn to build AI Systems that actually work, from principles to production. Every Wednesday, directly to your inbox. Don't miss out!

Subscribe Now

Jesus Copado YouTube Channel

🎥 Watch More Content

Join Jesús Copado on YouTube to explore how to build real AI projects—from voice agents to creative tools. Weekly videos with code, demos, and ideas that push what's possible with AI. Don't miss the next drop!

Subscribe Now


Who is this course for?

This course is for Software Engineers, ML Engineers, and AI Engineers who want to level up by building complex end-to-end apps. It's not just a basic "Hello World" tutorial—it's a deep dive into making production-ready voice agents.

Course Breakdown: Week by Week

Each week, you'll unlock a new chapter of the journey. You'll get:

  • 🧾 A Substack article that walks through the concepts and code in detail
  • 💻 A new batch of code pushed directly to this repo
  • 🎥 A Live Session where we explore everything together

Here’s what the upcoming weeks look like 👇

Lesson Number | Title | Article | Code | Live Session
0 | Project overview and architecture | Diagram 0 | Week 0 | Thumbnail 0
1 | Building Realtime Voice Agents with FastRTC | Diagram 1 | Week 1 | Thumbnail 1
2 | The Missing Layer in Modern AI Retrieval | Diagram 2 | Week 2 | November 30
3 | Improving STT and TTS Systems | December 3 | December 3 | December 7
4 | Deployment, monitoring and Twilio Integration | December 10 | December 10 | December 14

Getting Started

Before diving into the lessons, make sure you have everything set up properly:

  1. 📋 Initial Setup: Follow the instructions in docs/GETTINGS_STARTED.md to configure your environment and install dependencies.
  2. 📚 Learn Lesson by Lesson: Once setup is complete, come back here and follow the lessons in order.

Each lesson builds on the previous one, so it's important to follow them sequentially!


Lesson 0: Project Overview and Architecture

Goal: Understand the big picture and architecture of the realtime phone agent system.

Steps:

  1. 📖 Read the Substack article to understand the overall architecture
  2. 🎥 Watch the Live Session recording for a deeper dive

This lesson sets the foundation for everything that follows!


Lesson 1: Building Realtime Voice Agents with FastRTC

Goal: Build your first working voice agent using FastRTC and integrate it with Twilio.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand FastRTC fundamentals
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_1_fastrtc_agents.ipynb to get hands-on experience
  3. 💻 Explore the Code: Dive into the repository code to see how everything is implemented
  4. 🚀 Run the Applications: Try both deployment options:

Option A: Gradio Application (Quick Demo)

Run the Gradio interface (check out demo videos in the Substack article):

make start-gradio-application

This starts an interactive web interface where you can test the voice agent locally.

NOTE: If you get the error "No such file or directory: 'ffprobe'", install ffmpeg on your system. ffprobe ships as part of ffmpeg.
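To check for this problem before launching the Gradio app, you can test whether the ffmpeg tools are on your PATH. The snippet below is a minimal sketch using only the Python standard library; the helper names `missing_tools` and `report` are illustrative and not part of this repo.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def report(tools=("ffmpeg", "ffprobe")):
    """Build a one-line, human-readable status message."""
    missing = missing_tools(tools)
    if missing:
        return "Missing: " + ", ".join(missing) + " (install ffmpeg)"
    return "All tools found: " + ", ".join(tools)
```

Calling `print(report())` in the same environment you use for `make start-gradio-application` should tell you whether the ffprobe error is likely to appear.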

Option B: FastAPI Call Center (Production-Ready)

For a production-ready setup that can receive real phone calls:

Step 1: Start the call center application

make start-call-center

This starts a FastAPI application using Docker Compose on port 8000.
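If you want to confirm that something is listening on port 8000 before moving on, a plain TCP check is enough. This is a small sketch under the assumption that the containers publish the port on localhost; `is_port_open` is a hypothetical helper, and it only proves that a process accepts connections, not that it is the call center app.

```python
import socket


def is_port_open(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run `print(is_port_open())` after `make start-call-center`; a `False` result usually means the containers are still starting or the port mapping differs from 8000.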

Step 2: Expose your local server to the internet

make start-ngrok-tunnel

Or manually:

ngrok http 8000
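Once the tunnel is up, you need its public HTTPS URL for the Twilio step. ngrok prints it in the terminal, but you can also read it programmatically. The sketch below assumes the ngrok agent's local API is reachable at its default address, `http://127.0.0.1:4040/api/tunnels`; if your agent is configured differently, adjust `NGROK_API`. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: default address of the ngrok agent's local API.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def pick_https_url(payload):
    """Return the first https public_url in a tunnels payload, or None."""
    for tunnel in payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None


def current_public_url(api_url=NGROK_API, timeout=2.0):
    """Ask the local ngrok agent for its tunnels and return the https URL."""
    with urllib.request.urlopen(api_url, timeout=timeout) as resp:
        return pick_https_url(json.load(resp))
```

`current_public_url()` only works while the tunnel is running; `pick_https_url` is split out so the parsing can be exercised without a live agent.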

Step 3: Connect to Twilio

Follow the instructions in the article to:

  • Configure your Twilio account
  • Connect your ngrok URL to Twilio
  • Start receiving real phone calls!

🎥 Join the Live Session: Join Premium, and you'll receive a notification when we are live on Sunday, November 23rd at 5PM CET for a complete walkthrough and Q&A!


Lesson 2: The Missing Layer in Modern AI Retrieval

Goal: Learn how to implement advanced search capabilities for realtime voice agents using Superlinked to handle complex, multi-attribute queries.

Steps:

  1. 📖 Read the Article: Start with the Substack article to understand:

    • Why traditional vector search isn't enough for multi-attribute queries
    • How Superlinked combines different data types (text, numbers, categories) into a unified search space
    • The limitations of metadata filters, multiple searches, and re-ranking approaches
  2. 📓 Work Through the Notebook: Open and run through notebooks/lesson_2_superlinked_property_search.ipynb to learn:

    • How to define different Space types (TextSimilaritySpace, NumberSpace, CategoricalSimilaritySpace)
    • How to combine spaces into a single searchable index
    • How to dynamically adjust weights at query time
  3. 💻 Explore the Code: Dive into the repository to see how Superlinked integrates with our voice agent:

    • Check out src/realtime_phone_agents/infrastructure/superlinked/ for the implementation
    • Review src/realtime_phone_agents/agent/tools/property_search.py to see how the search tool is exposed to the agent

    We'll explore the code in detail during the Live Session!

  4. 🚀 Test the Complete System: Now it's time to see everything work together!

    Step 1: Start the call center application

    make start-call-center

    Step 2: Expose your local server (if not already running)

    make start-ngrok-tunnel

    Step 3: Call your Twilio number and test the property search

    Try asking the agent:

    "Do you have apartments in Barrio de Salamanca of at most 900,000 euros?"

    Wait for the response. The agent should find and return information about the only apartment in the dataset (data/properties.csv) that meets these criteria!

    This demonstrates how the voice agent can now handle complex queries combining location (Barrio de Salamanca) and price constraints (≤ €900,000) in real-time.
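To build intuition for why query-time weights matter, here is a toy sketch of multi-attribute ranking in plain Python. It does not use the Superlinked API and is not how the library works internally: token overlap stands in for text embeddings, and the listings are hypothetical rather than rows from data/properties.csv. It only illustrates the idea of scoring location, type, and price together instead of applying a hard metadata filter.

```python
# Hypothetical listings for illustration only; not rows from data/properties.csv.
PROPERTIES = [
    {"id": "A", "type": "apartment", "location": "Barrio de Salamanca", "price": 850_000},
    {"id": "B", "type": "apartment", "location": "Barrio de Salamanca", "price": 1_200_000},
    {"id": "C", "type": "house", "location": "Chamberí", "price": 700_000},
    {"id": "D", "type": "apartment", "location": "Chamartín", "price": 600_000},
]

QUERY = {"location": "Barrio de Salamanca", "type": "apartment", "max_price": 900_000}


def location_similarity(query, value):
    """Jaccard overlap of lowercase tokens; a crude stand-in for text embeddings."""
    a, b = set(query.lower().split()), set(value.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def price_similarity(price, max_price):
    """1.0 at or under budget, then a linear decay instead of a hard cut-off."""
    if price <= max_price:
        return 1.0
    return max(0.0, 1.0 - (price - max_price) / max_price)


def score(prop, query, weights):
    """Weighted sum of the per-attribute similarities."""
    parts = {
        "location": location_similarity(query["location"], prop["location"]),
        "type": 1.0 if prop["type"] == query["type"] else 0.0,
        "price": price_similarity(prop["price"], query["max_price"]),
    }
    return sum(weights[name] * value for name, value in parts.items())


def rank(properties, query, weights):
    """Return property ids ordered from best to worst match."""
    ranked = sorted(properties, key=lambda p: score(p, query, weights), reverse=True)
    return [p["id"] for p in ranked]
```

With equal weights, `rank(PROPERTIES, QUERY, {"location": 1, "type": 1, "price": 1})` puts listing A first and the over-budget listing B second. Raising the price weight to 5 moves the cheaper listing D ahead of B, which is the kind of adjustment the notebook makes at query time.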

🎥 Join the Live Session: Join Premium for the live walkthrough on Sunday, November 30th, where we'll dive deep into the code and answer your questions!


The tech stack

Technology | Description
FastRTC | The Python library for real-time communication.
Superlinked | Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
Runpod | The end-to-end AI cloud that simplifies building and deploying models.
Opik | Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Twilio | Twilio is a cloud communications platform that enables developers to build, manage, and automate voice, text, video, and other communication services through APIs.

Contributors

Miguel Otero Pedrido | Senior ML / AI Engineer
Founder of The Neural Maze. Rick and Morty fan.

LinkedIn
YouTube
The Neural Maze Newsletter
Jesús Copado | Senior ML / AI Engineer
Equal parts cinema fan and AI enthusiast.

YouTube
LinkedIn

License

This project is licensed under the MIT License - see the LICENSE file for details.
