Description
Hi,
First of all, LiveKit is great. I have been using it for over a year now to build web-based voice AI agents. More recently, I started applying LiveKit Agents to telephony, for instance to answer inbound calls to a Twilio number, and this is where latency becomes quite unbearable. My question has two parts: how can I find the bottleneck and reduce it, and is this LiveKit-specific? Below is a comparison I made between the Egress recording and the microphone recording of the call. I know this is extremely rough, but it was the easiest thing to do, and the latency difference is quite apparent. The code I used here is the starter example with a modified prompt.
Based on the image below, the issue appears to be telephony latency: after the STT->LLM->TTS pipeline has finished, the delay comes from getting the audio back to the phone (and vice versa). When I compared the latency graphs (web vs. telephony) for something like Vapi, the telephony latency was absolutely on point. This was confusing to me.
I appreciate any thoughts here.
(The first is the Egress recording; the second is the microphone recording.)
