Description
Describe the bug
Looking at this code snippet:
agents-js/agents/src/voice/audio_recognition.ts
Lines 641 to 667 in 455b5ba
const commitUserTurnTask =
  (delayDuration: number = 500) =>
  async (controller: AbortController) => {
    if (Date.now() - this.lastFinalTranscriptTime > delayDuration) {
      // flush the stt by pushing silence
      if (audioDetached && this.sampleRate !== undefined) {
        const numSamples = Math.floor(this.sampleRate * 0.5);
        const silence = new Int16Array(numSamples * 2);
        const silenceFrame = new AudioFrame(silence, this.sampleRate, 1, numSamples);
        this.silenceAudioWriter.write(silenceFrame);
      }
      // wait for the final transcript to be available
      await delay(delayDuration, { signal: controller.signal });
    }
    if (this.audioInterimTranscript) {
      // append interim transcript in case the final transcript is not ready
      this.audioTranscript = `${this.audioTranscript} ${this.audioInterimTranscript}`.trim();
    }
    this.audioInterimTranscript = '';
    const chatCtx = this.hooks.retrieveChatCtx();
    this.logger.debug('running EOU detection on commitUserTurn');
    this.runEOUDetection(chatCtx);
    this.userTurnCommitted = true;
  };
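To show why the wait fires on every manual commit, here is a standalone replica of the guard at the top of commitUserTurnTask (not library code). It assumes lastFinalTranscriptTime starts at 0 and is only refreshed when an STT emits a final transcript; the initial value is an assumption, since it is not visible in the snippet above. With no STT the timestamp never moves, so every commit takes the delay branch.

```typescript
// Standalone replica of the guard in commitUserTurnTask (not agents-js code).
// Assumption: lastFinalTranscriptTime stays at its initial value (0 here)
// because no STT ever reports a final transcript.
function takesDelayBranch(
  now: number,
  lastFinalTranscriptTime: number,
  delayDuration = 500,
): boolean {
  return now - lastFinalTranscriptTime > delayDuration;
}

// Sum the latency the guard adds across a series of manual commits.
// commitTimes are millisecond timestamps at which commitUserTurn() is called.
function addedLatencyMs(
  commitTimes: number[],
  lastFinalTranscriptTime: number,
  delayDuration = 500,
): number {
  let total = 0;
  for (const t of commitTimes) {
    if (takesDelayBranch(t, lastFinalTranscriptTime, delayDuration)) {
      total += delayDuration;
    }
  }
  return total;
}

// Three commits with no final transcript ever received.
console.log(addedLatencyMs([1_000, 5_000, 60_000], 0)); // 1500: each commit pays the full 500 ms
```

The wait is only skipped when a final transcript arrived within the last delayDuration milliseconds (for example, takesDelayBranch(1_000, 800) is false), which cannot happen when no STT is running.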
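I noticed that commitUserTurnTask always adds a 500 ms delay, even for a realtime pipeline with no STT enabled. I'm using manual turn detection and calling commitUserTurn(), and this delays creation of the speech handle by 500 ms. As far as I can tell, no final transcript ever arrives without an STT, so lastFinalTranscriptTime is never refreshed, Date.now() - this.lastFinalTranscriptTime > delayDuration is effectively always true, and the wait always runs.
I verified that delay() is still called even when there is no transcription.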
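One possible fix, sketched under assumptions: skip the silence flush and the wait entirely when no STT is attached, and keep the existing guard otherwise. The sttEnabled flag, flushDelayMs, sleep, and commitUserTurnSketch are hypothetical names for illustration, not agents-js APIs; a real change would live inside commitUserTurnTask and depend on how AudioRecognition knows whether an STT is configured.

```typescript
// Sketch of a possible fix, NOT the actual agents-js implementation.
// Hypothetical: `sttEnabled` models "an STT is attached to this session".
interface CommitTiming {
  sttEnabled: boolean; // hypothetical flag: does an STT produce transcripts?
  now: number; // current time in ms, e.g. Date.now()
  lastFinalTranscriptTime: number; // ms timestamp of the last final transcript
  delayDuration: number; // grace period for a pending final transcript, in ms
}

// How long the commit should wait before running EOU detection.
function flushDelayMs(t: CommitTiming): number {
  // No STT: no final transcript can arrive, so waiting only adds latency.
  // (A real fix would also skip pushing the flush silence in this case.)
  if (!t.sttEnabled) return 0;
  // Otherwise keep the existing guard from commitUserTurnTask.
  return t.now - t.lastFinalTranscriptTime > t.delayDuration ? t.delayDuration : 0;
}

// Abortable sleep standing in for the delay() helper used in audio_recognition.ts.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error('aborted'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error('aborted'));
    };
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal.addEventListener('abort', onAbort, { once: true });
  });
}

// Hypothetical commit task: wait only when it can actually help.
async function commitUserTurnSketch(
  t: CommitTiming,
  controller: AbortController,
  runEOUDetection: () => void,
): Promise<void> {
  const wait = flushDelayMs(t);
  if (wait > 0) await sleep(wait, controller.signal);
  runEOUDetection();
}

// Realtime-only session: EOU detection runs immediately, with no 500 ms wait.
const controller = new AbortController();
void commitUserTurnSketch(
  { sttEnabled: false, now: Date.now(), lastFinalTranscriptTime: 0, delayDuration: 500 },
  controller,
  () => console.log('running EOU detection'),
);
```

With sttEnabled set to false the callback runs synchronously, before the first await, so the speech handle would not be held back; with an STT attached, the existing 500 ms grace period for a late final transcript is preserved.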
Here's my config:
const session = new voice.AgentSession({
  llm: new openai.realtime.RealtimeModel({
    model: 'gpt-realtime-mini',
    turnDetection: null,
    modalities: ['audio', 'text'],
  }),
  turnDetection: 'manual',
  voiceOptions: {
    preemptiveGeneration: false,
    minEndpointingDelay: 0,
    maxEndpointingDelay: 0,
    minInterruptionDuration: 0,
    allowInterruptions: false,
  },
});

await session.start({
  agent: this.agent,
  room: this.room,
  inputOptions: {
    audioEnabled: true,
    textEnabled: false,
  },
  outputOptions: {
    audioEnabled: false,
    transcriptionEnabled: false,
  },
});
Relevant log output
No response
Describe your environment
System:
  OS: macOS 14.7
  CPU: (10) arm64 Apple M1 Max
  Memory: 98.83 MB / 32.00 GB
  Shell: 5.9 - /bin/zsh
Binaries:
  Node: 24.11.1 - ~/.nvm/versions/node/v24.11.1/bin/node
  npm: 11.6.2 - ~/.nvm/versions/node/v24.11.1/bin/npm
  pnpm: 10.25.0 - /opt/homebrew/bin/pnpm
  Watchman: 2025.11.10.00 - /opt/homebrew/bin/watchman
"@livekit/agents": "1.0.30",
"@livekit/agents-plugin-livekit": "1.0.30",
"@livekit/agents-plugin-openai": "1.0.30",
Minimal reproducible example
No response
Additional information
No response