manual turn detection with realtime model always has unnecessary 500ms delay #926

@lawctan

Describe the bug

Looking at this code snippet:

const commitUserTurnTask =
  (delayDuration: number = 500) =>
  async (controller: AbortController) => {
    if (Date.now() - this.lastFinalTranscriptTime > delayDuration) {
      // flush the stt by pushing silence
      if (audioDetached && this.sampleRate !== undefined) {
        const numSamples = Math.floor(this.sampleRate * 0.5);
        const silence = new Int16Array(numSamples * 2);
        const silenceFrame = new AudioFrame(silence, this.sampleRate, 1, numSamples);
        this.silenceAudioWriter.write(silenceFrame);
      }
      // wait for the final transcript to be available
      await delay(delayDuration, { signal: controller.signal });
    }
    if (this.audioInterimTranscript) {
      // append interim transcript in case the final transcript is not ready
      this.audioTranscript = `${this.audioTranscript} ${this.audioInterimTranscript}`.trim();
    }
    this.audioInterimTranscript = '';
    const chatCtx = this.hooks.retrieveChatCtx();
    this.logger.debug('running EOU detection on commitUserTurn');
    this.runEOUDetection(chatCtx);
    this.userTurnCommitted = true;
  };

I noticed that this always adds a 500ms delay, even for a realtime pipeline that has no STT enabled. I'm using manual turn detection and calling commitUserTurn() myself, and the result is a 500ms delay before the speech handle is created.
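To illustrate why the wait is always taken, here is a minimal standalone sketch of the guard at the top of commitUserTurnTask. It assumes that lastFinalTranscriptTime never advances when no STT is running (for example, it stays at an initial value of 0), which I have not confirmed in the source. The `hasStt` flag and both function names are hypothetical and exist only for this example; the real class may expose STT presence differently.

```typescript
// Standalone model of the guard at the top of commitUserTurnTask.
// `hasStt` is a hypothetical flag, not a confirmed field of the real class.
interface TurnState {
  hasStt: boolean;
  lastFinalTranscriptTime: number; // ms since epoch; assumed to stay 0 when no STT runs
}

// Condition as written in the snippet above.
function waitsToday(state: TurnState, now: number, delayDuration = 500): boolean {
  return now - state.lastFinalTranscriptTime > delayDuration;
}

// Possible variant: never wait for a final transcript when there is no STT.
function waitsWithGuard(state: TurnState, now: number, delayDuration = 500): boolean {
  return state.hasStt && now - state.lastFinalTranscriptTime > delayDuration;
}

const realtimeOnly: TurnState = { hasStt: false, lastFinalTranscriptTime: 0 };
const now = Date.now();
console.log(waitsToday(realtimeOnly, now)); // true
console.log(waitsWithGuard(realtimeOnly, now)); // false
```

Under that assumption, the existing condition is true on every call for a realtime-only session, so delay() always runs. A guard along these lines would skip it while leaving the STT path unchanged.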

I verified that the delay() function is being called even with no transcription.
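For anyone who wants to reproduce the measurement, a rough sketch of a timing helper is below. `timeAsync` and `fakeCommit` are hypothetical names for this example only; `fakeCommit` is a stand-in that just sleeps, and would be replaced by the real commitUserTurn() call in an actual session.

```typescript
// Measures how long an async call takes, in milliseconds.
async function timeAsync(fn: () => Promise<void>): Promise<number> {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}

// Hypothetical stand-in for commitUserTurn(); it only sleeps for `waitMs`.
const fakeCommit = (waitMs: number) => (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, waitMs));

// Logs the elapsed time of a simulated commit that waits 500ms.
timeAsync(fakeCommit(500)).then((ms) => console.log(`commit took ~${ms}ms`));
```

Wrapping the real call this way should show roughly 500ms of extra latency per commit if the delay is being hit.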

Here's my config:

const session = new voice.AgentSession({
  llm: new openai.realtime.RealtimeModel({
    model: 'gpt-realtime-mini',
    turnDetection: null,
    modalities: ['audio', 'text'],
  }),
  turnDetection: 'manual',
  voiceOptions: {
    preemptiveGeneration: false,
    minEndpointingDelay: 0,
    maxEndpointingDelay: 0,
    minInterruptionDuration: 0,
    allowInterruptions: false,
  },
});

await session.start({
  agent: this.agent,
  room: this.room,
  inputOptions: {
    audioEnabled: true,
    textEnabled: false,
  },
  outputOptions: {
    audioEnabled: false,
    transcriptionEnabled: false,
  },
});

Relevant log output

No response

Describe your environment

System:
OS: macOS 14.7
CPU: (10) arm64 Apple M1 Max
Memory: 98.83 MB / 32.00 GB
Shell: 5.9 - /bin/zsh
Binaries:
Node: 24.11.1 - ~/.nvm/versions/node/v24.11.1/bin/node
npm: 11.6.2 - ~/.nvm/versions/node/v24.11.1/bin/npm
pnpm: 10.25.0 - /opt/homebrew/bin/pnpm
Watchman: 2025.11.10.00 - /opt/homebrew/bin/watchman

"@livekit/agents": "1.0.30",
"@livekit/agents-plugin-livekit": "1.0.30",
"@livekit/agents-plugin-openai": "1.0.30",

Minimal reproducible example

No response

Additional information

No response


Labels: bug
