
min_endpointing_delay behaves differently in VAD vs STT turn detection mode #4325

@MonkeyLeeT


Bug Description

In 1.3.7, endpointing waits until last_speaking_time + min_endpointing_delay.

  • VAD mode: last_speaking_time is set during speech (VAD inference), so by VAD END_OF_SPEECH the delay may already
    be satisfied → effective delay ≈ max(VAD_min_silence, min_endpointing_delay).
  • STT mode: last_speaking_time is set at STT END_OF_SPEECH, so the full min_endpointing_delay is always added after
    STT EOS → additive.
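The two cases above can be expressed as a small timing model. This is only a sketch of the behavior as described in this issue, not the livekit-agents implementation: `commit_time`, `vad_mode_commit`, and `stt_mode_commit` are hypothetical names, and all times are in seconds measured from the moment the user actually stops speaking (t = 0).

```python
def commit_time(last_speaking_time: float, eos_time: float,
                min_endpointing_delay: float) -> float:
    """Earliest time the turn can be committed in this model.

    Endpointing waits until last_speaking_time + min_endpointing_delay,
    and it cannot fire before the end-of-speech (EOS) signal arrives.
    """
    return max(eos_time, last_speaking_time + min_endpointing_delay)


def vad_mode_commit(vad_min_silence: float, min_endpointing_delay: float) -> float:
    # VAD mode (as described above): last_speaking_time tracks the last
    # speech frame (t = 0), while VAD END_OF_SPEECH arrives only after
    # vad_min_silence seconds of silence.
    return commit_time(0.0, vad_min_silence, min_endpointing_delay)


def stt_mode_commit(stt_eos_silence: float, min_endpointing_delay: float) -> float:
    # STT mode (as described above): last_speaking_time is stamped when
    # STT END_OF_SPEECH arrives, so the delay starts counting only after
    # the STT silence window has already elapsed.
    return commit_time(stt_eos_silence, stt_eos_silence, min_endpointing_delay)


print(vad_mode_commit(2.0, 2.0))  # 2.0 -> max(VAD_min_silence, min_endpointing_delay)
print(stt_mode_commit(2.0, 2.0))  # 4.0 -> STT silence + min_endpointing_delay
```

With equal 2s settings, the model gives 2s in VAD mode and 4s in STT mode, which matches the max() versus additive distinction.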

The docs imply a consistent “after EOS” delay, but the behavior is mode‑dependent, and the EOU metrics in STT mode
exclude the STT silence window, which the user still perceives as latency.

Expected Behavior

Either:

  • Consistent “after end‑of‑utterance signal” delay in all modes, or
  • Docs/metrics clearly state VAD = max(), STT = additive.

Reproduction Steps

1. VAD mode: set VAD min_silence=2s and min_endpointing_delay=2s. Observe that the commit happens ≈ 2s after the user stops speaking, not 4s.
2. STT mode (with providers such as AssemblyAI): set STT EOS silence=2s and min_endpointing_delay=2s. Observe a total of ≈ 4s (STT silence + endpointing delay), while the EOU metric end_of_utterance_delay shows ~2s.
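The gap in step 2 between what the metric reports and what the user experiences can be illustrated with a small timeline. This is a sketch under stated assumptions: `SttTurnTimeline` and its fields are hypothetical, and it assumes the reported end_of_utterance_delay is measured from STT END_OF_SPEECH, which is consistent with the ~2s figure above but is not confirmed against the source code.

```python
from dataclasses import dataclass


@dataclass
class SttTurnTimeline:
    """Hypothetical timeline of one user turn in STT mode (times in seconds)."""
    user_stop: float              # when the user actually stops speaking
    stt_eos_silence: float        # silence the STT provider waits before EOS
    min_endpointing_delay: float  # delay added after last_speaking_time

    @property
    def stt_eos(self) -> float:
        # STT END_OF_SPEECH fires after the provider's silence window.
        return self.user_stop + self.stt_eos_silence

    @property
    def commit(self) -> float:
        # last_speaking_time is set at STT EOS, so the delay is additive.
        return self.stt_eos + self.min_endpointing_delay

    @property
    def reported_delay(self) -> float:
        # Assumption: the metric is measured from STT EOS, not from user_stop.
        return self.commit - self.stt_eos

    @property
    def silence_to_commit(self) -> float:
        # What the user perceives: real end of speech to commit.
        return self.commit - self.user_stop


turn = SttTurnTimeline(user_stop=10.0, stt_eos_silence=2.0, min_endpointing_delay=2.0)
print(turn.reported_delay)     # 2.0
print(turn.silence_to_commit)  # 4.0
```

The `silence_to_commit` value is the kind of measurement a metric would need to capture to reflect the full wait in STT mode.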

Operating System

Linux

Models Used

No response

Package Versions

livekit-agents=1.3.7

Session/Room/Call IDs

No response

Proposed Solution

- Update docs to explain mode-dependent behavior.
- Or set last_speaking_time at EOS in VAD mode for consistent additive behavior.
- Add a metric for silence-to-commit, including STT EOS delay.

Additional Context

No response

Screenshots and Recordings

No response

Labels

bug (Something isn't working)
