Make Silero VAD stateful across calls (carry LSTM state)

## Problem

whisper.cpp's `whisper_vad_detect_speech` resets the LSTM hidden and cell states on every call (`ggml_backend_buffer_clear(vctx->buffer, 0)` at whisper.cpp:5131). That is by design for whisper.cpp's one-shot file-processing use case, but our streaming usage calls it repeatedly with 512-sample chunks, so temporal context is lost between calls.

The upstream Silero VAD model is designed to be stateful — LSTM state should carry across 512-sample chunks, just like TEN-VAD carries state across 256-sample hops.
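To illustrate why the per-call reset matters, here is a minimal toy sketch. It is not Silero and does not use whisper.cpp; `ToyRecurrentVad` and `run_chunks` are hypothetical names, and a single float stands in for the LSTM state. It only shows that resetting before every chunk makes each output independent of earlier chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a recurrent VAD. NOT Silero: one scalar plays the role
// of the LSTM hidden/cell state (illustrative assumption only).
struct ToyRecurrentVad {
    float h = 0.0f;

    void reset() { h = 0.0f; }

    // Consumes one chunk and returns a pseudo-score that depends on
    // both the current chunk and the carried state.
    float process(const std::vector<float>& chunk) {
        if (chunk.empty()) return h;
        float energy = 0.0f;
        for (float s : chunk) energy += s * s;
        energy /= static_cast<float>(chunk.size());
        h = 0.5f * h + 0.5f * energy;  // state mixes past and present
        return h;
    }
};

// Runs the same chunks either carrying state across calls or
// resetting before every call (the current behaviour described above).
std::vector<float> run_chunks(const std::vector<std::vector<float>>& chunks,
                              bool reset_each_call) {
    ToyRecurrentVad vad;
    std::vector<float> out;
    for (const auto& c : chunks) {
        if (reset_each_call) vad.reset();
        out.push_back(vad.process(c));
    }
    return out;
}
```

With three identical 512-sample chunks, the carried-state run produces a score that builds up over successive chunks, while the reset-per-call run returns the same value every time.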

## Proposed change

Since we fork whisper.cpp, add a non-breaking way to skip the buffer clear:
- Option A: Add a `bool reset_state` parameter or flag to `whisper_vad_detect_speech`
- Option B: Add a separate `whisper_vad_reset_state()` function and remove the auto-reset from `detect_speech`
- Option C: Just remove the `ggml_backend_buffer_clear` line and let callers explicitly reset via a new function when needed

The Silero implementation of `VadBackend.reset()`, currently a no-op, also needs to call the new reset function.
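A rough sketch of the shape Option B could take, using a mock context rather than the real whisper.cpp types. Every name here (`mock_vad_context`, `mock_vad_detect_speech`, `mock_vad_reset_state`, `SileroBackendSketch`) is hypothetical, and the state update is a placeholder rather than an LSTM step; the real state lives in a ggml backend buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the VAD context. The real context keeps LSTM
// state in a ggml backend buffer; a plain vector models it here.
struct mock_vad_context {
    std::vector<float> lstm_state = std::vector<float>(4, 0.0f);
    int n_calls_since_reset = 0;
};

// Option B: an explicit reset that replaces the implicit clear in detect.
void mock_vad_reset_state(mock_vad_context& ctx) {
    std::fill(ctx.lstm_state.begin(), ctx.lstm_state.end(), 0.0f);
    ctx.n_calls_since_reset = 0;
}

// Detect no longer clears state on entry, so state carries across calls.
// The update below is a placeholder, not a real LSTM step.
bool mock_vad_detect_speech(mock_vad_context& ctx,
                            const float* samples, std::size_t n) {
    if (samples == nullptr || n == 0) return false;
    ctx.lstm_state[0] += samples[0];
    ctx.n_calls_since_reset += 1;
    return true;
}

// Sketch of a backend whose reset() is no longer a no-op.
struct SileroBackendSketch {
    mock_vad_context ctx;
    void reset() { mock_vad_reset_state(ctx); }
};
```

Under this shape, a streaming caller would invoke detect once per chunk and call `reset()` only at stream boundaries. One-shot callers that relied on the old behaviour would need to reset explicitly before each file.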

## Impact

- No change to our calling code — `SileroVad.chunkProbS16` already calls once per chunk
- Probabilities should improve with temporal context
- Likely improves Silero's accuracy on our regression tests
- No upstream issue/PR exists for this (checked Feb 2026)

## Priority

Low — TEN-VAD is our default and already stateful. This only affects `--vad silero`.
