Fix whisper `return_language` with `return_timestamp=word` #39938

Metric-Void · 2025-08-05T22:42:52Z

What does this PR do?

Adds a switch to Whisper's generate() that preserves some special tokens in the generated output. These tokens are then stripped in retrieve_segments so that timestamps stay aligned.
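The idea of preserving the special tokens and then stripping them before alignment can be sketched in plain Python. This is only an illustration, not the code in this PR: the function name `split_special_prefix`, the token table, and the ids in it are hypothetical, and the real ids come from the Whisper tokenizer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical special-token table, for illustration only.
# Real ids and names come from the Whisper tokenizer.
SPECIAL_TOKENS: Dict[int, str] = {
    50258: "<|startoftranscript|>",
    50265: "<|fr|>",
    50359: "<|transcribe|>",
}

# Hypothetical subset of language tokens mapped to language codes.
LANGUAGE_CODES: Dict[str, str] = {"<|fr|>": "fr"}


def split_special_prefix(
    token_ids: List[int],
    token_timestamps: List[float],
) -> Tuple[Optional[str], List[int], List[float]]:
    """Read the language from the leading special tokens, then drop them.

    The same number of entries is removed from the ids and from the
    per-token timestamps, so the two lists stay index-aligned.
    """
    if len(token_ids) != len(token_timestamps):
        raise ValueError("token_ids and token_timestamps must have equal length")

    language: Optional[str] = None
    prefix_len = 0
    # Walk over the leading run of special tokens only.
    while prefix_len < len(token_ids) and token_ids[prefix_len] in SPECIAL_TOKENS:
        name = SPECIAL_TOKENS[token_ids[prefix_len]]
        if name in LANGUAGE_CODES:
            language = LANGUAGE_CODES[name]
        prefix_len += 1

    return language, token_ids[prefix_len:], token_timestamps[prefix_len:]


language, text_ids, text_timestamps = split_special_prefix(
    [50258, 50265, 50359, 100, 200],
    [0.0, 0.0, 0.0, 0.5, 1.0],
)
```

Slicing both lists by the same prefix length is what keeps each remaining token paired with its own timestamp. Stripping the tokens from only one of the two lists would shift every word timestamp by the prefix length.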

Tested on short and long audio clips in English, French, and Cantonese. Predictions and timestamps align, and the language is detected correctly.

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

@eustlb @ebezzam

Local failed tests (WSL2, RUN_SLOW)

$ pytest tests/models/whisper
================================================================================================================= short test summary info ==================================================================================================================
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_flex_attention_with_grads - torch._inductor.exc.InductorError: LoweringException: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpw7mv95z8/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpw7mv95z8/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', ...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperModelTest::test_sdpa_can_compile_dynamic - torch._inductor.exc.InductorError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp2jbthzzq/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmp2jbthzzq/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/metricvoid...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_flex_attention_with_grads - torch._inductor.exc.InductorError: LoweringException: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbszajy61/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbszajy61/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', ...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperEncoderModelTest::test_sdpa_can_compile_dynamic - torch._inductor.exc.InductorError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpevr_eml0/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpevr_eml0/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/metricvoid...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_generate_compilation_all_outputs - torch._inductor.exc.InductorError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpghb4htrw/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpghb4htrw/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/metricvoid...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_generate_compile_model_forward - torch._inductor.exc.InductorError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpb3fj6t8c/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpb3fj6t8c/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/metricvoid...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_generate_from_inputs_embeds_with_static_cache - torch._inductor.exc.InductorError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp122w6v5o/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmp122w6v5o/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/metricvoid...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_generate_with_static_cache - torch._inductor.exc.InductorError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpee6hyznt/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpee6hyznt/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/metricvoid...
FAILED tests/models/whisper/test_modeling_whisper.py::WhisperStandaloneDecoderModelTest::test_sdpa_can_compile_dynamic - torch._inductor.exc.InductorError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpbz2lnr80/main.c', '-O3', '-shared', '-fPIC', '-Wno-psabi', '-o', '/tmp/tmpbz2lnr80/cuda_utils.cpython-312-x86_64-linux-gnu.so', '-lcuda', '-L/home/metricvoid...
FAILED tests/models/whisper/test_tokenization_whisper.py::WhisperTokenizerTest::test_padding_side_in_kwargs - ImportError: 
FAILED tests/models/whisper/test_tokenization_whisper.py::WhisperTokenizerTest::test_tokenizer_initialization_with_conflicting_key - ImportError: 
FAILED tests/models/whisper/test_tokenization_whisper.py::WhisperTokenizerTest::test_tokenizer_mismatch_warning - ImportError: 
FAILED tests/models/whisper/test_tokenization_whisper.py::WhisperTokenizerTest::test_truncation_side_in_kwargs - ImportError: 
=========================================================================================== 13 failed, 445 passed, 295 skipped, 36 warnings in 166.72s (0:02:46) ===========================================================================================

I don't think any of these failures are related to this PR.

github-actions · 2025-08-07T15:46:57Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: whisper

Metric-Void and others added 3 commits August 5, 2025 17:47

Whisper token language fix

7db5705

Style; Avoid negative timestamps with incorrect subtraction.

a07d091

Merge branch 'main' into whisper-langtoken-fix

fcb099a
