[Whisper] Pipeline: handle long form generation #35750

eustlb · 2025-01-17T10:54:15Z

What does this PR do?

Fixes #34210 #31942 #36602

Adds timestamp offsetting to the pipeline's tokenizer decoding logic for cases where Whisper's `generate` call seeks (i.e., generates a new segment).
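To illustrate what the offsetting fixes, here is a minimal sketch in plain Python. It is not the pipeline's actual code. It assumes a hypothetical `Segment` record holding the seek position of each `generate` window, a hypothetical `TIMESTAMP_BEGIN` id for the `<|0.00|>` token, Whisper's 0.02 s timestamp resolution, and that every text span is bracketed by a start and an end timestamp token. Without the offset, the timestamps of each new window restart near zero, which is the kind of reset reported in #36612.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Whisper timestamp tokens advance in steps of 0.02 s.
TIME_PRECISION = 0.02
# Hypothetical id of the <|0.00|> token; the real value depends on the tokenizer.
TIMESTAMP_BEGIN = 50365


@dataclass
class Segment:
    seek_seconds: float   # start of this generate() window in the full audio (hypothetical field)
    token_ids: List[int]  # ids produced for this window (text and timestamp tokens)


def absolute_spans(segments: List[Segment], apply_offset: bool = True) -> List[Tuple[float, float]]:
    """Convert per-window timestamp tokens into absolute (start, end) times in seconds."""
    spans = []
    for seg in segments:
        offset = seg.seek_seconds if apply_offset else 0.0
        # Timestamp tokens encode times relative to the start of their own window.
        relative = [
            (tok - TIMESTAMP_BEGIN) * TIME_PRECISION
            for tok in seg.token_ids
            if tok >= TIMESTAMP_BEGIN
        ]
        # Assumes every text span is bracketed by a start and an end timestamp token.
        for start, end in zip(relative[0::2], relative[1::2]):
            spans.append((round(offset + start, 2), round(offset + end, 2)))
    return spans


B = TIMESTAMP_BEGIN
windows = [
    Segment(0.0, [B, 1, B + 700, B + 700, 2, B + 1400]),  # spans 0-14 s and 14-28 s
    Segment(28.0, [B, 3, B + 100]),                       # generate() seeked to 28 s
]
print(absolute_spans(windows))                      # [(0.0, 14.0), (14.0, 28.0), (28.0, 30.0)]
print(absolute_spans(windows, apply_offset=False))  # last span "resets" to (0.0, 2.0)
```

Keeping the seek position next to each window's tokens means the offset is applied once, at decoding time, rather than having to be reconstructed from the token stream.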

EDIT

⚠️ ❗
This also fixes another issue, indirectly spotted in #36612: when condition_on_prev_tokens=True, we need to use the last generated tokens as decoder_input_ids. However, this requires skipping one of the double ending tokens (cf. #34537) to match the OAI implementation, which was done via the in-place modification tokens=tokens[:-1]. But we actually need this token to be kept for further decoding (also cf. #34537)!
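The pitfall can be sketched with plain Python lists. This is a simplified illustration under stated assumptions: `TIMESTAMP_BEGIN` and the helper names are hypothetical, and the real code in `generation_whisper.py` works on tensors and is structured differently. Rebinding the name to the trimmed slice drops the ending timestamp for every later use, whereas trimming only the prompt copy keeps it for decoding.

```python
from typing import List, Tuple

# Hypothetical id of the <|0.00|> token; every id >= this value is treated as a timestamp.
TIMESTAMP_BEGIN = 50365


def ends_with_double_timestamp(tokens: List[int]) -> bool:
    """True when the last two tokens are both timestamp tokens."""
    return len(tokens) >= 2 and tokens[-1] >= TIMESTAMP_BEGIN and tokens[-2] >= TIMESTAMP_BEGIN


def buggy_split(tokens: List[int]) -> Tuple[List[int], List[int]]:
    # Rebinding `tokens` to the trimmed slice means every later use sees it too.
    if ends_with_double_timestamp(tokens):
        tokens = tokens[:-1]
    prompt_ids = tokens
    return prompt_ids, tokens  # the ending timestamp is gone from the tokens to decode


def fixed_split(tokens: List[int]) -> Tuple[List[int], List[int]]:
    # Trim only the copy used to condition the next window; keep `tokens` intact.
    prompt_ids = tokens[:-1] if ends_with_double_timestamp(tokens) else list(tokens)
    return prompt_ids, tokens


window = [TIMESTAMP_BEGIN, 1, 2, TIMESTAMP_BEGIN + 125, TIMESTAMP_BEGIN + 125]
print(len(buggy_split(window)[1]))  # 4: one ending timestamp lost
print(len(fixed_split(window)[1]))  # 5: full sequence kept for decoding
```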

TODO

make sure the edge cases are correctly handled: what about e.g. chunk_length_s=60? → Actually, Whisper simply should not be used with chunk_length_s set! Added a warning.
add a test for the edge case mentioned above → done by extending test_large_timestamp_generation with condition_on_prev_tokens=True

HuggingFaceDocBuilderDev · 2025-01-30T11:05:06Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

It's missing a test IMO! 🤗

ArthurZucker · 2025-01-30T13:42:18Z

src/transformers/pipelines/automatic_speech_recognition.py

+            elif self.type == "seq2seq_whisper" and not ignore_warning:
+                logger.warning(
+                    "Using `chunk_length_s` with Whisper models is not recommended and will result in unreliable results, as it uses its own chunking mechanism "
+                    "(cf. Whisper original paper, section 3.8. Long-form Transcription)."


As I mentioned offline, it would be a pity not to use that batch algorithm in some cases! But that's up for debate!

True! I just want to make sure:

the user knows that, for seq2seq models, the pipeline's chunking mechanism is unreliable. A warning already exists for that; it just doesn't take Whisper into account...

ensure the user is not using the pipeline for long-form transcription (or at least knows that something more reliable exists for Whisper)!!

I've updated the warning accordingly
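A minimal sketch of the kind of check discussed here, in plain Python with the standard `logging` module. The `model_type` values mirror the `self.type` check in the diff; the function name, its return value and the message wording are assumptions for illustration, not the pipeline's actual code. Warning rather than raising keeps the pipeline's batched chunking available to users who deliberately opt into it.

```python
import logging
from typing import Optional

logger = logging.getLogger("asr_chunking_sketch")


def chunking_warning(
    model_type: str, chunk_length_s: Optional[float], ignore_warning: bool = False
) -> Optional[str]:
    """Log and return a warning when pipeline chunking is likely to be unreliable."""
    if not chunk_length_s or ignore_warning:
        return None
    if model_type == "seq2seq":
        msg = "Chunking with seq2seq models is experimental and may give unreliable results."
    elif model_type == "seq2seq_whisper":
        msg = (
            "Using `chunk_length_s` with Whisper is not recommended: Whisper has its own "
            "sequential long-form decoding (Whisper paper, section 3.8)."
        )
    else:
        return None  # other model types are not covered by this sketch
    logger.warning(msg)
    return msg


print(chunking_warning("seq2seq_whisper", 30.0) is not None)  # True
print(chunking_warning("seq2seq_whisper", None))              # None: no chunking requested
```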

FaresBadrCA · 2025-02-23T19:54:04Z

I tried the code in this PR on a sample audio. The chunk timestamps go out of sync with the audio, and it gets worse the longer the input audio is.

as-suvorov · 2025-04-25T07:43:10Z

Hi @eustlb, could you please tell me the status of this PR?

eustlb · 2025-04-25T08:45:36Z

Hey @as-suvorov, it's waiting for a core maintainer's approval to merge. @ArthurZucker I've addressed your comment, ready to merge 🤗

ArthurZucker

thanks!

ArthurZucker · 2025-06-19T02:24:06Z

src/transformers/models/whisper/generation_whisper.py

    cut_off_length=None,
    return_token_timestamps=False,
    force_unique_generate_call=False,
+    skip_ending_double_timestamps=False,


We're missing documentation for this one, no?

Added a comment explaining this hidden parameter, with links to the related PRs that explain why we need it.

* handle long form generation
* add warning
* correct incorrect in place token change
* update test to catch edge case
* make style
* update warning
* add doc

eustlb added 2 commits January 17, 2025 11:48

handle long form generation

1f0f005

add warning

559ed13

eustlb marked this pull request as ready for review January 17, 2025 13:52

eustlb requested review from ArthurZucker and Rocketknight1 as code owners January 17, 2025 13:52

Merge branch 'main' into fix-pipeline

c7af9d4

This was referenced Jan 17, 2025

Missing timestamp offset using Whisper with pipeline and sequential decoding #34210

Closed

Incorrect Whisper long-form decoding timestamps #31942

Closed

eustlb added 4 commits January 17, 2025 16:05

Merge branch 'main' into fix-pipeline

a868f4b

Merge branch 'main' into fix-pipeline

dd22f49

Merge branch 'main' into fix-pipeline

65b4aa7

Merge branch 'main' into fix-pipeline

0b09778

ArthurZucker reviewed Jan 30, 2025

FaresBadrCA mentioned this pull request Mar 7, 2025

Fixed 30s timestamp resets in Whisper long-form transcription #36612

Closed

eustlb and others added 5 commits March 12, 2025 15:09

correct incorrect in place token change

fbadefe

update test to catch edge case

41dd387

Merge branch 'main' into fix-pipeline

a918d11

make style

15071f4

update warning

954c368

eustlb mentioned this pull request Mar 13, 2025

Whisper pipeline returns empty segment for each processed audio chunk #36602

Closed

Merge branch 'main' into fix-pipeline

4167058

ArthurZucker approved these changes Jun 19, 2025

eustlb mentioned this pull request Jun 26, 2025

Significant WER Increase with Whisper Chunking Compared to Long-Form Transcription #38347

Closed

eustlb and others added 2 commits June 26, 2025 16:02

Merge branch 'main' into fix-pipeline

26f4b0f

add doc

92dd435

eustlb enabled auto-merge (squash) June 26, 2025 14:20

eustlb merged commit cfff7ca into huggingface:main Jun 26, 2025
20 checks passed

eustlb deleted the fix-pipeline branch June 26, 2025 14:36

socket-security bot mentioned this pull request Jul 1, 2025

Bump transformers from 4.52.4 to 4.53.0 alphasecio/prompt-guard#36

Closed

eustlb mentioned this pull request Jul 7, 2025

Incorrect word timestamps and word repetitions with Whisper-Large-v3-turbo model #37248

Closed

socket-security bot mentioned this pull request Aug 1, 2025

Bump transformers from 4.53.2 to 4.54.1 alphasecio/prompt-guard#39

Merged

socket-security bot mentioned this pull request Aug 12, 2025

[Snyk] Security upgrade transformers from 4.5.1 to 4.53.0 kingjay66/unilmf#271

Open

socket-security bot mentioned this pull request Sep 1, 2025

Bump transformers from 4.55.0 to 4.56.0 alphasecio/prompt-guard#43

Closed

This was referenced Sep 25, 2025

[Snyk] Security upgrade transformers from 4.30.2 to 4.53.0 kingjay66/unilmf#278

Open

[Snyk] Security upgrade transformers from 2.10.0 to 4.53.0 kingjay66/unilmf#279

Open

[Snyk] Security upgrade transformers from 4.5.1 to 4.53.0 kingjay66/unilmf#281

Open

socket-security bot mentioned this pull request Nov 1, 2025

Bump transformers from 4.56.2 to 4.57.1 alphasecio/prompt-guard#47

Closed

5 participants

[Whisper] Pipeline: handle long form generation #35750

[Whisper] Pipeline: handle long form generation #35750

Uh oh!

Conversation

eustlb commented Jan 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

EDIT

TODO

Uh oh!

HuggingFaceDocBuilderDev commented Jan 30, 2025

Uh oh!

ArthurZucker left a comment

Choose a reason for hiding this comment

Uh oh!

ArthurZucker Jan 30, 2025

Choose a reason for hiding this comment

Uh oh!

eustlb Mar 12, 2025

Choose a reason for hiding this comment

Uh oh!

FaresBadrCA commented Feb 23, 2025

Uh oh!

as-suvorov commented Apr 25, 2025

Uh oh!

eustlb commented Apr 25, 2025

Uh oh!

ArthurZucker left a comment

Choose a reason for hiding this comment

Uh oh!

ArthurZucker Jun 19, 2025

Choose a reason for hiding this comment

Uh oh!

eustlb Jun 26, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

eustlb commented Jan 17, 2025 •

edited

Loading