-
Notifications
You must be signed in to change notification settings - Fork 31.3k
Fixed 30s timestamp resets in Whisper long-form transcription #36612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed 30s timestamp resets in Whisper long-form transcription #36612
Conversation
…rcing return_segments.
|
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the |
|
Hey @FaresBadrCA Thanks a lot for your PR! 🤗 |
|
Hi @eustlb, below is a snippet I used for testing, using the LinusTech dataset. Running the code twice: Once for this PR (#36612) and once for the other PR (#35750), I get the results below. PR 36612: Using
|
|
I took a look at it, and what you've spotted is actually an issue, thanks a lot for that 🙏 That is exactly why we want to go with #35750: output should be equivalent from what you get looking directly at the segments (what you're doing in this PR). That is also why this PR won't get merge: we do not want to bypass decoding directly from the outputted tokens. Anyway, thanks a lot again for spotting this issue, I added a fix for it in #35750 and will also add a test for it 😊 |
|
Closing this now for the above-mentioned reasons. |
What does this PR do?
Fixes #34210 and #31942.
This is an alternative to PR #35750
It resolves the issue of timestamps rolling over every 30 seconds in the Whisper model's long-form transcription. It does this by forcing
return_segmentsto beTruewhenreturn_timestampsisTrue.Before submitting
Who can review?
@eustlb, @Rocketknight1, @gante, @ylacombe