System Info
- `transformers` version: 4.46.3
- Platform: Linux-5.17.15-051715-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.28.1
- Safetensors version: 0.5.2
- Accelerate version: 1.4.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.6.0+cpu (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Hello!
Timestamp processing changed between versions 4.46.3 and 4.47.0. With version 4.47.0, when `return_timestamps=True` is set, an empty segment is returned after each processed audio chunk (see the `(27.52, 0.0)` and `(29.72, 0.0)` entries in the output below).
To reproduce the issue, run reproducer.py with `transformers` versions 4.46.3 and 4.47.0.
reproducer.py
```python
from transformers import pipeline
import datasets
import typing


def get_sample_from_dataset():
    # Stream the first example of the long-form "meanwhile" test set, resampled to 16 kHz
    ds = datasets.load_dataset(
        "distil-whisper/meanwhile",
        split="test",
        streaming=True,
        trust_remote_code=True,
    )
    ds = typing.cast(datasets.IterableDataset, ds)
    ds = ds.cast_column("audio", datasets.Audio(sampling_rate=16000))
    ds = ds.take(1)
    return next(iter(ds))["audio"]


sample = get_sample_from_dataset()
whisper = pipeline("automatic-speech-recognition", "openai/whisper-tiny")
# Pass a copy, since the pipeline pops keys from a dict input
transcription = whisper(
    sample.copy(),
    return_timestamps=True,
)
print(transcription["text"])
for chunk in transcription["chunks"]:
    print(chunk)

# transformers version 4.46.3
# {'timestamp': (0.0, 3.2), 'text': ' Folks, if you watch the show, you know, I spent a lot of time'}
# {'timestamp': (3.2, 4.64), 'text': ' right over there.'}
# {'timestamp': (4.64, 7.04), 'text': ' Patiently and astutely scrutinizing the boxwood and'}
# {'timestamp': (7.04, 9.28), 'text': ' mahogany chest set of the days, big stories,'}
# {'timestamp': (9.28, 11.84), 'text': ' developing the central headline pawns,'}
# {'timestamp': (11.84, 15.08), 'text': ' definitely maneuvering an OSO topical night to F6,'}
# {'timestamp': (15.08, 16.8), 'text': ' faming of classic Sicilian,'}
# {'timestamp': (16.8, 18.96), 'text': ' named or variation on the news,'}
# {'timestamp': (18.96, 21.0), 'text': ' all the while seeing eight moves deep and'}
# {'timestamp': (21.0, 24.0), 'text': ' patiently marshalling the latest press releases into a'}
# {'timestamp': (24.0, 27.52), 'text': ' Fisher shows in lip nitsky attack that culminates in the'}
# {'timestamp': (0.0, 3.24), 'text': ' The elegant lethal slow played all-pass on checkmate'}
# {'timestamp': (3.24, 5.18), 'text': ' that is my nightly monologue, but sometimes sometimes'}
# {'timestamp': (5.18, 6.0), 'text': ' folks I'}
# {'timestamp': (6.0, 9.0), 'text': ' sometimes I'}
# {'timestamp': (9.0, 13.0), 'text': ' start a little wake upside down in the monkey bars'}
# {'timestamp': (13.0, 15.48), 'text': ' of a condemned playground on a super fun site.'}
# {'timestamp': (15.48, 17.52), 'text': ' Get all hepped up on goofballs, rummage that were'}
# {'timestamp': (17.52, 20.32), 'text': ' discarded tag bag of defective toys.'}
# {'timestamp': (20.32, 23.4), 'text': ' Yank out a fistball of disembodied doll limbs,'}
# {'timestamp': (23.4, 24.96), 'text': " toss them on a stained kid's place,"}
# {'timestamp': (24.96, 27.98), 'text': ' mad from a defunct denies, set up a table inside a rusty'}
# {'timestamp': (27.98, 29.72), 'text': ' cargo container down by the warf,'}
# {'timestamp': (0.0, 2.28), 'text': ' and challenged toothless drifters to the godless,'}
# {'timestamp': (2.28, 5.76), 'text': ' bug house blitz of tournament that is my segment.'}
# {'timestamp': (5.76, 9.56), 'text': ' Me and Wild.'}

# transformers version 4.47.0
# {'timestamp': (0.0, 3.2), 'text': ' Folks, if you watch the show, you know, I spent a lot of time'}
# {'timestamp': (3.2, 4.64), 'text': ' right over there.'}
# {'timestamp': (4.64, 7.04), 'text': ' Patiently and astutely scrutinizing the boxwood and'}
# {'timestamp': (7.04, 9.28), 'text': ' mahogany chest set of the days, big stories,'}
# {'timestamp': (9.28, 11.84), 'text': ' developing the central headline pawns,'}
# {'timestamp': (11.84, 15.08), 'text': ' definitely maneuvering an OSO topical night to F6,'}
# {'timestamp': (15.08, 16.8), 'text': ' faming of classic Sicilian,'}
# {'timestamp': (16.8, 18.96), 'text': ' named or variation on the news,'}
# {'timestamp': (18.96, 21.0), 'text': ' all the while seeing eight moves deep and'}
# {'timestamp': (21.0, 24.0), 'text': ' patiently marshalling the latest press releases into a'}
# {'timestamp': (24.0, 27.52), 'text': ' Fisher shows in lip nitsky attack that culminates in the'}
# {'timestamp': (27.52, 0.0), 'text': ''}
# {'timestamp': (3.24, 5.18), 'text': ' The elegant lethal slow played all-pass on checkmate that is my nightly monologue, but sometimes sometimes'}
# {'timestamp': (5.18, 6.0), 'text': ' folks I'}
# {'timestamp': (6.0, 9.0), 'text': ' sometimes I'}
# {'timestamp': (9.0, 13.0), 'text': ' start a little wake upside down in the monkey bars'}
# {'timestamp': (13.0, 15.48), 'text': ' of a condemned playground on a super fun site.'}
# {'timestamp': (15.48, 17.52), 'text': ' Get all hepped up on goofballs, rummage that were'}
# {'timestamp': (17.52, 20.32), 'text': ' discarded tag bag of defective toys.'}
# {'timestamp': (20.32, 23.4), 'text': ' Yank out a fistball of disembodied doll limbs,'}
# {'timestamp': (23.4, 24.96), 'text': " toss them on a stained kid's place,"}
# {'timestamp': (24.96, 27.98), 'text': ' mad from a defunct denies, set up a table inside a rusty'}
# {'timestamp': (27.98, 29.72), 'text': ' cargo container down by the warf,'}
# {'timestamp': (29.72, 0.0), 'text': ''}
# {'timestamp': (2.28, 5.76), 'text': ' and challenged toothless drifters to the godless, bug house blitz of tournament that is my segment.'}
# {'timestamp': (5.76, 9.56), 'text': ' Me and Wild.'}
```
Expected behavior
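The empty segment (whose end timestamp, 0.0, is even smaller than its start) appears to be unnecessary and should not be returned. Note also that in the 4.47.0 output the first segment of the following chunk no longer has its own entry: for example, ' The elegant lethal slow played all-pass on checkmate' was reported as (0.0, 3.24) in 4.46.3, but in 4.47.0 it is merged into the (3.24, 5.18) segment.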
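Until this is fixed, one possible client-side workaround is to filter the empty entries out of `transcription["chunks"]`. This is a minimal sketch, assuming the chunk format printed above (a dict with `"timestamp"` and `"text"` keys); `drop_empty_chunks` is a hypothetical helper, not part of `transformers`. It only hides the empty segments and cannot restore the merged segment boundaries.

```python
from typing import Any, Dict, List


def drop_empty_chunks(chunks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical workaround: keep only chunks whose text is not empty or whitespace-only.

    Assumes each chunk is a dict with a "text" key, as printed by reproducer.py.
    """
    return [chunk for chunk in chunks if (chunk.get("text") or "").strip()]


# Example using entries copied from the 4.47.0 output above
chunks = [
    {"timestamp": (24.0, 27.52), "text": " Fisher shows in lip nitsky attack that culminates in the"},
    {"timestamp": (27.52, 0.0), "text": ""},
    {"timestamp": (3.24, 5.18), "text": " The elegant lethal slow played all-pass on checkmate that is my nightly monologue, but sometimes sometimes"},
]
filtered = drop_empty_chunks(chunks)
print([c["timestamp"] for c in filtered])  # [(24.0, 27.52), (3.24, 5.18)]
```

In reproducer.py this would be applied as `transcription["chunks"] = drop_empty_chunks(transcription["chunks"])` before printing.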