Looks like timestamps get very inaccurate when VAD (silero-vad) is on. Look, for example, at the `the` before `door` in the following examples. Checked against the audio file, the start time of the word `the` is timed much more accurately when VAD is on than when it is off; there is a whopping 1.27 s difference. This test uses the large-v3 model and was run on Replicate: https://replicate.com/villesau/whisper-timestamped
Also, "the kite dipped and swayed but stayed aloft" is split into different segments with VAD on vs. off.
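To make differences like the 1.27 s gap and the segment split easier to spot across two runs, here is a minimal sketch that compares two results word by word. It assumes both outputs follow whisper-timestamped's JSON layout (a `"segments"` list whose items carry `"text"` and a `"words"` list of `{"text", "start", "end"}` dicts), e.g. produced with `whisper_timestamped.transcribe(model, audio, vad=True)` and `vad=False`, or downloaded from Replicate if that model returns the same layout. The helper names (`words_of`, `start_time_deltas`, `segment_texts`) are hypothetical and not part of any library.

```python
from typing import Dict, List, Tuple

# Assumption: `result` follows whisper-timestamped's JSON layout, i.e.
# {"segments": [{"text": ..., "words": [{"text", "start", "end"}, ...]}, ...]}
Result = Dict


def words_of(result: Result) -> List[Tuple[str, float]]:
    """Flatten every word of a result into (text, start) pairs, in order."""
    return [
        (word["text"].strip(), word["start"])
        for segment in result["segments"]
        for word in segment.get("words", [])
    ]


def start_time_deltas(result_on: Result, result_off: Result) -> List[Tuple[str, float]]:
    """Return (word, start_on - start_off) for words matching at the same position.

    Pairing by position is a rough heuristic: if the two transcripts contain
    different words, later pairs may be misaligned and mismatches are skipped.
    """
    deltas = []
    for (text_on, start_on), (text_off, start_off) in zip(
        words_of(result_on), words_of(result_off)
    ):
        if text_on.lower() == text_off.lower():
            deltas.append((text_on, round(start_on - start_off, 2)))
    return deltas


def segment_texts(result: Result) -> List[str]:
    """Return each segment's text, to see where a sentence gets split."""
    return [segment["text"].strip() for segment in result["segments"]]
```

A large negative or positive delta flags a word whose start time moved between the two runs, and comparing `segment_texts` for both results shows where a sentence such as "the kite dipped and swayed but stayed aloft" is cut differently.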
Here is the sample audio file: https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav
VAD on:
VAD off:
Full examples:
VAD on:
VAD off: