Large timestamp & segment discrepancies with VAD on & off #216

Open
villesau opened this issue Oct 5, 2024 · 0 comments

villesau commented Oct 5, 2024

Word timestamps differ noticeably depending on whether VAD (silero-vad) is on or off. Look, for example, at the "the" before "door" in the following examples. Checked against the audio file, the start time of that "the" is much more accurate with VAD on than with VAD off: the two start times differ by a whopping 1.27 s (6.25 s vs. 4.98 s). This test uses the large-v3 model and was run on Replicate: https://replicate.com/villesau/whisper-timestamped

Also, "the kite dipped and swayed but stayed aloft" ends up in a different segment depending on the VAD setting: with VAD on it closes segment 0, with VAD off it opens segment 1.
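To make the segment difference easy to reproduce, here is a minimal sketch that looks up which segment holds a given phrase in each output. It assumes only the `segments` / `id` / `start` / `end` / `text` keys visible in the JSON in this issue; `segment_containing` is a hypothetical helper, not part of whisper-timestamped, and the segment texts are abridged.

```python
from typing import Optional, Tuple


def segment_containing(result: dict, phrase: str) -> Optional[Tuple[int, float, float]]:
    """Return (id, start, end) of the first segment whose text contains phrase."""
    for seg in result.get("segments", []):
        if phrase in seg["text"]:
            return seg["id"], seg["start"], seg["end"]
    return None


# Segment texts abridged from the full outputs in this issue.
vad_on = {"segments": [
    {"id": 0, "start": 2.2, "end": 21.81,
     "text": " a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft"},
]}
vad_off = {"segments": [
    {"id": 0, "start": 0.14, "end": 17.28,
     "text": " a big wet stain was on the round carpet"},
    {"id": 1, "start": 18.6, "end": 30.86,
     "text": " the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon"},
]}

phrase = "the kite dipped and swayed but stayed aloft"
print("VAD on :", segment_containing(vad_on, phrase))
print("VAD off:", segment_containing(vad_off, phrase))
```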

Here is the sample audio file: https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav

VAD on:

        {
          "end": 6.89,
          "text": "the",
          "start": 6.25,
          "confidence": 0.989
        },

VAD off:

        {
          "end": 6.88,
          "text": "the",
          "start": 4.98,
          "confidence": 0.953
        },
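
For anyone who wants to scan both outputs for similar cases, here is a rough sketch that flattens the word lists and reports words whose start times differ by more than a threshold. It assumes the `segments` / `words` / `text` / `start` layout shown in this issue; `flatten_words`, `start_discrepancies` and the 0.5 s threshold are illustrative choices, not part of whisper-timestamped. Words are paired by position and the comparison stops at the first text mismatch, since the two transcripts diverge near the end.

```python
from typing import Dict, List, Tuple


def flatten_words(result: Dict) -> List[Dict]:
    """Collect the word entries of all segments in order."""
    return [w for seg in result.get("segments", []) for w in seg.get("words", [])]


def start_discrepancies(a: Dict, b: Dict, threshold: float = 0.5) -> List[Tuple[str, float, float, float]]:
    """Return (text, start_a, start_b, delta) for words whose starts differ by >= threshold."""
    out = []
    for wa, wb in zip(flatten_words(a), flatten_words(b)):
        if wa["text"] != wb["text"]:
            break  # transcripts diverge; positional pairing is no longer meaningful
        delta = round(wa["start"] - wb["start"], 3)
        if abs(delta) >= threshold:
            out.append((wa["text"], wa["start"], wb["start"], delta))
    return out


# Three consecutive words copied from the outputs in this issue.
vad_on = {"segments": [{"words": [
    {"text": "false", "start": 4.34, "end": 4.96},
    {"text": "the", "start": 6.25, "end": 6.89},
    {"text": "door", "start": 6.89, "end": 7.17},
]}]}
vad_off = {"segments": [{"words": [
    {"text": "false", "start": 4.34, "end": 4.98},
    {"text": "the", "start": 4.98, "end": 6.88},
    {"text": "door", "start": 6.88, "end": 7.16},
]}]}

print(start_discrepancies(vad_on, vad_off))
```

On this excerpt only the "the" before "door" is reported, with the 1.27 s difference described above.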

Full examples:

VAD on
{
  "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid the beetle droned in the hot june sun the the the",
  "language": "en",
  "segments": [
    {
      "id": 0,
      "end": 21.81,
      "seek": 0,
      "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft",
      "start": 2.2,
      "words": [
        {
          "end": 2.78,
          "text": "the",
          "start": 2.2,
          "confidence": 0.849
        },
        {
          "end": 3.04,
          "text": "little",
          "start": 2.78,
          "confidence": 0.987
        },
        {
          "end": 3.52,
          "text": "tales",
          "start": 3.04,
          "confidence": 0.971
        },
        {
          "end": 3.72,
          "text": "they",
          "start": 3.52,
          "confidence": 0.998
        },
        {
          "end": 4.1,
          "text": "tell",
          "start": 3.72,
          "confidence": 0.999
        },
        {
          "end": 4.34,
          "text": "are",
          "start": 4.1,
          "confidence": 0.998
        },
        {
          "end": 4.96,
          "text": "false",
          "start": 4.34,
          "confidence": 0.999
        },
        {
          "end": 6.89,
          "text": "the",
          "start": 6.25,
          "confidence": 0.989
        },
        {
          "end": 7.17,
          "text": "door",
          "start": 6.89,
          "confidence": 0.998
        },
        {
          "end": 7.39,
          "text": "was",
          "start": 7.17,
          "confidence": 0.998
        },
        {
          "end": 7.93,
          "text": "barred",
          "start": 7.39,
          "confidence": 0.998
        },
        {
          "end": 8.41,
          "text": "locked",
          "start": 7.93,
          "confidence": 0.991
        },
        {
          "end": 8.67,
          "text": "and",
          "start": 8.41,
          "confidence": 0.996
        },
        {
          "end": 9.07,
          "text": "bolted",
          "start": 8.67,
          "confidence": 1
        },
        {
          "end": 9.25,
          "text": "as",
          "start": 9.07,
          "confidence": 0.999
        },
        {
          "end": 9.79,
          "text": "well",
          "start": 9.25,
          "confidence": 0.999
        },
        {
          "end": 11.36,
          "text": "ripe",
          "start": 10.54,
          "confidence": 0.991
        },
        {
          "end": 11.86,
          "text": "pears",
          "start": 11.36,
          "confidence": 0.996
        },
        {
          "end": 12.02,
          "text": "are",
          "start": 11.86,
          "confidence": 0.997
        },
        {
          "end": 12.18,
          "text": "fit",
          "start": 12.02,
          "confidence": 0.998
        },
        {
          "end": 12.34,
          "text": "for",
          "start": 12.18,
          "confidence": 0.999
        },
        {
          "end": 12.46,
          "text": "a",
          "start": 12.34,
          "confidence": 0.996
        },
        {
          "end": 12.8,
          "text": "queen's",
          "start": 12.46,
          "confidence": 0.993
        },
        {
          "end": 13.2,
          "text": "table",
          "start": 12.8,
          "confidence": 0.997
        },
        {
          "end": 15.3,
          "text": "a",
          "start": 14.67,
          "confidence": 0.861
        },
        {
          "end": 15.56,
          "text": "big",
          "start": 15.3,
          "confidence": 0.998
        },
        {
          "end": 15.78,
          "text": "wet",
          "start": 15.56,
          "confidence": 0.997
        },
        {
          "end": 16.14,
          "text": "stain",
          "start": 15.78,
          "confidence": 0.998
        },
        {
          "end": 16.32,
          "text": "was",
          "start": 16.14,
          "confidence": 0.999
        },
        {
          "end": 16.44,
          "text": "on",
          "start": 16.32,
          "confidence": 0.997
        },
        {
          "end": 16.56,
          "text": "the",
          "start": 16.44,
          "confidence": 0.997
        },
        {
          "end": 16.84,
          "text": "round",
          "start": 16.56,
          "confidence": 0.999
        },
        {
          "end": 17.36,
          "text": "carpet",
          "start": 16.84,
          "confidence": 0.999
        },
        {
          "end": 19.61,
          "text": "the",
          "start": 19.02,
          "confidence": 0.702
        },
        {
          "end": 19.91,
          "text": "kite",
          "start": 19.61,
          "confidence": 0.987
        },
        {
          "end": 20.23,
          "text": "dipped",
          "start": 19.91,
          "confidence": 0.994
        },
        {
          "end": 20.49,
          "text": "and",
          "start": 20.23,
          "confidence": 0.999
        },
        {
          "end": 20.95,
          "text": "swayed",
          "start": 20.49,
          "confidence": 0.999
        },
        {
          "end": 21.09,
          "text": "but",
          "start": 20.95,
          "confidence": 0.999
        },
        {
          "end": 21.39,
          "text": "stayed",
          "start": 21.09,
          "confidence": 0.987
        },
        {
          "end": 21.81,
          "text": "aloft",
          "start": 21.39,
          "confidence": 0.998
        }
      ],
      "tokens": [
        50365,
        264,
        707,
        27254,
        436,
        980,
        366,
        7908,
        264,
        2853,
        390,
        2159,
        986,
        9376,
        293,
        13436,
        292,
        382,
        731,
        31421,
        520,
        685,
        366,
        3318,
        337,
        257,
        12206,
        311,
        3199,
        257,
        955,
        6630,
        16441,
        390,
        322,
        264,
        3098,
        18119,
        264,
        38867,
        45162,
        293,
        27555,
        292,
        457,
        9181,
        419,
        6750,
        51204
      ],
      "confidence": 0.982,
      "avg_logprob": -0.047642969617656634,
      "temperature": 0,
      "no_speech_prob": 0.009554896503686905,
      "compression_ratio": 1.5185185185185186
    },
    {
      "id": 1,
      "end": 48.71,
      "seek": 1678,
      "text": " the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid the beetle droned in the hot june sun",
      "start": 23.46,
      "words": [
        {
          "end": 24.06,
          "text": "the",
          "start": 23.46,
          "confidence": 0.99
        },
        {
          "end": 24.38,
          "text": "pleasant",
          "start": 24.06,
          "confidence": 0.994
        },
        {
          "end": 24.78,
          "text": "hours",
          "start": 24.38,
          "confidence": 0.998
        },
        {
          "end": 25.12,
          "text": "fly",
          "start": 24.78,
          "confidence": 0.995
        },
        {
          "end": 25.44,
          "text": "by",
          "start": 25.12,
          "confidence": 0.999
        },
        {
          "end": 25.78,
          "text": "much",
          "start": 25.44,
          "confidence": 0.992
        },
        {
          "end": 26,
          "text": "too",
          "start": 25.78,
          "confidence": 0.998
        },
        {
          "end": 26.64,
          "text": "soon",
          "start": 26,
          "confidence": 1
        },
        {
          "end": 28.95,
          "text": "the",
          "start": 28.33,
          "confidence": 0.986
        },
        {
          "end": 29.19,
          "text": "room",
          "start": 28.95,
          "confidence": 0.999
        },
        {
          "end": 29.39,
          "text": "was",
          "start": 29.19,
          "confidence": 0.999
        },
        {
          "end": 29.81,
          "text": "crowded",
          "start": 29.39,
          "confidence": 0.998
        },
        {
          "end": 29.99,
          "text": "with",
          "start": 29.81,
          "confidence": 0.999
        },
        {
          "end": 30.13,
          "text": "a",
          "start": 29.99,
          "confidence": 0.996
        },
        {
          "end": 30.45,
          "text": "mild",
          "start": 30.13,
          "confidence": 0.983
        },
        {
          "end": 31.09,
          "text": "wab",
          "start": 30.45,
          "confidence": 0.918
        },
        {
          "end": 33.13,
          "text": "the",
          "start": 32.53,
          "confidence": 0.954
        },
        {
          "end": 33.41,
          "text": "room",
          "start": 33.13,
          "confidence": 0.999
        },
        {
          "end": 33.63,
          "text": "was",
          "start": 33.41,
          "confidence": 1
        },
        {
          "end": 34.01,
          "text": "crowded",
          "start": 33.63,
          "confidence": 0.999
        },
        {
          "end": 34.25,
          "text": "with",
          "start": 34.01,
          "confidence": 0.999
        },
        {
          "end": 34.39,
          "text": "a",
          "start": 34.25,
          "confidence": 0.996
        },
        {
          "end": 34.71,
          "text": "wild",
          "start": 34.39,
          "confidence": 0.998
        },
        {
          "end": 35.25,
          "text": "mob",
          "start": 34.71,
          "confidence": 0.997
        },
        {
          "end": 37.37,
          "text": "this",
          "start": 36.69,
          "confidence": 0.992
        },
        {
          "end": 37.73,
          "text": "strong",
          "start": 37.37,
          "confidence": 0.998
        },
        {
          "end": 38.15,
          "text": "arm",
          "start": 37.73,
          "confidence": 0.983
        },
        {
          "end": 38.39,
          "text": "shall",
          "start": 38.15,
          "confidence": 0.996
        },
        {
          "end": 38.77,
          "text": "shield",
          "start": 38.39,
          "confidence": 0.999
        },
        {
          "end": 38.97,
          "text": "your",
          "start": 38.77,
          "confidence": 0.993
        },
        {
          "end": 39.37,
          "text": "honour",
          "start": 38.97,
          "confidence": 0.692
        },
        {
          "end": 39.89,
          "text": "she",
          "start": 39.37,
          "confidence": 0.996
        },
        {
          "end": 42.47,
          "text": "blushed",
          "start": 41.97,
          "confidence": 0.997
        },
        {
          "end": 42.61,
          "text": "when",
          "start": 42.47,
          "confidence": 0.998
        },
        {
          "end": 42.75,
          "text": "he",
          "start": 42.61,
          "confidence": 0.999
        },
        {
          "end": 42.97,
          "text": "gave",
          "start": 42.75,
          "confidence": 0.999
        },
        {
          "end": 43.17,
          "text": "her",
          "start": 42.97,
          "confidence": 0.998
        },
        {
          "end": 43.35,
          "text": "a",
          "start": 43.17,
          "confidence": 0.994
        },
        {
          "end": 43.53,
          "text": "white",
          "start": 43.35,
          "confidence": 0.998
        },
        {
          "end": 44.31,
          "text": "orchid",
          "start": 43.53,
          "confidence": 0.996
        },
        {
          "end": 46.43,
          "text": "the",
          "start": 45.84,
          "confidence": 0.997
        },
        {
          "end": 46.79,
          "text": "beetle",
          "start": 46.43,
          "confidence": 0.971
        },
        {
          "end": 47.25,
          "text": "droned",
          "start": 46.79,
          "confidence": 0.996
        },
        {
          "end": 47.43,
          "text": "in",
          "start": 47.25,
          "confidence": 0.996
        },
        {
          "end": 47.55,
          "text": "the",
          "start": 47.43,
          "confidence": 0.998
        },
        {
          "end": 47.79,
          "text": "hot",
          "start": 47.55,
          "confidence": 0.999
        },
        {
          "end": 48.19,
          "text": "june",
          "start": 47.79,
          "confidence": 0.997
        },
        {
          "end": 48.71,
          "text": "sun",
          "start": 48.19,
          "confidence": 0.985
        }
      ],
      "tokens": [
        50365,
        264,
        16232,
        2496,
        3603,
        538,
        709,
        886,
        2321,
        264,
        1808,
        390,
        21634,
        365,
        257,
        15154,
        261,
        455,
        264,
        1808,
        390,
        21634,
        365,
        257,
        4868,
        4298,
        341,
        2068,
        3726,
        4393,
        10257,
        428,
        20631,
        750,
        25218,
        292,
        562,
        415,
        2729,
        720,
        257,
        2418,
        34850,
        327,
        264,
        49735,
        1224,
        19009,
        294,
        264,
        2368,
        361,
        2613,
        3295,
        51323
      ],
      "confidence": 0.985,
      "avg_logprob": -0.09794281853569878,
      "temperature": 0,
      "no_speech_prob": 4.85025623220281e-7,
      "compression_ratio": 5.815789473684211
    },
    {
      "id": 2,
      "end": 49.27,
      "seek": 1678,
      "text": " the",
      "start": 49.25,
      "words": [
        {
          "end": 49.27,
          "text": "the",
          "start": 49.25,
          "confidence": 0.147
        }
      ],
      "tokens": [
        51323,
        264,
        51521
      ],
      "confidence": 0.147,
      "avg_logprob": -0.09794281853569878,
      "temperature": 0,
      "no_speech_prob": 4.85025623220281e-7,
      "compression_ratio": 5.815789473684211
    },
    {
      "id": 3,
      "end": 56.23,
      "seek": 1678,
      "text": " the",
      "start": 52.59,
      "words": [
        {
          "end": 56.23,
          "text": "the",
          "start": 52.59,
          "confidence": 0.367
        }
      ],
      "tokens": [
        51521,
        264,
        51817
      ],
      "confidence": 0.367,
      "avg_logprob": -0.09794281853569878,
      "temperature": 0,
      "no_speech_prob": 4.85025623220281e-7,
      "compression_ratio": 5.815789473684211
    },
    {
      "id": 4,
      "end": 59.17,
      "seek": 1678,
      "text": " the",
      "start": 58.51,
      "words": [
        {
          "end": 59.17,
          "text": "the",
          "start": 58.51,
          "confidence": 0.688
        }
      ],
      "tokens": [
        51817,
        264,
        51865
      ],
      "confidence": 0.688,
      "avg_logprob": -0.09794281853569878,
      "temperature": 0,
      "no_speech_prob": 4.85025623220281e-7,
      "compression_ratio": 5.815789473684211
    }
  ],
  "language_probs": {
    "af": 5.1902077302656835e-8,
    "am": 2.446166325054122e-10,
    "ar": 0.000005635599336528685,
    "as": 9.37711353010684e-10,
    "az": 2.303140433923545e-8,
    "ba": 9.653014498844925e-12,
    "be": 6.91528212470871e-9,
    "bg": 5.7675215714425576e-8,
    "bn": 1.1248234699223758e-7,
    "bo": 2.7027795557188483e-9,
    "br": 1.8617787134189712e-7,
    "bs": 1.2279761207878437e-8,
    "ca": 5.7123247643176e-7,
    "cs": 0.0000011629715572780697,
    "cy": 0.00009311087342211977,
    "da": 0.000001199888288283546,
    "de": 0.000030704708478879184,
    "el": 0.000001540687094347959,
    "en": 0.999530553817749,
    "es": 0.00008411867020186037,
    "et": 2.206273563842842e-8,
    "eu": 1.4356495547929171e-8,
    "fa": 3.4109851299035654e-7,
    "fi": 0.0000012187838365207426,
    "fo": 5.91495430413147e-9,
    "fr": 0.00004196836016490124,
    "gl": 2.589490577520337e-8,
    "gu": 2.1717441178736863e-9,
    "ha": 4.0010633695075626e-11,
    "he": 4.3118890857840597e-7,
    "hi": 6.574845770046522e-7,
    "hr": 6.586716239098678e-8,
    "ht": 2.599625403831851e-8,
    "hu": 0.0000015527709820162272,
    "hy": 1.671900662358894e-8,
    "id": 0.0000032872160318220267,
    "is": 1.5314573431624012e-7,
    "it": 0.000026676714696804993,
    "ja": 0.000014732278941664845,
    "jw": 5.366232471715193e-7,
    "ka": 3.770242174017113e-10,
    "kk": 8.021867614615985e-9,
    "km": 0.000001614626171431155,
    "kn": 2.389857645113125e-9,
    "ko": 0.000008460036951873917,
    "la": 0.000009968358426704071,
    "lb": 5.3212937528579474e-11,
    "ln": 9.600396172482206e-11,
    "lo": 1.3044284541408047e-9,
    "lt": 7.492939602116166e-8,
    "lv": 9.69658344729396e-8,
    "mg": 1.0643257108977622e-11,
    "mi": 9.641373708291212e-7,
    "mk": 8.89542994819692e-10,
    "ml": 2.783964703212405e-7,
    "mn": 2.84913381776164e-9,
    "mr": 1.49283945205525e-8,
    "ms": 0.0000010755729817901738,
    "mt": 3.573614515417489e-9,
    "my": 7.47720285687592e-9,
    "ne": 8.741735335604517e-9,
    "nl": 0.000012601185517269187,
    "nn": 0.000009364406651002355,
    "no": 0.00000220693232222402,
    "oc": 2.6094104654816874e-9,
    "pa": 4.616266124912727e-8,
    "pl": 0.000006797830792493187,
    "ps": 2.3574060481479364e-9,
    "pt": 0.000020453746401472017,
    "ro": 0.0000015649494571334799,
    "ru": 0.00002029457209573593,
    "sa": 4.422112098723119e-8,
    "sd": 2.6401694164235323e-9,
    "si": 4.847986474487698e-7,
    "sk": 8.133509510344084e-8,
    "sl": 2.594940440303617e-7,
    "sn": 2.456840206832567e-7,
    "so": 1.0008183518039893e-11,
    "sq": 2.0645218867798576e-8,
    "sr": 3.6157390415070267e-9,
    "su": 8.63942181683619e-11,
    "sv": 0.000008199748663173523,
    "sw": 2.1512813930257835e-7,
    "ta": 3.1793874200047867e-7,
    "te": 9.69658344729396e-8,
    "tg": 1.8588510938832847e-11,
    "th": 0.000001665880063228542,
    "tk": 6.871803473473825e-12,
    "tl": 0.0000010755729817901738,
    "tr": 0.0000026004017854575068,
    "tt": 1.5156691873796646e-11,
    "uk": 3.630974561019684e-7,
    "ur": 1.5739109926471428e-7,
    "uz": 2.7764594009299648e-12,
    "vi": 0.000030945528123993427,
    "yi": 1.038106756112711e-8,
    "yo": 4.9719144357140976e-8,
    "zh": 0.000015081644960446283,
    "haw": 0.0000011360315284036915,
    "yue": 9.509034981647346e-8
  },
  "speech_activity": [
    {
      "end": 5.33,
      "start": 2.062
    },
    {
      "end": 10.066,
      "start": 6.254
    },
    {
      "end": 13.714,
      "start": 10.542
    },
    {
      "end": 17.874,
      "start": 14.67
    },
    {
      "end": 22.322,
      "start": 19.022
    },
    {
      "end": 26.962,
      "start": 23.438
    },
    {
      "end": 31.346,
      "start": 28.334
    },
    {
      "end": 35.634,
      "start": 32.526
    },
    {
      "end": 39.89,
      "start": 36.686
    },
    {
      "end": 44.562,
      "start": 41.518
    },
    {
      "end": 49.138,
      "start": 45.838
    }
  ]
}
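
The `speech_activity` list above can serve as a rough reference for word timings. The sketch below measures how much of a word's `[start, end]` span falls outside every detected speech interval; a large value suggests the word was stretched over a pause. It assumes the intervals do not overlap, and `silence_inside_word` is a hypothetical helper, not a whisper-timestamped function.

```python
from typing import Dict, List


def silence_inside_word(word: Dict, speech_activity: List[Dict]) -> float:
    """Seconds of the word's span that no speech interval covers (intervals assumed disjoint)."""
    covered = 0.0
    for iv in speech_activity:
        overlap = min(word["end"], iv["end"]) - max(word["start"], iv["start"])
        if overlap > 0:
            covered += overlap
    return round((word["end"] - word["start"]) - covered, 3)


# First two speech intervals from the VAD-on output in this issue.
speech_activity = [
    {"start": 2.062, "end": 5.33},
    {"start": 6.254, "end": 10.066},
]

# The "the" before "door", as timed with VAD on and with VAD off.
the_vad_on = {"text": "the", "start": 6.25, "end": 6.89}
the_vad_off = {"text": "the", "start": 4.98, "end": 6.88}

print("VAD on :", silence_inside_word(the_vad_on, speech_activity))
print("VAD off:", silence_inside_word(the_vad_off, speech_activity))
```

With VAD off, almost a full second of that word's span lies in the pause between the first two speech intervals; with VAD on, practically none does.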
VAD off
{
  "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid the beetle droned in the hot june sun the beetle droned in the hot june sun",
  "language": "en",
  "segments": [
    {
      "id": 0,
      "end": 17.28,
      "seek": 0,
      "text": " the little tales they tell are false the door was barred locked and bolted as well ripe pears are fit for a queen's table a big wet stain was on the round carpet",
      "start": 0.14,
      "words": [
        {
          "end": 2.76,
          "text": "the",
          "start": 0.14,
          "confidence": 0.633
        },
        {
          "end": 3.04,
          "text": "little",
          "start": 2.76,
          "confidence": 0.983
        },
        {
          "end": 3.5,
          "text": "tales",
          "start": 3.04,
          "confidence": 0.954
        },
        {
          "end": 3.72,
          "text": "they",
          "start": 3.5,
          "confidence": 0.996
        },
        {
          "end": 4.08,
          "text": "tell",
          "start": 3.72,
          "confidence": 0.998
        },
        {
          "end": 4.34,
          "text": "are",
          "start": 4.08,
          "confidence": 0.995
        },
        {
          "end": 4.98,
          "text": "false",
          "start": 4.34,
          "confidence": 0.999
        },
        {
          "end": 6.88,
          "text": "the",
          "start": 4.98,
          "confidence": 0.953
        },
        {
          "end": 7.16,
          "text": "door",
          "start": 6.88,
          "confidence": 0.997
        },
        {
          "end": 7.4,
          "text": "was",
          "start": 7.16,
          "confidence": 0.998
        },
        {
          "end": 7.9,
          "text": "barred",
          "start": 7.4,
          "confidence": 0.996
        },
        {
          "end": 8.4,
          "text": "locked",
          "start": 7.9,
          "confidence": 0.99
        },
        {
          "end": 8.68,
          "text": "and",
          "start": 8.4,
          "confidence": 0.997
        },
        {
          "end": 9.08,
          "text": "bolted",
          "start": 8.68,
          "confidence": 0.999
        },
        {
          "end": 9.24,
          "text": "as",
          "start": 9.08,
          "confidence": 0.998
        },
        {
          "end": 9.82,
          "text": "well",
          "start": 9.24,
          "confidence": 0.993
        },
        {
          "end": 11.36,
          "text": "ripe",
          "start": 9.82,
          "confidence": 0.938
        },
        {
          "end": 11.82,
          "text": "pears",
          "start": 11.36,
          "confidence": 0.983
        },
        {
          "end": 12.02,
          "text": "are",
          "start": 11.82,
          "confidence": 0.987
        },
        {
          "end": 12.18,
          "text": "fit",
          "start": 12.02,
          "confidence": 0.996
        },
        {
          "end": 12.34,
          "text": "for",
          "start": 12.18,
          "confidence": 0.998
        },
        {
          "end": 12.48,
          "text": "a",
          "start": 12.34,
          "confidence": 0.996
        },
        {
          "end": 12.8,
          "text": "queen's",
          "start": 12.48,
          "confidence": 0.989
        },
        {
          "end": 13.4,
          "text": "table",
          "start": 12.8,
          "confidence": 0.998
        },
        {
          "end": 15.28,
          "text": "a",
          "start": 13.4,
          "confidence": 0.757
        },
        {
          "end": 15.58,
          "text": "big",
          "start": 15.28,
          "confidence": 0.997
        },
        {
          "end": 15.8,
          "text": "wet",
          "start": 15.58,
          "confidence": 0.996
        },
        {
          "end": 16.14,
          "text": "stain",
          "start": 15.8,
          "confidence": 0.999
        },
        {
          "end": 16.32,
          "text": "was",
          "start": 16.14,
          "confidence": 0.999
        },
        {
          "end": 16.44,
          "text": "on",
          "start": 16.32,
          "confidence": 0.997
        },
        {
          "end": 16.58,
          "text": "the",
          "start": 16.44,
          "confidence": 0.997
        },
        {
          "end": 16.82,
          "text": "round",
          "start": 16.58,
          "confidence": 0.999
        },
        {
          "end": 17.28,
          "text": "carpet",
          "start": 16.82,
          "confidence": 0.998
        }
      ],
      "tokens": [
        50365,
        264,
        707,
        27254,
        436,
        980,
        366,
        7908,
        264,
        2853,
        390,
        2159,
        986,
        9376,
        293,
        13436,
        292,
        382,
        731,
        31421,
        520,
        685,
        366,
        3318,
        337,
        257,
        12206,
        311,
        3199,
        257,
        955,
        6630,
        16441,
        390,
        322,
        264,
        3098,
        18119,
        51295
      ],
      "confidence": 0.972,
      "avg_logprob": -0.060511426227848705,
      "temperature": 0,
      "no_speech_prob": 0.05821088328957558,
      "compression_ratio": 1.412280701754386
    },
    {
      "id": 1,
      "end": 30.86,
      "seek": 1860,
      "text": " the kite dipped and swayed but stayed aloft the pleasant hours fly by much too soon the room was crowded with a mild wab",
      "start": 18.6,
      "words": [
        {
          "end": 19.62,
          "text": "the",
          "start": 18.6,
          "confidence": 0.991
        },
        {
          "end": 19.9,
          "text": "kite",
          "start": 19.62,
          "confidence": 0.989
        },
        {
          "end": 20.24,
          "text": "dipped",
          "start": 19.9,
          "confidence": 0.996
        },
        {
          "end": 20.42,
          "text": "and",
          "start": 20.24,
          "confidence": 0.999
        },
        {
          "end": 20.94,
          "text": "swayed",
          "start": 20.42,
          "confidence": 0.999
        },
        {
          "end": 21.06,
          "text": "but",
          "start": 20.94,
          "confidence": 0.998
        },
        {
          "end": 21.36,
          "text": "stayed",
          "start": 21.06,
          "confidence": 0.987
        },
        {
          "end": 22.44,
          "text": "aloft",
          "start": 21.36,
          "confidence": 0.998
        },
        {
          "end": 24.06,
          "text": "the",
          "start": 22.44,
          "confidence": 0.997
        },
        {
          "end": 24.38,
          "text": "pleasant",
          "start": 24.06,
          "confidence": 0.999
        },
        {
          "end": 24.78,
          "text": "hours",
          "start": 24.38,
          "confidence": 0.999
        },
        {
          "end": 25.12,
          "text": "fly",
          "start": 24.78,
          "confidence": 0.997
        },
        {
          "end": 25.44,
          "text": "by",
          "start": 25.12,
          "confidence": 0.999
        },
        {
          "end": 25.78,
          "text": "much",
          "start": 25.44,
          "confidence": 0.994
        },
        {
          "end": 26,
          "text": "too",
          "start": 25.78,
          "confidence": 0.998
        },
        {
          "end": 26.82,
          "text": "soon",
          "start": 26,
          "confidence": 1
        },
        {
          "end": 28.96,
          "text": "the",
          "start": 26.82,
          "confidence": 0.975
        },
        {
          "end": 29.2,
          "text": "room",
          "start": 28.96,
          "confidence": 0.999
        },
        {
          "end": 29.4,
          "text": "was",
          "start": 29.2,
          "confidence": 0.999
        },
        {
          "end": 29.8,
          "text": "crowded",
          "start": 29.4,
          "confidence": 0.998
        },
        {
          "end": 30,
          "text": "with",
          "start": 29.8,
          "confidence": 0.999
        },
        {
          "end": 30.12,
          "text": "a",
          "start": 30,
          "confidence": 0.999
        },
        {
          "end": 30.44,
          "text": "mild",
          "start": 30.12,
          "confidence": 0.983
        },
        {
          "end": 30.86,
          "text": "wab",
          "start": 30.44,
          "confidence": 0.785
        }
      ],
      "tokens": [
        50365,
        264,
        38867,
        45162,
        293,
        27555,
        292,
        457,
        9181,
        419,
        6750,
        264,
        16232,
        2496,
        3603,
        538,
        709,
        886,
        2321,
        264,
        1808,
        390,
        21634,
        365,
        257,
        15154,
        261,
        455,
        51027
      ],
      "confidence": 0.978,
      "avg_logprob": -0.1181586674281529,
      "temperature": 0,
      "no_speech_prob": 0.0002550892240833491,
      "compression_ratio": 1.696969696969697
    },
    {
      "id": 2,
      "end": 44.16,
      "seek": 1860,
      "text": " the room was crowded with a wild mob this strong arm shall shield your honour she blushed when he gave her a white orchid",
      "start": 32.88,
      "words": [
        {
          "end": 33.14,
          "text": "the",
          "start": 32.88,
          "confidence": 0.992
        },
        {
          "end": 33.42,
          "text": "room",
          "start": 33.14,
          "confidence": 0.997
        },
        {
          "end": 33.62,
          "text": "was",
          "start": 33.42,
          "confidence": 0.999
        },
        {
          "end": 34.02,
          "text": "crowded",
          "start": 33.62,
          "confidence": 0.999
        },
        {
          "end": 34.24,
          "text": "with",
          "start": 34.02,
          "confidence": 0.999
        },
        {
          "end": 34.38,
          "text": "a",
          "start": 34.24,
          "confidence": 0.997
        },
        {
          "end": 34.74,
          "text": "wild",
          "start": 34.38,
          "confidence": 0.998
        },
        {
          "end": 35.64,
          "text": "mob",
          "start": 34.74,
          "confidence": 0.998
        },
        {
          "end": 37.36,
          "text": "this",
          "start": 35.64,
          "confidence": 0.993
        },
        {
          "end": 37.72,
          "text": "strong",
          "start": 37.36,
          "confidence": 0.999
        },
        {
          "end": 38.14,
          "text": "arm",
          "start": 37.72,
          "confidence": 0.993
        },
        {
          "end": 38.38,
          "text": "shall",
          "start": 38.14,
          "confidence": 0.997
        },
        {
          "end": 38.78,
          "text": "shield",
          "start": 38.38,
          "confidence": 0.999
        },
        {
          "end": 38.98,
          "text": "your",
          "start": 38.78,
          "confidence": 0.996
        },
        {
          "end": 39.66,
          "text": "honour",
          "start": 38.98,
          "confidence": 0.688
        },
        {
          "end": 41.96,
          "text": "she",
          "start": 39.66,
          "confidence": 0.981
        },
        {
          "end": 42.48,
          "text": "blushed",
          "start": 41.96,
          "confidence": 0.998
        },
        {
          "end": 42.62,
          "text": "when",
          "start": 42.48,
          "confidence": 0.998
        },
        {
          "end": 42.76,
          "text": "he",
          "start": 42.62,
          "confidence": 0.999
        },
        {
          "end": 42.98,
          "text": "gave",
          "start": 42.76,
          "confidence": 0.999
        },
        {
          "end": 43.16,
          "text": "her",
          "start": 42.98,
          "confidence": 0.999
        },
        {
          "end": 43.34,
          "text": "a",
          "start": 43.16,
          "confidence": 0.997
        },
        {
          "end": 43.54,
          "text": "white",
          "start": 43.34,
          "confidence": 0.998
        },
        {
          "end": 44.16,
          "text": "orchid",
          "start": 43.54,
          "confidence": 0.996
        }
      ],
      "tokens": [
        51027,
        264,
        1808,
        390,
        21634,
        365,
        257,
        4868,
        4298,
        341,
        2068,
        3726,
        4393,
        10257,
        428,
        20631,
        750,
        25218,
        292,
        562,
        415,
        2729,
        720,
        257,
        2418,
        34850,
        327,
        51695
      ],
      "confidence": 0.983,
      "avg_logprob": -0.1181586674281529,
      "temperature": 0,
      "no_speech_prob": 0.0002550892240833491,
      "compression_ratio": 1.696969696969697
    },
    {
      "id": 3,
      "end": 48.52,
      "seek": 1860,
      "text": " the beetle droned in the hot june sun",
      "start": 46.02,
      "words": [
        {
          "end": 46.44,
          "text": "the",
          "start": 46.02,
          "confidence": 0.982
        },
        {
          "end": 46.78,
          "text": "beetle",
          "start": 46.44,
          "confidence": 0.949
        },
        {
          "end": 47.26,
          "text": "droned",
          "start": 46.78,
          "confidence": 0.995
        },
        {
          "end": 47.42,
          "text": "in",
          "start": 47.26,
          "confidence": 0.997
        },
        {
          "end": 47.56,
          "text": "the",
          "start": 47.42,
          "confidence": 0.998
        },
        {
          "end": 47.78,
          "text": "hot",
          "start": 47.56,
          "confidence": 0.999
        },
        {
          "end": 48.2,
          "text": "june",
          "start": 47.78,
          "confidence": 0.991
        },
        {
          "end": 48.52,
          "text": "sun",
          "start": 48.2,
          "confidence": 0.991
        }
      ],
      "tokens": [
        51695,
        264,
        49735,
        1224,
        19009,
        294,
        264,
        2368,
        361,
        2613,
        3295,
        51865
      ],
      "confidence": 0.989,
      "avg_logprob": -0.1181586674281529,
      "temperature": 0,
      "no_speech_prob": 0.0002550892240833491,
      "compression_ratio": 1.696969696969697
    },
    {
      "id": 4,
      "end": 52.28,
      "seek": 4860,
      "text": " the beetle droned in the hot june sun",
      "start": 48.6,
      "words": [
        {
          "end": 48.8,
          "text": "the",
          "start": 48.6,
          "confidence": 0.124
        },
        {
          "end": 48.86,
          "text": "beetle",
          "start": 48.8,
          "confidence": 0.349
        },
        {
          "end": 48.96,
          "text": "droned",
          "start": 48.86,
          "confidence": 0.719
        },
        {
          "end": 48.98,
          "text": "in",
          "start": 48.96,
          "confidence": 0.979
        },
        {
          "end": 49.42,
          "text": "the",
          "start": 48.98,
          "confidence": 0.999
        },
        {
          "end": 50.32,
          "text": "hot",
          "start": 49.42,
          "confidence": 0.957
        },
        {
          "end": 52.26,
          "text": "june",
          "start": 50.32,
          "confidence": 0.999
        },
        {
          "end": 52.28,
          "text": "sun",
          "start": 52.26,
          "confidence": 0.985
        }
      ],
      "tokens": [
        50365,
        264,
        49735,
        1224,
        19009,
        294,
        264,
        2368,
        361,
        2613,
        3295,
        50555
      ],
      "confidence": 0.678,
      "avg_logprob": -0.5123335398160495,
      "temperature": 0,
      "no_speech_prob": 0.25285273790359497,
      "compression_ratio": 0.8409090909090909
    }
  ],
  "language_probs": {
    "af": 3.484968544853473e-7,
    "am": 1.0782172932266576e-9,
    "ar": 0.00003418583582970314,
    "as": 2.7212467834658582e-9,
    "az": 1.2043727792843129e-7,
    "ba": 3.840175746838703e-11,
    "be": 4.4479815386466726e-8,
    "bg": 3.1239056852427893e-7,
    "bn": 5.998010124130815e-7,
    "bo": 1.4667885572805517e-8,
    "br": 0.0000011426770925027085,
    "bs": 1.1402773480995165e-7,
    "ca": 0.000002636128556332551,
    "cs": 0.000004590545813698554,
    "cy": 0.0011232695542275906,
    "da": 0.000008848508514347486,
    "de": 0.00005725050505134277,
    "el": 0.000009058345312951133,
    "en": 0.9975538849830627,
    "es": 0.0001496611803304404,
    "et": 1.6461827101466042e-7,
    "eu": 1.1182223857986173e-7,
    "fa": 0.000001383726498715987,
    "fi": 0.000008119847734633368,
    "fo": 3.4237334745057524e-8,
    "fr": 0.00007525436376454309,
    "gl": 2.5797120883908065e-7,
    "gu": 1.1535464139456053e-8,
    "ha": 1.5367224159845705e-10,
    "he": 0.0000016821835515656858,
    "hi": 0.000005494163815455977,
    "hr": 2.432900316762243e-7,
    "ht": 1.1095203689137634e-7,
    "hu": 0.000005121123649587389,
    "hy": 7.655459199895631e-8,
    "id": 0.000014362546608026605,
    "is": 9.2536816964639e-7,
    "it": 0.00006337053491733968,
    "ja": 0.00007408765668515116,
    "jw": 0.000005002493253414286,
    "ka": 2.7722097950544367e-9,
    "kk": 3.279735949490714e-8,
    "km": 0.000011722369890776463,
    "kn": 1.0794317439888346e-8,
    "ko": 0.00003934765845770016,
    "la": 0.00005860815872438252,
    "lb": 1.348229722308858e-10,
    "ln": 5.408726244660045e-10,
    "lo": 9.107496978799645e-9,
    "lt": 2.5298160721831664e-7,
    "lv": 2.8112108907407674e-7,
    "mg": 4.334524059124156e-11,
    "mi": 0.000006423328159144148,
    "mk": 4.0771039877540716e-9,
    "ml": 0.0000014110179336057627,
    "mn": 2.1847750275583167e-8,
    "mr": 8.540283147340233e-8,
    "ms": 0.000009718185538076796,
    "mt": 1.3122551933975046e-8,
    "my": 5.139632719419751e-8,
    "ne": 7.898470499867472e-8,
    "nl": 0.00004155940769123845,
    "nn": 0.00020297066657803953,
    "no": 0.000009871225302049424,
    "oc": 1.4188863417530229e-8,
    "pa": 1.3916502439315082e-7,
    "pl": 0.000022595459086005576,
    "ps": 1.6427252447215324e-8,
    "pt": 0.000043553885916480795,
    "ro": 0.000008119847734633368,
    "ru": 0.00007351109525188804,
    "sa": 1.8873576834721462e-7,
    "sd": 1.2497194390448385e-8,
    "si": 0.0000021684202238247963,
    "sk": 3.5676120546668244e-7,
    "sl": 0.0000011975154166066204,
    "sn": 0.0000016177401676031877,
    "so": 4.225854041695065e-11,
    "sq": 7.362184106796121e-8,
    "sr": 1.0441797648752527e-8,
    "su": 3.3450572867188555e-10,
    "sv": 0.000026007213818957098,
    "sw": 9.2176054522497e-7,
    "ta": 0.000002495836497473647,
    "te": 5.110368874738924e-7,
    "tg": 8.105870141772442e-11,
    "th": 0.000010184571692661848,
    "tk": 2.8426504206091607e-11,
    "tl": 0.0000059871999837923795,
    "tr": 0.00000831240504339803,
    "tt": 6.622281889523407e-11,
    "uk": 0.0000021017049220972694,
    "ur": 0.0000010363630735810148,
    "uz": 1.2862977176453239e-11,
    "vi": 0.00012504658661782742,
    "yi": 2.7510472122571628e-8,
    "yo": 3.051540318210755e-7,
    "zh": 0.0000723714183550328,
    "haw": 0.000006473707344412105,
    "yue": 2.5999449349001225e-7
  }
}
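To quantify the effect, a small script can flag words whose duration is implausibly long, which is where the leading silence seems to get absorbed (e.g. "this" from 35.64 to 37.36 and "she" from 39.66 to 41.96 in the output above). This is only a rough sketch: the function name `flag_long_words` and the 1.0 s threshold are illustrative assumptions, and it assumes the result is a dict with a top-level `segments` list whose entries carry `id` and `words`, as in the JSON shown here.

```python
def flag_long_words(result, max_duration=1.0):
    """Return (segment_id, text, start, end, duration) for suspiciously long words.

    `result` is assumed to be the parsed JSON output, with a top-level
    "segments" list; `max_duration` (seconds) is an arbitrary threshold.
    """
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Round to centiseconds, matching the precision of the timestamps.
            duration = round(word["end"] - word["start"], 2)
            if duration > max_duration:
                flagged.append(
                    (segment["id"], word["text"], word["start"], word["end"], duration)
                )
    return flagged


# Three words copied from segment 2 of the output above.
sample = {
    "segments": [
        {
            "id": 2,
            "words": [
                {"text": "mob", "start": 34.74, "end": 35.64, "confidence": 0.998},
                {"text": "this", "start": 35.64, "end": 37.36, "confidence": 0.993},
                {"text": "she", "start": 39.66, "end": 41.96, "confidence": 0.981},
            ],
        }
    ]
}

for seg_id, text, start, end, duration in flag_long_words(sample):
    print(f"segment {seg_id}: {text!r} {start}-{end} ({duration}s)")
```

Running the same check on both the VAD-on and VAD-off results would make it easier to see which words differ, without reading the full dumps side by side.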
@villesau changed the title from "Large timestamp discrepancy with VAD on & off" to "Large discrepancies with VAD on & off" on Oct 5, 2024
@villesau changed the title from "Large discrepancies with VAD on & off" to "Large timestamp & segment discrepancies with VAD on & off" on Oct 5, 2024