
Audio with silence in the middle gives different results #12

Open
@rootux


Hey, I have an issue with the prediction.

I've created three audio files:
Audio 1 - a 12-second recording that contains 5 seconds of silence (see the image below).
Audio 2 - the last 4 seconds of Audio 1, trimmed.
Audio 3 - the same recording as Audio 1, but without the long silence in the middle.
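For anyone trying to reproduce this with their own recordings, the two derived variants can be approximated in code. This is a hypothetical sketch, not how the files in this issue were made and not part of smart-turn: `last_seconds` and `remove_long_silences` are illustrative names, and it assumes mono audio as a list of float samples in [-1, 1] at a known sample rate, with an arbitrary RMS threshold of 0.01 standing in for "silence".

```python
import math


def last_seconds(samples, sample_rate, seconds):
    """Keep only the final `seconds` of audio (like Audio 2)."""
    n = round(seconds * sample_rate)
    if n <= 0:
        return []
    return samples[-n:]  # returns the whole clip if it is shorter than `seconds`


def remove_long_silences(samples, sample_rate, threshold=0.01,
                         min_silence_s=1.0, frame_s=0.02):
    """Drop quiet stretches of at least `min_silence_s` that sit between
    voiced audio (like Audio 3). Short pauses and leading/trailing silence
    are kept. `threshold` is an assumed RMS level for "silence"."""
    frame_len = max(1, round(frame_s * sample_rate))
    min_samples = round(min_silence_s * sample_rate)

    def rms(frame):
        return math.sqrt(sum(x * x for x in frame) / len(frame))

    out, quiet_run, seen_voice = [], [], False
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if rms(frame) < threshold:
            # Hold quiet audio until we know how long the quiet run lasts.
            quiet_run.extend(frame)
            continue
        # A voiced frame ends the quiet run: keep the run unless it was a
        # long gap between two voiced stretches.
        if not (seen_voice and len(quiet_run) >= min_samples):
            out.extend(quiet_run)
        quiet_run = []
        out.extend(frame)
        seen_voice = True
    out.extend(quiet_run)  # trailing silence is kept as-is
    return out
```

Feeding each variant to the same model is one way to check whether the long mid-clip gap on its own is what flips the prediction.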

Results for Audio 1 (12 seconds, full audio WITH silence in the middle) 🤔 ❌:

{
  "prediction": 0,
  "probability": 0.012071866542100906
}

Results for Audio 2 (4 seconds, trimmed) ✅:

{
  "prediction": 1,
  "probability": 0.9972302317619324
}

Results for Audio 3 (6 seconds, full audio without the silence) ✅:

{
  "prediction": 1,
  "probability": 0.9956890940666199
}

P.S. Sending the audio to Whisper, I get these transcriptions:
Audio 1 - 12 seconds - "i prefer cats but please answer quickly"
Audio 2 - 4 seconds - "cat but please answer quickly"
Audio 3 - 6 seconds (no silence in the middle) - "i prefer cats but please answer quickly"

I would expect the prediction to be "1" in all three cases.
Why does the prediction fail for Audio 1?

Links to the files:
Audio 1 (full audio) - https://github.com/rootux/smart-turn-audio/blob/main/a4_complete.ogg
Audio 2 (cropped audio) - https://github.com/rootux/smart-turn-audio/blob/main/a4_complete_edited.ogg
Audio 3 (no silence in the middle) - https://github.com/rootux/smart-turn-audio/blob/main/a4_complete_no_silence_in_middle.ogg

Image
