Silenced audio gives different results

Hey - 
I have an issue with prediction.

I've created 3 audio files:
The first one is a 12-second recording with 5 seconds of silence in the middle [see image below].
The second one is the last 4 seconds of the first one.
The third one is the same as the first one, but without the long silence in the middle.
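
For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how three such variants can be derived from a single recording. It assumes a mono float NumPy array at 16 kHz. The helper names (`last_seconds`, `remove_long_silence`), the RMS threshold, and the synthetic tone standing in for speech are all illustrative assumptions, not how the files linked below were actually produced, and the durations only approximate the real ones.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate; the real files may differ


def last_seconds(audio: np.ndarray, seconds: float, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Keep only the final `seconds` of the signal (hypothetical helper)."""
    n = int(round(seconds * sr))
    return audio[-n:] if n > 0 else audio[:0]


def remove_long_silence(audio: np.ndarray, sr: int = SAMPLE_RATE,
                        threshold: float = 0.01, min_silence_s: float = 1.0,
                        frame_s: float = 0.02) -> np.ndarray:
    """Drop runs of quiet frames lasting at least `min_silence_s` (hypothetical helper)."""
    frame = int(round(frame_s * sr))
    n_frames = len(audio) // frame
    frames = audio[: n_frames * frame].reshape(n_frames, frame)
    # A frame counts as quiet when its RMS energy is below the threshold.
    quiet = np.sqrt((frames ** 2).mean(axis=1)) < threshold
    keep = np.ones(n_frames, dtype=bool)
    min_frames = int(round(min_silence_s / frame_s))
    start = None
    # The appended False closes a quiet run that reaches the end of the signal.
    for i, q in enumerate(np.append(quiet, False)):
        if q and start is None:
            start = i
        elif not q and start is not None:
            if i - start >= min_frames:
                keep[start:i] = False
            start = None
    # Re-attach any samples that did not fill a whole frame.
    return np.concatenate([frames[keep].ravel(), audio[n_frames * frame:]])


def tone(seconds: float, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Synthetic stand-in for speech: a 220 Hz sine at amplitude 0.5."""
    t = np.arange(int(round(seconds * sr))) / sr
    return 0.5 * np.sin(2 * np.pi * 220 * t)


# Illustrative 12-second signal: 3.5 s of sound, 5 s of silence, 3.5 s of sound.
full = np.concatenate([tone(3.5), np.zeros(5 * SAMPLE_RATE), tone(3.5)])
tail = last_seconds(full, 4.0)           # analogous to the trimmed file
no_silence = remove_long_silence(full)   # analogous to the no-silence file

print(len(full) / SAMPLE_RATE, len(tail) / SAMPLE_RATE, len(no_silence) / SAMPLE_RATE)
```

Each of the three arrays can then be passed to the model separately to compare the predictions.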

Results - 12 seconds full audio 🤔 ❌  - WITH silence in the middle:
```
{
  "prediction": 0,
  "probability": 0.012071866542100906
}
```

Results - 4 seconds trimmed audio ✅ :
```
{
  "prediction": 1,
  "probability": 0.9972302317619324
}
```

Results - 6 seconds full audio ✅ - NO silence in the middle:
```
{
  "prediction": 1,
  "probability": 0.9956890940666199
}
```

P.S. Sending the audio to Whisper, I get:
Audio 1 - 12 seconds - "i prefer cats but please answer quickly"
Audio 2 - 4 seconds - "cat but please answer quickly"
Audio 3 - 6 seconds (no silence in the middle) - "i prefer cats but please answer quickly"


I would expect the prediction to be "1" in all three cases.
Why does the prediction fail for the audio with the silence in the middle?

**Link to the files**
Full audio - https://github.com/rootux/smart-turn-audio/blob/main/a4_complete.ogg
Cropped audio - https://github.com/rootux/smart-turn-audio/blob/main/a4_complete_edited.ogg
Cropped audio - no silence - https://github.com/rootux/smart-turn-audio/blob/main/a4_complete_no_silence_in_middle.ogg

![Image](https://github.com/user-attachments/assets/ced730a3-a215-4a58-a0ed-569017369d97)
