Default sample rate in OpenSoundscape, training/inference SR mismatch, and sigmoid vs softmax for binary classification #1191

@beinscig-jpg

Hi,

I trained a CNN using OpenSoundscape without explicitly specifying the sample rate.
The original audio recordings were sampled at 22,050 Hz, and I assumed this was also the sample rate used internally during training.

However, during inference the model gives more stable predictions with fewer false positives when the audio is resampled to 8,000 Hz, even though it was trained on 22,050 Hz data.

This raises a few questions:

1. What is the default sample rate used by OpenSoundscape when sample_rate (or sr) is not explicitly specified?
2. Is there any implicit resampling step during audio loading or preprocessing (e.g. via librosa)?
3. How can I verify the effective sample rate actually used by a trained model (e.g. stored in the preprocessor or dataset configuration)?

For context, this is a bioacoustic detection task (wolf howls), where most relevant information lies at relatively low frequencies.
The improved performance at 8 kHz may therefore be biologically meaningful, but I would like to better understand whether this behavior is expected from OpenSoundscape’s preprocessing pipeline.
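
To make the concern concrete, here is a minimal NumPy-only sketch of how the sample rate changes what a fixed-size spectrogram window represents. It does not call OpenSoundscape; the window length of 512 samples and the 400 Hz test frequency are assumptions chosen for illustration, not values taken from the library or from my data.

```python
import numpy as np

# Assumed STFT window length in samples (illustrative, not an OpenSoundscape default).
WINDOW_SAMPLES = 512


def describe_spectrogram_axis(sample_rate, window_samples=WINDOW_SAMPLES):
    """Return (nyquist_hz, bin_spacing_hz) of the STFT frequency axis."""
    freqs = np.fft.rfftfreq(window_samples, d=1.0 / sample_rate)
    return float(freqs[-1]), float(freqs[1] - freqs[0])


def bin_index(freq_hz, sample_rate, window_samples=WINDOW_SAMPLES):
    """Return the index of the frequency bin closest to freq_hz."""
    freqs = np.fft.rfftfreq(window_samples, d=1.0 / sample_rate)
    return int(np.argmin(np.abs(freqs - freq_hz)))


for sr in (22050, 8000):
    nyquist, spacing = describe_spectrogram_axis(sr)
    # 400 Hz is a hypothetical low-frequency component used only as an example.
    row = bin_index(400.0, sr)
    print(f"sr={sr}: nyquist={nyquist:.1f} Hz, "
          f"bin spacing={spacing:.3f} Hz, 400 Hz falls in bin {row}")
```

Under these assumptions, the same 400 Hz component lands in a different spectrogram row at each sample rate, and the frequency axis spans a different range. If the preprocessing keeps its window size in samples fixed, a model trained at one rate would see a stretched or compressed image at the other, which is why I would like to confirm the rate actually used.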

In addition, this is a binary classification problem (wolf vs. not-wolf).
I would like to ask whether, in OpenSoundscape, a sigmoid output with binary cross-entropy or a softmax output with categorical cross-entropy is preferable for this type of task. Does this choice interact with probability calibration or thresholding during inference?
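
For reference, the sketch below shows the general relationship between the two formulations as I understand it: a softmax over two logits gives the same positive-class probability as a sigmoid applied to the difference of those logits. This is plain NumPy with made-up logit values and says nothing about how OpenSoundscape configures its output layer or loss.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical two-class logits: column 0 = not-wolf, column 1 = wolf.
logits = np.array([[0.3, 2.1],
                   [1.5, -0.7],
                   [0.0, 0.0]])

# Positive-class probability from a two-output softmax head.
p_softmax = softmax(logits)[:, 1]

# The same probability from a sigmoid on the logit difference.
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])

print(np.allclose(p_softmax, p_sigmoid))
```

So for a strictly two-class problem the two heads can represent the same probabilities, and a threshold of 0.5 on the sigmoid output corresponds to picking the larger softmax logit. What I am unsure about is which one OpenSoundscape expects by default and whether the choice changes how scores should be thresholded.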

Thanks in advance for any clarification.
