Description
Hi,
I trained a CNN using OpenSoundscape without explicitly specifying the sample rate.
The original audio recordings were sampled at 22,050 Hz, and I assumed this was also the sample rate used internally during training.
However, during inference I observed that the model behaves more stably and produces fewer false positives when the audio is resampled to 8,000 Hz, even though training was done on 22,050 Hz data.
This raises a few questions:

1. What is the default sample rate used by OpenSoundscape when `sample_rate` (or `sr`) is not explicitly specified?
2. Is there an implicit resampling step during audio loading or preprocessing (e.g. via librosa)?
3. How can I verify the effective sample rate actually used by a trained model, e.g. stored in the preprocessor or dataset configuration? (See the sketch after this list for how I tried to introspect this.)
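For question 3, this is how I attempted to introspect the model. The attribute path (`preprocessor.pipeline.load_audio.params`) is my best guess from the current docs and may differ between OpenSoundscape versions, and the model path is hypothetical:

```python
# Sketch of my attempt to find the effective sample rate of a trained model.
# The attribute path below is my guess from the docs and may vary by version.
from opensoundscape.ml.cnn import load_model

model = load_model("wolf_cnn.model")  # hypothetical path to my saved model

# List the preprocessing actions, then look at the audio-loading parameters
print(model.preprocessor.pipeline)
print(model.preprocessor.pipeline["load_audio"].params)
# I expect a 'sample_rate' entry here; my understanding is that None would
# mean "keep each file's native rate" (22,050 Hz for my recordings).
```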
For context, this is a bioacoustic detection task (wolf howls), where most relevant information lies at relatively low frequencies.
The improved performance at 8 kHz may therefore be biologically meaningful, but I would like to better understand whether this behavior is expected from OpenSoundscape’s preprocessing pipeline.
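To test this, I compared loading at the native rate against forcing 8 kHz at load time. `Audio.from_file` and its `sample_rate` argument are taken from the OpenSoundscape docs; the file name is hypothetical:

```python
from opensoundscape import Audio

# Load at the native rate: with no sample_rate given, my understanding is
# that the file's original rate (22,050 Hz here) is kept
a_native = Audio.from_file("howl_clip.wav")  # hypothetical file
print(a_native.sample_rate)  # -> 22050

# Force resampling to 8 kHz at load time
a_8k = Audio.from_file("howl_clip.wav", sample_rate=8000)
print(a_8k.sample_rate)  # -> 8000
```

Both versions load without errors, which is why I am unsure which rate the training pipeline actually used.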
In addition, this is a binary classification problem (wolf vs. not-wolf).
Is it preferable in OpenSoundscape to use a sigmoid output with binary cross-entropy or a softmax output with categorical cross-entropy for this kind of task? And does this choice interact with probability calibration or thresholding during inference?
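For reference, my understanding of why the two heads are closely related for two classes, sketched in plain PyTorch (not OpenSoundscape-specific; if I read the docs correctly, the `single_target` option of `CNN` toggles between these setups, but I may be wrong about that):

```python
import torch

# For two classes, softmax over two logits [z_neg, z_pos] gives the same
# positive-class probability as a sigmoid of the logit difference, so the
# two heads are mathematically equivalent and differ mainly in the loss
# plumbing (BCEWithLogitsLoss vs CrossEntropyLoss) and in how thresholds
# and calibration are applied afterwards.
logits = torch.tensor([[0.3, 1.1]])  # [not-wolf, wolf]

p_softmax = torch.softmax(logits, dim=1)[0, 1]           # P(wolf) via softmax
p_sigmoid = torch.sigmoid(logits[0, 1] - logits[0, 0])   # P(wolf) via sigmoid
print(p_softmax.item(), p_sigmoid.item())  # equal up to float error

# Either probability is thresholded the same way at inference time
threshold = 0.5
print(bool(p_softmax > threshold))
```

If that equivalence holds, my question is mainly whether one of the two produces better-calibrated scores in practice with OpenSoundscape's defaults.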
Thanks in advance for any clarification.