Default sample rate in OpenSoundscape, training/inference SR mismatch, and sigmoid vs softmax for binary classification #1192
Replies: 2 comments
Moving this over to Discussions.
Hi @beinscig-jpg, TL;DR: by default there is no resampling (the original audio sample rate is kept), but spectrograms are cropped to 0-11025 Hz. Here is some clarification:
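import pandas as pd
import opensoundscape as opso
from opensoundscape.preprocess.utils import show_tensor_grid

# Load the 2 s clip labels and point each row at its audio file
root = "/path/to/rana_sierrae_2022/"
label_df = pd.read_csv(f"{root}/labels_2s.csv")
label_df["file"] = label_df["file"].apply(lambda x: f"{root}/mp3/{x}")
label_df = label_df.set_index(['file', 'start_time', 'end_time'])

# Single-class model with default preprocessing
m = opso.CNN('resnet18', classes=['wolf'], sample_duration=2)
# Uncomment to change the upper limit of the default bandpass:
# m.preprocessor.pipeline.bandpass.set(max_f=30000)

# Preprocess 20 random clips without augmentation
samples = m.generate_samples(
    label_df.sample(20, random_state=1),
    bypass_augmentations=True,
    raise_errors=True,
)

# Display the resulting spectrogram tensors in a grid
_ = show_tensor_grid(
    [s.data for s in samples],
    columns=4,
    labels=[s.categorical_labels for s in samples],
)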
However, it does perform a default bandpass (cropping) of spectrograms to the frequency range 0-11025 Hz (i.e., 22050/2). The default behavior also specifies [...] If out_of_bounds_ok is set to True for the bandpass action, audio resampled to 8 kHz (Nyquist frequency 4 kHz) would give you a spectrogram with blank values above 4 kHz. This could indeed improve precision compared to the full spectrogram if higher-frequency non-wolf sounds were creating false-positive detections. For single-target models, we often find that bandpassing audio or spectrograms to species-specific frequency ranges can boost performance. You can interactively inspect the preprocessing settings by returning [...]
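The arithmetic behind the "blank values above 4 kHz" remark can be sketched in a few lines. This is only an illustration of the Nyquist limit applied to the 0-11025 Hz crop mentioned above. It does not call OpenSoundscape, and the helper names `nyquist_hz` and `covered_fraction` are made up for this example.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that audio at this sample rate can represent."""
    return sample_rate_hz / 2.0


def covered_fraction(sample_rate_hz: float, band_max_hz: float = 11025.0) -> float:
    """Fraction of the 0..band_max_hz range that can contain signal.

    Frequencies above the Nyquist limit carry no information, so that part
    of a spectrogram cropped to 0..band_max_hz would be blank.
    """
    return min(nyquist_hz(sample_rate_hz), band_max_hz) / band_max_hz


# Compare the original recordings (22050 Hz) with the resampled audio (8000 Hz)
for sr in (22050, 8000):
    print(
        f"sample rate {sr} Hz: Nyquist {nyquist_hz(sr):.0f} Hz, "
        f"{covered_fraction(sr):.1%} of the 0-11025 Hz band has content"
    )
```

Under these assumptions, audio at 22050 Hz fills the whole cropped band, while audio resampled to 8000 Hz fills only the part below 4000 Hz, so the model sees inputs at inference that look different from its training data.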


Hi,
I trained a CNN using OpenSoundscape without explicitly specifying the sample rate.
The original audio recordings were sampled at 22,050 Hz, and I assumed this was also the sample rate used internally during training.
However, during inference I observed that the model performs more stably and with fewer false positives when audio is resampled to 8,000 Hz, even though training was done on 22,050 Hz data.
This raises a few questions:
What is the default sample rate used by OpenSoundscape when sample_rate (or sr) is not explicitly specified?
Is there any implicit resampling step during audio loading or preprocessing (e.g. via librosa)?
How can I verify the effective sample rate actually used by a trained model (e.g. stored in the preprocessor or dataset configuration)?
For context, this is a bioacoustic detection task (wolf howls), where most relevant information lies at relatively low frequencies.
The improved performance at 8 kHz may therefore be biologically meaningful, but I would like to better understand whether this behavior is expected from OpenSoundscape’s preprocessing pipeline.
In addition, this is a binary classification problem (wolf vs. not-wolf).
I would like to ask whether, in OpenSoundscape, it is preferable to use a sigmoid output with binary cross-entropy or a softmax output with categorical cross-entropy for this type of task, and whether this choice has any interaction with probability calibration or thresholding during inference.
Thanks in advance for any clarification.
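As background for the sigmoid-versus-softmax question, the short NumPy sketch below checks a general mathematical identity: a two-class softmax gives the same probability as a sigmoid applied to the difference of the two logits. It makes no claim about which output OpenSoundscape uses. The logit values and the column order (wolf, not-wolf) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical two-logit outputs for three clips; columns: [wolf, not_wolf]
logits = np.array([
    [2.0, -1.0],
    [0.3, 0.3],
    [-4.0, 1.5],
])

p_wolf_softmax = softmax(logits)[:, 0]
p_wolf_sigmoid = sigmoid(logits[:, 0] - logits[:, 1])

# Both formulations assign the same probability to "wolf"
print(np.allclose(p_wolf_softmax, p_wolf_sigmoid))
```

For a strict wolf versus not-wolf task the two formulations are therefore interchangeable in terms of the probability they produce, and a threshold on one corresponds to the same threshold on the other. They differ once there are several classes that can co-occur in a clip, where independent sigmoid outputs do not force the scores to sum to one.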