Default sample rate in OpenSoundscape, training/inference SR mismatch, and sigmoid vs softmax for binary classification #1192

beinscig-jpg · 2025-12-14T06:39:33Z

Hi,

I trained a CNN using OpenSoundscape without explicitly specifying the sample rate.
The original audio recordings were sampled at 22,050 Hz, and I assumed this was also the sample rate used internally during training.

However, during inference I observed that the model performs more stably and with fewer false positives when audio is resampled to 8,000 Hz, even though training was done on 22,050 Hz data.

This raises a few questions:

What is the default sample rate used by OpenSoundscape when sample_rate (or sr) is not explicitly specified?

Is there any implicit resampling step during audio loading or preprocessing (e.g. via librosa)?

How can I verify the effective sample rate actually used by a trained model (e.g. stored in the preprocessor or dataset configuration)?

For context, this is a bioacoustic detection task (wolf howls), where most relevant information lies at relatively low frequencies.
The improved performance at 8 kHz may therefore be biologically meaningful, but I would like to better understand whether this behavior is expected from OpenSoundscape’s preprocessing pipeline.

In addition, this is a binary classification problem (wolf vs. not-wolf).
I would like to ask whether, in OpenSoundscape, it is preferable to use a sigmoid output with binary cross-entropy or a softmax output with categorical cross-entropy for this type of task, and whether this choice has any interaction with probability calibration or thresholding during inference.

Thanks in advance for any clarification.

sammlapp (Maintainer) · 2025-12-14T14:58:10Z

Moving this over to Discussions

sammlapp (Maintainer) · 2025-12-14T15:22:52Z

Hi @beinscig-jpg, TL;DR: by default there is no resampling. OpenSoundscape keeps each file's original sample rate, but crops spectrograms to 0-11025 Hz. Here is some clarification.

First, I highly recommend visualizing preprocessed samples to see what the model will receive as input. Here's an example using the public Rana Sierrae dataset:

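import pandas as pd
import opensoundscape as opso
from opensoundscape.preprocess.utils import show_tensor_grid

# Build a label table indexed by (file, start_time, end_time)
root = "/path/to/rana_sierrae_2022/"
label_df = pd.read_csv(f"{root}/labels_2s.csv")
label_df["file"] = label_df["file"].apply(lambda x: f"{root}/mp3/{x}")
label_df = label_df.set_index(["file", "start_time", "end_time"])

# Create a CNN with the default preprocessing settings
m = opso.CNN("resnet18", classes=["wolf"], sample_duration=2)
# Optionally change the bandpass range before generating samples:
# m.preprocessor.pipeline.bandpass.set(max_f=30000)

# Preprocess 20 randomly chosen clips with augmentations bypassed
samples = m.generate_samples(
    label_df.sample(20, random_state=1),
    bypass_augmentations=True,
    raise_errors=True,
)

# Plot the preprocessed sample tensors in a 4-column grid
_ = show_tensor_grid(
    [s.data for s in samples],
    columns=4,
    labels=[s.categorical_labels for s in samples],
)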

The default preprocessing behavior of CNN() and SpectrogramPreprocessor() in OpenSoundscape <=0.12.1 does not resample audio to a consistent sample rate. The default of sample_rate=None means "do not resample audio"; it was chosen for efficiency's sake. Note that we plan to change this default to resampling all audio inputs starting in OpenSoundscape v0.13.0, which has not yet been released.

However, it does perform a default bandpassing (cropping) of spectrograms to the frequency range of 0-11025 Hz (i.e. 22050/2). See the "bandpass": Action( entry at line 547 of opensoundscape/opensoundscape/preprocess/preprocessors.py in commit 2f14a07.

The default behavior also specifies out_of_bounds_ok=False for the Spectrogram bandpass. This means that preprocessing would fail on a sample that you have resampled to 8 kHz and then cropped with the default 0-11025 Hz range, since the spectrogram of 8 kHz audio only has frequencies up to 4 kHz (half the sample rate). If your model is not failing to preprocess the resampled audio, it implies that you've either (1) added resampling back into the preprocessor's load_audio step, or (2) changed the bandpass settings.

If out_of_bounds_ok is set to True for the bandpass action, you would get a spectrogram with blank values above 4 kHz. This could indeed improve precision compared to the full spectrogram if higher non-wolf sounds were creating false positive detections. For single target models, we often find that bandpassing audio or spectrograms to species-specific frequency ranges can boost performance.
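To make the arithmetic in the last two paragraphs concrete, here is a minimal sketch in plain Python. The helper names (nyquist_hz, check_bandpass) are hypothetical and only mirror the behavior described above; this is not OpenSoundscape's implementation.

```python
def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency a spectrogram of audio at `sample_rate` can contain."""
    return sample_rate / 2


def check_bandpass(sample_rate: int, min_f: float, max_f: float,
                   out_of_bounds_ok: bool = False) -> float:
    """Return the fraction of the band [min_f, max_f] that holds real data.

    Hypothetical helper: raises if the band extends past the Nyquist limit
    and out_of_bounds_ok is False, as described in the text above.
    """
    if max_f <= min_f:
        raise ValueError("max_f must be greater than min_f")
    limit = nyquist_hz(sample_rate)
    if max_f > limit and not out_of_bounds_ok:
        raise ValueError(
            f"band {min_f}-{max_f} Hz exceeds the Nyquist limit of {limit} Hz"
        )
    # Portion of the requested band that lies at or below the Nyquist limit
    covered = max(0.0, min(max_f, limit) - min_f)
    return covered / (max_f - min_f)


# 22,050 Hz audio fills the default 0-11025 Hz crop completely
print(check_bandpass(22050, 0, 11025))  # 1.0

# 8 kHz audio only reaches 4 kHz, so about 36% of the crop has content
print(round(check_bandpass(8000, 0, 11025, out_of_bounds_ok=True), 3))  # 0.363

# With out_of_bounds_ok=False the same crop is rejected
try:
    check_bandpass(8000, 0, 11025)
except ValueError as err:
    print("preprocessing would fail:", err)
```

Under these assumptions, a model trained on 22,050 Hz audio and then fed 8 kHz audio sees a spectrogram in which roughly the upper two thirds of the frequency axis carries no signal.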

You can interactively inspect the preprocessing settings by returning model.preprocessor from a notebook cell. The relevant parameters here are load_audio > sample_rate and bandpass > min_f, max_f, out_of_bounds_ok.

To address the final question about activation layer and loss function: for a single target problem we recommend using a single class, i.e. "wolf", with sigmoid activation. Binary cross entropy loss and cross entropy loss collapse to the same equation in the case of one class. While it's also possible to train a two-class model ("wolf", "not wolf") with softmax activation and cross entropy loss, I feel this is redundant and confusing. Neither model will be calibrated, but I suspect calibration makes more sense in the context of one class -> sigmoid activation than in the case of two classes -> softmax activation.

Internally, during training, OpenSoundscape selects CE or BCE loss based on the model's .single_target=True/False property, and also selects the appropriate activation function. When running inference, we typically save the raw logit outputs (no activation function) rather than post-sigmoid 0-1 scores.
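One way to read the equivalence mentioned above is that a single sigmoid output behaves like a two-class softmax whose "not wolf" logit is fixed at 0. The sketch below checks this numerically in plain Python and shows how a score threshold maps onto raw logits. All function names are illustrative and are not part of OpenSoundscape.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw logit to a 0-1 score."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax_positive(z_neg: float, z_pos: float) -> float:
    """Probability of the positive class under a two-class softmax."""
    m = max(z_neg, z_pos)  # subtract the max for numerical stability
    e_neg, e_pos = math.exp(z_neg - m), math.exp(z_pos - m)
    return e_pos / (e_neg + e_pos)


def bce(z: float, y: int) -> float:
    """Binary cross entropy for one sigmoid output with label y in {0, 1}."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def ce_two_class(z_neg: float, z_pos: float, y: int) -> float:
    """Cross entropy for a two-class softmax; y=1 means the positive class."""
    p_pos = softmax_positive(z_neg, z_pos)
    return -math.log(p_pos if y == 1 else 1 - p_pos)


def logit_threshold(score_threshold: float) -> float:
    """Raw-logit cutoff equivalent to a cutoff on the post-sigmoid score."""
    return math.log(score_threshold / (1 - score_threshold))


for z in (-2.3, 0.4, 1.7):
    # Same probability and same loss once the negative logit is pinned to 0
    assert math.isclose(sigmoid(z), softmax_positive(0.0, z))
    for y in (0, 1):
        assert math.isclose(bce(z, y), ce_two_class(0.0, z, y))

print(logit_threshold(0.5))  # 0.0: a score cutoff of 0.5 is a logit cutoff of 0
```

Because the sigmoid is monotonic, thresholding saved logits directly gives the same detections as thresholding the 0-1 scores, provided the cutoff is converted as in logit_threshold.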
