in whiten.py, eps is too large for some datasets #2064

magland · 2023-10-03T16:24:59Z

When the magnitude of the recording traces is very low (e.g. ~ 1e-6 or lower), the eps used in the whiten preprocessing is too large. This results in a whitened signal that is not normalized properly, and detection for mountainsort (and perhaps other algorithms) then fails.

spikeinterface/src/spikeinterface/preprocessing/whiten.py

Lines 70 to 72 in ca48f6b

    
    W, M = compute_whitening_matrix(
        recording, mode, random_chunk_kwargs, apply_mean, radius_um=radius_um, eps=1e-8
    )

Here is a dataset for which the raw traces are float and low magnitude
https://dandiarchive.org/dandiset/000463/draft/files?location=sub-BH395

The simplest fix is to change this to 1e-16. Not sure if this will introduce any other issues.
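To illustrate the failure mode, here is a minimal NumPy sketch. It assumes a ZCA-style whitening matrix built from the SVD of the covariance with `eps` added to the singular values; the helper names (`whitening_matrix`, `whitened_cov`) and the synthetic two-channel data are hypothetical and not taken from whiten.py.

```python
import numpy as np


def whitening_matrix(data, eps):
    """ZCA-style whitening matrix for data of shape (n_samples, n_channels)."""
    data = data - data.mean(axis=0)
    cov = data.T @ data / data.shape[0]
    U, S, Ut = np.linalg.svd(cov)
    # eps is added to the singular values, which scale as the square of the data
    return U @ np.diag(1.0 / np.sqrt(S + eps)) @ Ut


def whitened_cov(data, eps):
    """Covariance of the data after applying the whitening matrix."""
    centered = data - data.mean(axis=0)
    white = centered @ whitening_matrix(data, eps)
    return white.T @ white / white.shape[0]


rng = np.random.default_rng(0)
# Two correlated channels at unit scale (synthetic, for illustration only)
mixing = np.array([[1.0, 0.8], [0.0, 0.6]])
unit_scale = rng.standard_normal((20000, 2)) @ mixing
tiny_scale = unit_scale * 1e-6  # same signal at ~1e-6 magnitude

cov_ok = whitened_cov(unit_scale, eps=1e-8)    # close to the identity
cov_bad = whitened_cov(tiny_scale, eps=1e-8)   # eps dominates the singular values

# Correlation between the two channels that survives the "whitening"
residual_corr = cov_bad[0, 1] / np.sqrt(cov_bad[0, 0] * cov_bad[1, 1])
print(np.round(cov_ok, 3))
print(cov_bad, residual_corr)
```

In this sketch the low-magnitude output has a variance far below 1 and the channels stay strongly correlated, because `1 / sqrt(S + eps)` is nearly constant once `eps` is much larger than `S`.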

alejoe91 · 2023-10-03T17:32:57Z

Are the traces in volts? Another option is to scale them up with the spre.scale() function to convert them to uV.

samuelgarcia · 2023-10-03T18:47:32Z

Not sure it is good to have epsilon too low.
We could at least expose it in the whiten function.

magland · 2023-10-03T18:50:08Z

Thanks @alejoe91 Alessio.

I have added a note to the docs here:
https://github.com/flatironinstitute/mountainsort5

that says:

# Note that if the recording traces are of float type, you may need to scale
# it to a reasonable voltage range in order for whitening to work properly
# recording = spre.scale(recording, gain=...)

But it's unfortunate that this is necessary, because I need to include that blurb for every example, and it's not always easy for users to know whether they need a scale factor. So it would be great if you could consider changing this to 1e-16 unless you foresee any trouble with that.
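One rough way for a user to tell whether a scale factor is needed is to compare `eps` with the smallest eigenvalue of the channel covariance. The sketch below uses plain NumPy arrays as a stand-in for a recording; the gain of 1e6 assumes the traces are stored in volts, and the synthetic data is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical float traces in volts, shape (n_samples, n_channels)
traces_volts = rng.standard_normal((10000, 4)) * 5e-6

# Equivalent in spirit to spre.scale(recording, gain=1e6): volts -> microvolts
gain = 1e6
traces_uv = traces_volts * gain

eps = 1e-8
smallest_eig_volts = np.linalg.eigvalsh(np.cov(traces_volts, rowvar=False)).min()
smallest_eig_uv = np.linalg.eigvalsh(np.cov(traces_uv, rowvar=False)).min()

# If eps / eigenvalue is not much smaller than 1, eps distorts the whitening
ratio_volts = eps / smallest_eig_volts
ratio_uv = eps / smallest_eig_uv
print(ratio_volts, ratio_uv)
```

Here the ratio is far above 1 for the unscaled traces and negligible after scaling, which is the situation the note in the docs is meant to cover.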

magland · 2023-10-03T18:55:22Z

> Not sure it is good to have epsilon too low. We could at least expose it in the whiten function.

The eps could also be automatically calculated from the data. For example, 1e-6 * 1 / median_abs_traces

alejoe91 · 2023-10-04T09:41:06Z

I think that you could also do it in MS5 directly, right? When it comes to preprocessing, in SpikeInterface we generally assume that you're either dealing with non-scaled int16 or uV traces, both of which will work with 1e-6. My worry is that 1e-16 could be too low as an epsilon. Could it?

magland · 2023-10-04T11:41:25Z

> I think that you could also do it in MS5 directly, right?

It wouldn't quite work, because whitening will have already failed in more ways than just incorrect scaling of the channels.

> When it comes to preprocessing, in SpikeInterface we generally assume that you're either dealing with non-scaled int16 or uV traces, both of which will work with 1e-6.

Yeah, that makes sense.

> My worry is that 1e-16 could be too low as an epsilon. Could it?

Possibly too low, yeah. I think the best solution could be to choose epsilon based on the data. Something like

# proposal (assumes numpy is imported as np and data holds the traces)
median_sqr = np.median(data ** 2)
eps = min(1e-6, max(1e-16, median_sqr * 1e-6))

For typical voltage range examples, this would end up being the same 1e-6.
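As a sketch of how this proposal behaves across magnitudes, the snippet below wraps it in a hypothetical helper `adaptive_eps` (the name and keyword arguments are illustrative; the constants are the ones proposed in this comment, not necessarily what ends up in whiten.py) and applies it to synthetic noise at three scales.

```python
import numpy as np


def adaptive_eps(data, max_eps=1e-6, min_eps=1e-16, rel=1e-6):
    """Data-dependent eps: rel * median(data**2), clamped to [min_eps, max_eps]."""
    median_sqr = float(np.median(np.asarray(data, dtype=np.float64) ** 2))
    return min(max_eps, max(min_eps, median_sqr * rel))


rng = np.random.default_rng(2)
noise = rng.standard_normal((10000, 4))

eps_typical = adaptive_eps(noise * 10.0)  # uV-like range: capped at max_eps
eps_mid = adaptive_eps(noise * 1e-3)      # follows the median of the squares
eps_tiny = adaptive_eps(noise * 1e-6)     # very small data: held at min_eps
print(eps_typical, eps_mid, eps_tiny)
```

The upper clamp keeps the behaviour unchanged for typical voltage ranges, while the lower clamp keeps eps positive even if the median of the squares is zero.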

alejoe91 · 2023-10-04T12:02:21Z

Instead of the median, we should probably use the MAD or std, no? The median could very well be 0 and say nothing about the range. Or even a ptp or percentiles.

alejoe91 · 2023-10-04T12:02:40Z

But yeah I think your solution would work well!

magland · 2023-10-04T13:05:29Z

> Instead of the median, we should probably use the MAD or std, no? The median could very well be 0 and say nothing about the range. Or even a ptp or percentiles.

It's the median of the square, so it would be strictly greater than zero. I chose the square because cov (and hence S) scales as the square of the data.
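A small sketch of the difference, using synthetic zero-centred noise (hypothetical data): the plain median carries no information about the amplitude, while the median of the squares is positive and scales as the square of the data.

```python
import numpy as np

rng = np.random.default_rng(3)
traces = rng.standard_normal(100001) * 1e-6
traces -= np.median(traces)  # centre the data so the plain median is exactly 0

median_plain = np.median(traces)
median_sqr = np.median(traces ** 2)

# Scaling the data by 1000 scales the median of the squares by 1000**2
median_sqr_scaled = np.median((traces * 1000.0) ** 2)
print(median_plain, median_sqr, median_sqr_scaled / median_sqr)
```

Strictly speaking, the median of the squares is non-negative rather than positive: it is zero when more than half of the samples are exactly zero, which is a case the 1e-16 floor in the proposal covers.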

alejoe91 · 2023-10-04T13:09:53Z

Ah sorry I missed the square! Makes sense then. Can you make a PR?

magland · 2023-10-04T13:53:27Z

Made PR: #2070

alejoe91 added the preprocessing label (Related to preprocessing module) Oct 4, 2023

magland mentioned this issue Oct 4, 2023

Adjust eps for whitening in case of very small magnitude data #2070

Merged

alejoe91 closed this as completed in #2070 Oct 26, 2023
