Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in whiten.py, eps is too large for some datasets #2064

Closed
magland opened this issue Oct 3, 2023 · 11 comments · Fixed by #2070
Closed

in whiten.py, eps is too large for some datasets #2064

magland opened this issue Oct 3, 2023 · 11 comments · Fixed by #2070
Labels
preprocessing Related to preprocessing module

Comments

@magland
Copy link
Member

magland commented Oct 3, 2023

When the magnitude of the recording traces is very low (e.g. ~ 1e-6 or lower), the eps used in the whiten preprocessing is too large. This results in a whitened signal that is not normalized properly, and detection for mountainsort (and perhaps other algorithms) then fails.

W, M = compute_whitening_matrix(
recording, mode, random_chunk_kwargs, apply_mean, radius_um=radius_um, eps=1e-8
)

Here is a dataset for which the raw traces are float and low magnitude
https://dandiarchive.org/dandiset/000463/draft/files?location=sub-BH395

The simplest fix is to change this to 1e-16. Not sure if this will introduce any other issues.

@alejoe91
Copy link
Member

alejoe91 commented Oct 3, 2023

Are the traces in volte? Another option is to scale them up with a spre.scale() function to convert them to uV

@samuelgarcia
Copy link
Member

Not sure this is good to have epsilon too low.
We could at leat expose it it in whiten function.

@magland
Copy link
Member Author

magland commented Oct 3, 2023

Thanks @alejoe91 Alessio.

I have added a note to the docs here:
https://github.com/flatironinstitute/mountainsort5

that says:

# Note that if the recording traces are of float type, you may need to scale
# it to a reasonable voltage range in order for whitening to work properly
# recording = spre.scale(recording, gain=...)

But it's unfortunate that this is necessary, because I need to include that blurb for every example, and it's not always easy for users to know whether they need a scale factor. So it would be great if you could consider changing this to 1e-16 unless you forsee any trouble with that.

@magland
Copy link
Member Author

magland commented Oct 3, 2023

Not sure this is good to have epsilon too low. We could at leat expose it it in whiten function.

The eps could also be automatically calculated from the data. For example, 1e-6 * 1 / median_abs_traces

@alejoe91
Copy link
Member

alejoe91 commented Oct 4, 2023

I think that you could also do it in MS5 directly, right? When it comes to preprocessing, in SpikeInterface we generally assume that you're either dealing with non-scaled int16 or uV traces, both of which will work with 1e-6. My worry is that 1e-16 could be too low as an epsilon. Could it?

@alejoe91 alejoe91 added the preprocessing Related to preprocessing module label Oct 4, 2023
@magland
Copy link
Member Author

magland commented Oct 4, 2023

I think that you could also do it in MS5 directly, right?

It wouldn't quite work, because whitening will have already failed in more ways than just incorrect scaling of the channels.

When it comes to preprocessing, in SpikeInterface we generally assume that you're either dealing with non-scaled int16 or uV traces, both of which will work with 1e-6.

Yeah, that makes sense.

My worry is that 1e-16 could be too low as an epsilon. Could it?

Possibly too low, yeah. I think the best solution could be to choose epsilon based on the data. Something like

# proposal
median_sqr = np.median(data ** 2)
eps = min(1-e6, max(1e-16, median_sqr * 1e-6))

For typical voltage range examples, this would end up being the same 1-e6

@alejoe91
Copy link
Member

alejoe91 commented Oct 4, 2023

Instead of median we should probably use the mad or std no? Median could very well be 0 and say nothing about the range. Or even a ptp or percentiles

@alejoe91
Copy link
Member

alejoe91 commented Oct 4, 2023

But yeah I think your solution would work well!

@magland
Copy link
Member Author

magland commented Oct 4, 2023

Instead of median we should probably use the mad or std no? Median could very well be 0 and say nothing about the range. Or even a ptp or percentiles

It's the median of the square, so it would be strictly greater than zero. I chose the square because cov (and hence S) scales as the square of the data.

@alejoe91
Copy link
Member

alejoe91 commented Oct 4, 2023

Ah sorry I missed the square! Makes sense then. Can you make a PR?

@magland
Copy link
Member Author

magland commented Oct 4, 2023

Made PR: #2070

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preprocessing Related to preprocessing module
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants