
Add PreprocessingPipeline #3438

Draft: wants to merge 13 commits into base: main

chrishalcrow (Member) commented Sep 25, 2024

Add a PreprocessingPipeline class, which contains ordered preprocessing steps and their kwargs in a dictionary.

You can call apply_pipeline on a recording to make a preprocessed recording:

preprocessor_dict = {'bandpass_filter': {'freq_max': 3000}, 'common_reference': {}}

from spikeinterface.preprocessing import apply_pipeline
preprocessed_recording = apply_pipeline(recording, preprocessor_dict)

Under the hood, this uses the apply method of the new PreprocessingPipeline. Users can also use the class directly:

from spikeinterface.preprocessing import PreprocessingPipeline
pipeline = PreprocessingPipeline(preprocessor_dict)
preprocessed_recording = pipeline.apply(recording, preprocessor_dict)
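
To illustrate the idea of an ordered dict of steps and kwargs, here is a minimal, self-contained sketch. It is not the implementation in this PR: `ToyPipeline`, `apply_toy_pipeline`, `STEP_REGISTRY` and the two toy steps are hypothetical names, and a plain list of floats stands in for a recording.

```python
from typing import Callable, Dict, List

Signal = List[float]  # stand-in for a recording (assumption for this sketch)


def scale(signal: Signal, factor: float = 1.0) -> Signal:
    """Toy step: multiply every sample by `factor`."""
    return [x * factor for x in signal]


def subtract_mean(signal: Signal) -> Signal:
    """Toy step: remove the mean of the signal."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


# Hypothetical registry mapping step names to functions.
STEP_REGISTRY: Dict[str, Callable[..., Signal]] = {
    "scale": scale,
    "subtract_mean": subtract_mean,
}


class ToyPipeline:
    """Holds ordered steps and their kwargs, and applies them in order."""

    def __init__(self, step_dict: Dict[str, dict]):
        unknown = [name for name in step_dict if name not in STEP_REGISTRY]
        if unknown:
            raise ValueError(f"Unknown steps: {unknown}")
        # dicts preserve insertion order, so the step order is kept.
        self.step_dict = dict(step_dict)

    def apply(self, signal: Signal) -> Signal:
        for name, kwargs in self.step_dict.items():
            signal = STEP_REGISTRY[name](signal, **kwargs)
        return signal


def apply_toy_pipeline(signal: Signal, step_dict: Dict[str, dict]) -> Signal:
    """Convenience wrapper, mirroring the function/class split described above."""
    return ToyPipeline(step_dict).apply(signal)


toy_result = apply_toy_pipeline([1.0, 2.0, 3.0], {"scale": {"factor": 2.0}, "subtract_mean": {}})
```

Because Python dicts keep insertion order, the dictionary alone is enough to define both the steps and the order in which they run.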

Also adds a function which takes a provenance.json file and builds a preprocessor_dict from it, so it's easy to extract the preprocessing steps from a saved recording.

from spikeinterface.preprocessing import get_preprocessing_dict_from_json
my_dict = get_preprocessing_dict_from_json('/path/to/provenance.json')
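
As a rough sketch of how such an extraction could work, the example below walks a nested provenance-style dict from the outermost step down to the root. The layout (a `class` string plus `kwargs` holding the parent under a `recording` key) is an assumption for illustration, and `steps_from_nested_provenance` is a hypothetical helper, not the function added in this PR.

```python
import json


def steps_from_nested_provenance(node: dict) -> dict:
    """Return {step_class_name: kwargs} in application order.

    Assumes each node looks like {"class": "...", "kwargs": {"recording": parent, ...}}.
    The outermost node is the last step applied, so the walk is reversed at the end.
    Note: repeated steps of the same class would collide in this simple sketch.
    """
    steps = []
    while isinstance(node, dict) and "kwargs" in node:
        kwargs = dict(node["kwargs"])
        parent = kwargs.pop("recording", None)
        if not isinstance(parent, dict):
            break  # reached the root recording, which is not a preprocessing step
        steps.append((node["class"].split(".")[-1], kwargs))
        node = parent
    return dict(reversed(steps))


# Made-up provenance content, for illustration only.
example_provenance = json.loads("""
{
  "class": "pkg.CommonReferenceRecording",
  "kwargs": {
    "recording": {
      "class": "pkg.BandpassFilterRecording",
      "kwargs": {
        "recording": {"class": "pkg.SomeRawRecording", "kwargs": {"file_path": "raw.bin"}},
        "freq_max": 3000
      }
    }
  }
}
""")

extracted_steps = steps_from_nested_provenance(example_provenance)
```

The root recording's own kwargs (including its file path) are dropped, which is what makes the resulting dict usable without the original recording.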

After you load this dict, you can either apply the precomputable kwargs stored in it, or ignore them and recompute them when the pipeline is applied:

# Apply the precomputed kwargs, such as the `M` and `W` matrices from whitening:
pp_rec = apply_pipeline(rec, my_dict, ignore_precomputed_kwargs=False)
# Ignore the precomputed kwargs and recompute them on application:
pp_rec = apply_pipeline(rec, my_dict, ignore_precomputed_kwargs=True)
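
One way to think about ignoring precomputed kwargs is as a filter on the dict before the steps run. The sketch below assumes a hypothetical table `PRECOMPUTABLE_KEYS` listing which kwargs of each step count as precomputed; the helper `drop_precomputed_kwargs` is illustrative and not part of this PR.

```python
# Hypothetical table: which kwargs of each step are "precomputable".
PRECOMPUTABLE_KEYS = {"whiten": ("W", "M")}


def drop_precomputed_kwargs(step_dict: dict, precomputable: dict = PRECOMPUTABLE_KEYS) -> dict:
    """Return a copy of step_dict without the precomputed kwargs.

    Steps with no entry in `precomputable` are passed through unchanged.
    """
    return {
        name: {k: v for k, v in kwargs.items() if k not in precomputable.get(name, ())}
        for name, kwargs in step_dict.items()
    }


loaded_dict = {
    "bandpass_filter": {"freq_max": 3000},
    "whiten": {"mode": "global", "W": [[1.0]], "M": [0.0]},
}
recompute_dict = drop_precomputed_kwargs(loaded_dict)
```

With the precomputed entries removed, a step such as whitening would have to estimate them again from whichever recording the pipeline is applied to.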

This PR allows for some cool things:

  1. Users can pass a single dictionary to construct a preprocessed recording (as above). This completes the “dictionary workflow”, since you can already pass dicts to run_sorter for sorting and to compute for postprocessing.
  2. Users can easily visualise their preprocessing pipeline using the repr, including an HTML repr in a Jupyter notebook.
  3. Increases portability between labs, and should make giving advice to users easier (from us, and from spike sorting developers), since we can just say "Oh, for KS4 NP2.0 we use this dict for preprocessing".
  4. Increases the usefulness of our provenance system, since you can reconstruct human-readable preprocessing steps from the provenance.json file without the original recording (and without worrying about paths).

The repr currently looks like this:
[Screenshot of the repr, 2025-05-20]
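
For readers unfamiliar with how a class gets both a text and a notebook repr, here is a small sketch. Jupyter looks for a `_repr_html_` method on displayed objects; the class `ReprDemoPipeline` and its output format are hypothetical and do not reproduce the repr in the screenshot.

```python
from html import escape


class ReprDemoPipeline:
    """Toy container showing a plain-text repr and a Jupyter HTML repr."""

    def __init__(self, step_dict: dict):
        self.step_dict = dict(step_dict)

    def __repr__(self) -> str:
        lines = ["ReprDemoPipeline:"]
        for i, (name, kwargs) in enumerate(self.step_dict.items(), start=1):
            lines.append(f"  {i}. {name}({kwargs})")
        return "\n".join(lines)

    def _repr_html_(self) -> str:
        # Jupyter calls this method, when present, to render the object as HTML.
        items = "".join(
            f"<li><b>{escape(name)}</b>: <code>{escape(repr(kwargs))}</code></li>"
            for name, kwargs in self.step_dict.items()
        )
        return f"<ol>{items}</ol>"


demo = ReprDemoPipeline({"bandpass_filter": {"freq_max": 3000}, "common_reference": {}})
demo_text = repr(demo)
demo_html = demo._repr_html_()
```

Escaping the step names and kwargs keeps the HTML valid even when a kwarg value contains characters such as `<` or `&`.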

chrishalcrow added the enhancement (New feature or request) and preprocessing (Related to preprocessing module) labels on Sep 25, 2024
alejoe91 modified the milestone: 0.101.2 on Oct 1, 2024

2 participants