Add SpikeInterface implementations for all modules

jsiegle · jsiegle · commit 1d5a10407126 · 2023-12-19T11:58:49.000-08:00
diff --git a/README.md b/README.md
@@ -11,16 +11,30 @@ Modules for processing **e**xtra**c**ellular **e**lectro**phys**iology data from
 
 ## Overview
 
-The first three modules take data saved by the [Open Ephys GUI](https://github.com/open-ephys/plugin-gui) and prepare it for spike sorting by [Kilosort2](https://github.com/MouseLand/Kilosort2). Following the spike-sorting step (using the [kilosort_helper](ecephys_spike_sorting/modules/kilosort_helper/README.md) module), we clean up the outputs and calculate mean waveforms and quality metrics for each unit.
+This repository contains code used by the Allen Institute to run spike sorting pipelines for the Allen Brain Observatory. Public datasets that have used `ecephys_spike_sorting` include [**Visual Coding - Neuropixels**](https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html) and [**Visual Behavior - Neuropixels**](https://allensdk.readthedocs.io/en/latest/visual_behavior_neuropixels.html). The code has been used to process data for a number of publications, including:
+
+- Siegle, Jia et al. (2021) [Survey of spiking in the mouse visual system reveals functional hierarchy.](https://doi.org/10.1038/s41586-020-03171-x)
+
+- Siegle, Ledochowitsch et al. (2021) [Reconciling functional differences in populations of neurons recorded with two-photon imaging and electrophysiology.](https://doi.org/10.7554/eLife.69068)
+
+- Jia et al. (2022) [Multi-regional module-based signal transmission in mouse visual cortex.](https://doi.org/10.1016/j.neuron.2022.01.027)
 
-This code is still under development, and we welcome feedback about any step in the pipeline.
+## Compatibility
 
-Further documentation can be found in each module's README file. For more information on Kilosort2, please read through the [GitHub wiki](https://github.com/MouseLand/Kilosort2/wiki).
+This code is designed to ingest data collected with the [Open Ephys GUI](https://open-ephys.org/gui). [@jenniferColonell](https://github.com/jenniferColonell) from HHMI Janelia Research Campus [maintains a fork](https://github.com/jenniferColonell/ecephys_spike_sorting) that is compatible with data recorded by [SpikeGLX](https://billkarsh.github.io/SpikeGLX/). For the spike sorting step, both versions rely on Kilosort 2 or 2.5.  For more information on Kilosort, please read through the [GitHub wiki](https://github.com/MouseLand/Kilosort/wiki).
+
+
+## Level of Support
 
+This repository is **no longer under development**, and we recommend that new users base their spike sorting pipelines on [SpikeInterface](https://spikeinterface.readthedocs.io/en/latest/) instead. Even existing `ecephys_spike_sorting` users would benefit from migrating to SpikeInterface. The Allen Institute has already converted most of its spike sorting workflows to use SpikeInterface, which is actively maintained, works with a range of modern spike sorters, and includes up-to-date implementations of the most important pre- and post-processing methods. The SpikeInterface syntax needed to reproduce the functionality of `ecephys_spike_sorting` can be found in each module's README file.
+
+To get started with SpikeInterface, we recommend reading through [this tutorial on analyzing Neuropixels data](https://spikeinterface.readthedocs.io/en/latest/how_to/analyse_neuropixels.html).
 
 ## Modules
 
-1. [extract_from_npx](ecephys_spike_sorting/modules/extract_from_npx/README.md): Calls a binary executable that converts data from compressed NPX format into .dat files (continuous data) and .npy files (event data)
+The first three modules take data saved by the [Open Ephys GUI](https://github.com/open-ephys/plugin-gui) and prepare it for spike sorting by [Kilosort2](https://github.com/MouseLand/Kilosort2). Following the spike-sorting step (using the [kilosort_helper](ecephys_spike_sorting/modules/kilosort_helper/README.md) module), we clean up the outputs and calculate mean waveforms and quality metrics for each unit.
+
+1. [extract_from_npx](ecephys_spike_sorting/modules/extract_from_npx/README.md) (*deprecated*): Calls a binary executable that converts data from compressed NPX format into .dat files (continuous data) and .npy files (event data). The NPX format is no longer used by Open Ephys (or any other software), so this module can be skipped.
 
 2. [depth_estimation](ecephys_spike_sorting/modules/depth_estimation/README.md): Uses the LFP data to identify the surface channel, which is required by the median subtraction and kilosort modules.
 
@@ -117,10 +131,6 @@ To leave the pipenv virtual environment, simply type:
     (.venv) $ exit
 ```
 
-## Level of Support
-
-This code is an important part of the internal Allen Institute code base and we are actively using and maintaining it. The implementation is not yet finalized, so we welcome feedback about any aspects of the software. If you'd like to submit changes to this repository, we encourage you to create an issue beforehand, so we know what others are working on.
-
 
 ## Terms of Use
 
diff --git a/ecephys_spike_sorting/modules/automerging/README.md b/ecephys_spike_sorting/modules/automerging/README.md
@@ -1,12 +1,39 @@
-Automerging
-==============
-Looks for clusters that likely belong to the same cell, and merges them automatically.
+# Automerging
 
-This is not currently part of our pipeline since switching to Kilosort2, but we're keeping the code around in case others find it useful. For example, it could be helpful for matching units across a series of chronic recordings.
+Searches for clusters that likely belong to the same cell, and merges them automatically.
 
+This module has not been used since we switched to Kilosort 2, which produces far fewer split units than Kilosort 1. We're keeping the code around in case others find it useful. For example, it could be helpful for matching units across a series of chronic recordings. However, much more complete implementations of this functionality exist elsewhere ([UnitMatch](https://github.com/EnnyvanBeest/UnitMatch), for example).
+
+
+### SpikeInterface implementation
+
+SpikeInterface does not currently include the ability to automatically merge units, but this is under active development. Information that is helpful for making merge decisions, such as waveform similarity and cross-correlograms, can be computed using the `postprocessing` module:
+
+```python
+import spikeinterface.full as si
+
+from spikeinterface.postprocessing import (compute_template_similarity,
+                                           compute_correlograms)
+
+# run a sorter and extract waveforms
+# note that this omits some important pre-processing steps for brevity
+recording = si.read_openephys('/path/to/data')
+sorting = si.run_sorter('kilosort2_5', recording)
+waveform_extractor = si.extract_waveforms(recording=recording, 
+                                          sorting=sorting, 
+                                          folder='waveforms')
+
+# run post-processing steps
+_ = compute_template_similarity(waveform_extractor)
+_ = compute_correlograms(waveform_extractor)
+
+```
+
+More information can be found in the documentation for the [Curation module](https://spikeinterface.readthedocs.io/en/latest/modules/curation.html).
+
+
+## Running
 
-Running
--------
 ```
 python -m ecephys_spike_sorting.modules.automerging --input_json <path to input json> --output_json <path to output json>
 ```
@@ -16,12 +43,12 @@ Two arguments must be included:
 
 See the `_schemas.py` file for detailed information about the contents of the input JSON.
 
-Input data
-----------
+## Input data
+
 - **Kilosort outputs** : includes spike times, spike clusters, cluster quality, etc.
 
 
-Output data
------------
+## Output data
+
 - **spike_clusters.npy** : updated with new cluster labels
 - **cluster_group.tsv** : updated with new cluster labels
diff --git a/ecephys_spike_sorting/modules/depth_estimation/README.md b/ecephys_spike_sorting/modules/depth_estimation/README.md
@@ -1,15 +1,31 @@
-Depth Estimation
-==============
+# Depth Estimation
+
 Creates a JSON file with information about the DC offset of each channel, as well as the channel closest to the brain surface. This information is needed to perform the median subtraction step.
 
-Implementation
---------------
+### SpikeInterface implementation
+
+`detect_bad_channels()` can be used to detect which channels are outside the brain, as well as channels that have abnormally high levels of noise.
+
+This function returns both the `bad_channel_ids` and `channel_labels`, which can be `good`, `noise`, `dead`, or `out` (outside of the brain). These can then be removed from the recording so they are ignored by the spike sorter:
+
+```python
+from spikeinterface.preprocessing import detect_bad_channels
+
+bad_channel_ids, channel_labels = detect_bad_channels(recording)
+rec_clean = recording.remove_channels(remove_channel_ids=bad_channel_ids)
+
+```
+
+More information can be found in the documentation for the [Preprocessing module](https://spikeinterface.readthedocs.io/en/latest/modules/preprocessing.html).
+
+## Method
+
 ![Depth estimation](images/probe_depth.png "Surface estimation method")
 
 This module uses the sharp increase in low-frequency LFP band power to estimate the brain surface location.
 
-Running
--------
+## Running
+
 ```
 python -m ecephys_spike_sorting.modules.depth_estimation --input_json <path to input json> --output_json <path to output json>
 ```
@@ -19,12 +35,12 @@ Two arguments must be included:
 
 See the `_schemas.py` file for detailed information about the contents of the input JSON.
 
-Input data
-----------
+## Input data
+
 - **AP band and LFP band .dat or .bin files** : int16 binary files written by [Open Ephys](https://github.com/open-ephys/plugin-GUI), [SpikeGLX](https://github.com/billkarsh/spikeglx), or the `extract_from_npx` module.
 
 
-Output data
------------
+## Output data
+
 - **probe_info.json** : contains information about each channel, as well as the surface channel for the probe
 - **probe_depth.png** : image showing the estimated surface channel location
diff --git a/ecephys_spike_sorting/modules/extract_from_npx/README.md b/ecephys_spike_sorting/modules/extract_from_npx/README.md
@@ -1,17 +1,19 @@
-Extract from NPX
-==============
+# Extract from NPX (*deprecated*)
+
 Converts continuous data from raw NPX/NPX2 format (75% compression ratio) to .dat files required for spike sorting and other downstream analysis.
 
 Reads event times from the NPX/NPX2 file and writes them as .npy files.
 
 Converts the settings.xml file for an experiment into a JSON file with parameters such as sample rate and bit volts for each channel.
 
-Dependencies
--------------
+**Note:** The NPX format is no longer used by Open Ephys (or any other software), so this module can safely be skipped.
+
+## Dependencies
+
 The NpxExtractor executable (Windows only) can be found in the `NpxExtractor\Release` folder.
 
-Running
--------
+## Running
+
 ```
 python -m ecephys_spike_sorting.modules.extract_from_npx --input_json <path to input json> --output_json <path to output json>
 ```
@@ -21,14 +23,14 @@ Two arguments must be included:
 
 See the `_schemas.py` file for detailed information about the contents of the input JSON.
 
-Input data
-----------
+## Input data
+
 - **NPX file** : Written by Open Ephys (https://github.com/open-ephys/plugin-GUI). Contains all of the data recorded from one or more Neuropixels probes.
 - **settings.xml** : Written by Open Ephys. Contains information about the signal chain that was used for the experiment.
 
 
-Output data
------------
+## Output data
+
 - **continuous.dat** : Continuous data (1 file each for LFP and AP band)
 - **lfp_timestamps.npy** : Timestamps for LFP samples
 - **ap_timestamps.npy** : Timestamps for AP samples
diff --git a/ecephys_spike_sorting/modules/kilosort_helper/README.md b/ecephys_spike_sorting/modules/kilosort_helper/README.md
@@ -1,16 +1,34 @@
-Kilosort Helper
-==============
+# Kilosort Helper
+
 Python wrapper for Matlab-based spike sorting with Kilosort.
 
 This module auto-generates the channel map, configuration file, and master file for Kilosort, and runs everything via the Matlab engine for Python.
 
-Dependencies
-------------
+### SpikeInterface implementation
+
+SpikeInterface makes it much easier to run the spike sorting step, which only requires a single line of code. We recommend running Kilosort in a [Docker container](https://spikeinterface.readthedocs.io/en/latest/modules/sorters.html#running-sorters-in-docker-singularity-containers) to avoid the need for a Matlab license or complex installation procedures.
+
+After you've installed Docker, you can run Kilosort on a pre-loaded and pre-processed `Recording` object by running:
+
+```python
+import spikeinterface.full as si
+
+sorting = si.run_sorter(sorter_name='kilosort2_5',
+                        recording=recording,
+                        output_folder="/tmp/kilosort",
+                        docker_image=True)
+
+```
+
+More information can be found in the documentation for the [Sorters module](https://spikeinterface.readthedocs.io/en/latest/modules/sorters.html).
+
+## Dependencies
+
 Kilosort [v1](https://github.com/cortex-lab/Kilosort), [v2, v2.5, or v3](https://github.com/MouseLand/kilosort) - requires Matlab >=R2016b with Signal Processing and Parallel Computing Toolboxes, Visual Studio Community 2013, and a CUDA-compatible GPU
 [Matlab Engine API for Python](https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html) - this may restrict the Python version you're able to use
 
-Running
--------
+## Running
+
 ```
 python -m ecephys_spike_sorting.modules.kilosort_helper --input_json <path to input json> --output_json <path to output json>
 ```
@@ -20,10 +38,10 @@ Two arguments must be included:
 
 See the `_schemas.py` file for detailed information about the contents of the input JSON.
 
-Input data
-----------
+## Input data
+
 - **AP band .dat or .bin file** : int16 binary files written by [Open Ephys](https://github.com/open-ephys/plugin-GUI), [SpikeGLX](https://github.com/billkarsh/spikeglx), or the `extract_from_npx` module.
 
-Output data
------------
+## Output data
+
 - **Kilosort output files** : .npy files containing spike times, cluster labels, templates, etc.
diff --git a/ecephys_spike_sorting/modules/kilosort_postprocessing/README.md b/ecephys_spike_sorting/modules/kilosort_postprocessing/README.md
@@ -1,5 +1,5 @@
-Kilosort Post-Processing
-==============
+# Kilosort Post-Processing
+
 Clean up Kilosort outputs by removing putative double-counted spikes.
 
 Kilosort occasionally fits a spike template to the residual of another spike. See [this discussion](https://github.com/MouseLand/Kilosort2/issues/29) for more information.
@@ -8,8 +8,35 @@ This module aims to correct for this by removing spikes from the same unit or ne
 
 We are not currently taking into account spike amplitude when removing spikes; the module just deletes one spike from an overlapping pair that occurs later in time.
 
-Running
--------
+### SpikeInterface implementation
+
+There is not currently a function for removing putative double-counted spikes with SpikeInterface. Instead, you can use the `export_to_phy()` function to save the data in a format that can be loaded by this module:
+
+```python
+import spikeinterface.full as si
+
+from spikeinterface.postprocessing import (compute_spike_amplitudes,
+                                           compute_principal_components)
+
+from spikeinterface.exporters import export_to_phy
+
+# the waveforms are sparse so it is faster to export to phy
+we = si.extract_waveforms(recording=recording, sorting=sorting, folder='waveforms')
+
+# compute some metrics needed for this module:
+_ = compute_spike_amplitudes(waveform_extractor=we)
+_ = compute_principal_components(waveform_extractor=we, 
+                                 n_components=3, 
+                                 mode='by_channel_global')
+
+# save the data in a specified location
+export_to_phy(waveform_extractor=we, 
+              output_folder='path/to/phy_folder')
+
+```
+
+## Running
+
 ```
 python -m ecephys_spike_sorting.modules.kilosort_postprocessing --input_json <path to input json> --output_json <path to output json>
 ```
@@ -19,10 +46,10 @@ Two arguments must be included:
 
 See the `_schemas.py` file for detailed information about the contents of the input JSON.
 
-Input data
-----------
+## Input data
+
 - **Kilosort output files** : .npy files containing spike times, cluster labels, templates, etc.
 
-Output data
------------
+## Output data
+
 - **Updated Kilosort output files** : overwrites .npy files for spike times, cluster labels, amplitudes, and PC features. The original outputs can be extracted from the `rez.mat` file if necessary.
diff --git a/ecephys_spike_sorting/modules/mean_waveforms/README.md b/ecephys_spike_sorting/modules/mean_waveforms/README.md
@@ -1,5 +1,5 @@
-Mean Waveforms
-==============
+# Mean Waveforms
+
 Extracts mean waveforms from raw data, given spike times and cluster IDs.
 
 Computes waveforms separately for individual epochs, as well as for the entire experiment. If no epochs are specified, waveforms are selected randomly from the entire recording. Waveform standard deviation is currently computed, but not saved.
@@ -20,9 +20,30 @@ Metrics are computed for every waveform, and include features of the 1D peak-cha
 
 Source: [Jia et al. (2019) "High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification." _J Neurophys_ **121**: 1831-1847](https://doi.org/10.1152/jn.00680.2018)
 
+### SpikeInterface implementation
+
+SpikeInterface uses a `WaveformExtractor` object to pull spike waveforms out of the raw data and compute metrics on their shape.
+
+Extracting the mean waveforms from a sorting and computing a variety of waveform metrics only requires two lines of code:
+
+```python
+import spikeinterface.full as si
+
+from spikeinterface.postprocessing import compute_template_metrics
+
+waveform_extractor = si.extract_waveforms(recording=recording, 
+                                          sorting=sorting, 
+                                          folder='waveforms')
+
+_ = compute_template_metrics(waveform_extractor)
+
+```
+
+More information can be found in the documentation for the [Postprocessing module](https://spikeinterface.readthedocs.io/en/latest/modules/postprocessing.html).
+
+
+## Running
 
-Running
--------
 ```
 python -m ecephys_spike_sorting.modules.mean_waveforms --input_json <path to input json> --output_json <path to output json>
 ```
@@ -32,13 +53,13 @@ Two arguments must be included:
 
 See the `_schemas.py` file for detailed information about the contents of the input JSON.
 
-Input data
-----------
+## Input data
+
 - **AP band .dat or .bin file** : int16 binary files written by [Open Ephys](https://github.com/open-ephys/plugin-GUI), [SpikeGLX](https://github.com/billkarsh/spikeglx), or the `extract_from_npx` module.
 - **Kilosort outputs** : includes spike times, spike clusters, cluster quality, etc.
 
 
-Output data
------------
+## Output data
+
 - **mean_waveforms.npy** : numpy file containing mean waveforms for clusters across all epochs
 - **waveform_metrics.csv** : CSV file containing metrics for each waveform
diff --git a/ecephys_spike_sorting/modules/median_subtraction/README.md b/ecephys_spike_sorting/modules/median_subtraction/README.md
diff --git a/ecephys_spike_sorting/modules/noise_templates/README.md b/ecephys_spike_sorting/modules/noise_templates/README.md
diff --git a/ecephys_spike_sorting/modules/quality_metrics/README.md b/ecephys_spike_sorting/modules/quality_metrics/README.md