Commit 1d5a104

Add SpikeInterface implementations for all modules
1 parent 7e567a6 commit 1d5a104

10 files changed

+304
-83
lines changed

README.md

Lines changed: 18 additions & 8 deletions
@@ -11,16 +11,30 @@ Modules for processing **e**xtra**c**ellular **e**lectro**phys**iology data from
1111

1212
## Overview
1313

14-
The first three modules take data saved by the [Open Ephys GUI](https://github.com/open-ephys/plugin-gui) and prepare it for spike sorting by [Kilosort2](https://github.com/MouseLand/Kilosort2). Following the spike-sorting step (using the [kilosort_helper](ecephys_spike_sorting/modules/kilosort_helper/README.md) module), we clean up the outputs and calculate mean waveforms and quality metrics for each unit.
14+
This repository contains code used by the Allen Institute to run spike sorting pipelines from the Allen Brain Observatory. Public datasets that have used `ecephys_spike_sorting` include [**Visual Coding - Neuropixels**](https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html) and [**Visual Behavior - Neuropixels**](https://allensdk.readthedocs.io/en/latest/visual_behavior_neuropixels.html). The code has been used to process data for a number of publications, including:
15+
16+
- Siegle, Jia et al. (2021) [Survey of spiking in the mouse visual system reveals functional hierarchy.](https://doi.org/10.1038/s41586-020-03171-x)
17+
18+
- Siegle, Ledochowitsch et al. (2021) [Reconciling functional differences in populations of neurons recorded with two-photon imaging and electrophysiology.](https://doi.org/10.7554/eLife.69068)
19+
20+
- Jia et al. (2022) [Multi-regional module-based signal transmission in mouse visual cortex.](https://doi.org/10.1016/j.neuron.2022.01.027)
1521

16-
This code is still under development, and we welcome feedback about any step in the pipeline.
22+
## Compatibility
1723

18-
Further documentation can be found in each module's README file. For more information on Kilosort2, please read through the [GitHub wiki](https://github.com/MouseLand/Kilosort2/wiki).
24+
This code is designed to ingest data collected with the [Open Ephys GUI](https://open-ephys.org/gui). [@jenniferColonell](https://github.com/jenniferColonell) from HHMI Janelia Research Campus [maintains a fork](https://github.com/jenniferColonell/ecephys_spike_sorting) that is compatible with data recorded by [SpikeGLX](https://billkarsh.github.io/SpikeGLX/). For the spike sorting step, both versions rely on Kilosort 2 or 2.5. For more information on Kilosort, please read through the [GitHub wiki](https://github.com/MouseLand/Kilosort/wiki).
25+
26+
27+
## Level of Support
1928

29+
This repository is **no longer under development**, and we recommend that new users base their spike sorting pipelines on [SpikeInterface](https://spikeinterface.readthedocs.io/en/latest/) instead. Even existing `ecephys_spike_sorting` users would benefit from migrating to SpikeInterface. The Allen Institute has already converted most of its spike sorting workflows to use SpikeInterface, which is actively maintained, works with a range of modern spike sorters, and includes up-to-date implementations of the most important pre- and post-processing methods. The SpikeInterface syntax needed to reproduce the functionality of `ecephys_spike_sorting` can be found in each module's README file.
30+
31+
To get started with SpikeInterface, we recommend reading through [this tutorial on analyzing Neuropixels data](https://spikeinterface.readthedocs.io/en/latest/how_to/analyse_neuropixels.html).
2032

2133
## Modules
2234

23-
1. [extract_from_npx](ecephys_spike_sorting/modules/extract_from_npx/README.md): Calls a binary executable that converts data from compressed NPX format into .dat files (continuous data) and .npy files (event data)
35+
The first three modules take data saved by the [Open Ephys GUI](https://github.com/open-ephys/plugin-gui) and prepare it for spike sorting by [Kilosort2](https://github.com/MouseLand/Kilosort2). Following the spike-sorting step (using the [kilosort_helper](ecephys_spike_sorting/modules/kilosort_helper/README.md) module), we clean up the outputs and calculate mean waveforms and quality metrics for each unit.
36+
37+
1. [extract_from_npx](ecephys_spike_sorting/modules/extract_from_npx/README.md) (*deprecated*): Calls a binary executable that converts data from compressed NPX format into .dat files (continuous data) and .npy files (event data). The NPX format is no longer used by Open Ephys (or any other software), so this module can be skipped.
2438

2539
2. [depth_estimation](ecephys_spike_sorting/modules/depth_estimation/README.md): Uses the LFP data to identify the surface channel, which is required by the median subtraction and kilosort modules.
2640

@@ -117,10 +131,6 @@ To leave the pipenv virtual environment, simply type:
117131
(.venv) $ exit
118132
```
119133

120-
## Level of Support
121-
122-
This code is an important part of the internal Allen Institute code base and we are actively using and maintaining it. The implementation is not yet finalized, so we welcome feedback about any aspects of the software. If you'd like to submit changes to this repository, we encourage you to create an issue beforehand, so we know what others are working on.
123-
124134

125135
## Terms of Use
126136

Lines changed: 37 additions & 10 deletions
@@ -1,12 +1,39 @@
1-
Automerging
2-
==============
3-
Looks for clusters that likely belong to the same cell, and merges them automatically.
1+
# Automerging
42

5-
This is not currently part of our pipeline since switching to Kilosort2, but we're keeping the code around in case others find it useful. For example, it could be helpful for matching units across a series of chronic recordings.
3+
Searches for clusters that likely belong to the same cell, and merges them automatically.
64

5+
We have not used this module since switching to Kilosort 2, which produces far fewer split units than Kilosort 1. We're keeping the code around in case others find it useful. For example, it could be helpful for matching units across a series of chronic recordings. However, much more complete implementations of this functionality exist elsewhere ([UnitMatch](https://github.com/EnnyvanBeest/UnitMatch), for example).
6+
7+
8+
### SpikeInterface implementation
9+
10+
SpikeInterface does not currently include the ability to automatically merge units, but this is under active development. Information that is helpful for making merge decisions, such as waveform similarity and cross-correlograms, can be computed using the `postprocessing` module:
11+
12+
```python
13+
import spikeinterface.full as si
14+
15+
from spikeinterface.postprocessing import (compute_template_similarity,
16+
compute_correlograms)
17+
18+
# run a sorter and extract waveforms
19+
# note that this omits some important pre-processing steps for brevity
20+
recording = si.read_openephys('/path/to/data')
21+
sorting = si.run_sorter('kilosort2_5', recording)
22+
waveform_extractor = si.extract_waveforms(recording=recording,
23+
sorting=sorting,
24+
folder='waveforms')
25+
26+
# run post-processing steps
27+
_ = compute_template_similarity(waveform_extractor)
28+
_ = compute_correlograms(waveform_extractor)
29+
30+
```
31+
32+
More information can be found in the documentation for the [Curation module](https://spikeinterface.readthedocs.io/en/latest/modules/curation.html).
33+
34+
35+
## Running
736

8-
Running
9-
-------
1037
```
1138
python -m ecephys_spike_sorting.modules.automerging --input_json <path to input json> --output_json <path to output json>
1239
```
@@ -16,12 +43,12 @@ Two arguments must be included:
1643

1744
See the `_schemas.py` file for detailed information about the contents of the input JSON.
1845

19-
Input data
20-
----------
46+
## Input data
47+
2148
- **Kilosort outputs** : includes spike times, spike clusters, cluster quality, etc.
2249

2350

24-
Output data
25-
-----------
51+
## Output data
52+
2653
- **spike_clusters.npy** : updated with new cluster labels
2754
- **cluster_group.tsv** : updated with new cluster labels
Lines changed: 26 additions & 10 deletions
@@ -1,15 +1,31 @@
1-
Depth Estimation
2-
==============
1+
# Depth Estimation
2+
33
Creates a JSON file with information about the DC offset of each channel, as well as the channel closest to the brain surface. This information is needed to perform the median subtraction step.
44

5-
Implementation
6-
--------------
5+
### SpikeInterface implementation
6+
7+
`detect_bad_channels()` can be used to detect which channels are outside the brain, as well as channels that have abnormally high levels of noise.
8+
9+
This function returns `bad_channel_ids` and `channel_labels`. Each label is one of `good`, `noise`, `dead`, or `out` (outside of the brain). The bad channels can then be removed from the recording so they are ignored by the spike sorter:
10+
11+
```python
12+
from spikeinterface.preprocessing import detect_bad_channels
13+
14+
bad_channel_ids, channel_labels = detect_bad_channels(recording)
15+
rec_clean = recording.remove_channels(remove_channel_ids=bad_channel_ids)
16+
17+
```
18+
19+
More information can be found in the documentation for the [Preprocessing module](https://spikeinterface.readthedocs.io/en/latest/modules/preprocessing.html).
20+
21+
## Method
22+
723
![Depth estimation](images/probe_depth.png "Surface estimation method")
824

925
This module uses the sharp increase in low-frequency LFP band power to estimate the brain surface location.
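
The idea can be illustrated with a minimal NumPy sketch. This is not the module's actual implementation: the function name, the 0-10 Hz band, and the smoothing width are assumptions chosen for illustration, and channel 0 is assumed to sit at the probe tip. The sketch computes low-frequency power per channel and looks for the largest drop in power when moving from the tip toward the top of the probe.

```python
import numpy as np


def estimate_surface_channel(lfp, sample_rate, freq_range=(0.0, 10.0), smooth=5):
    """Estimate the surface channel from LFP data (hypothetical helper).

    lfp : array of shape (n_samples, n_channels), channel 0 at the probe tip.
    Returns the approximate index of the first channel outside the brain.
    """
    # Power spectrum of each channel after removing its DC offset
    freqs = np.fft.rfftfreq(lfp.shape[0], d=1.0 / sample_rate)
    power = np.abs(np.fft.rfft(lfp - lfp.mean(axis=0), axis=0)) ** 2

    # Mean power in the low-frequency band, on a log scale
    in_band = (freqs >= freq_range[0]) & (freqs <= freq_range[1])
    band_power = np.log10(power[in_band].mean(axis=0) + 1e-12)

    # Smooth across channels, then find the steepest drop along the probe
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(band_power, kernel, mode="valid")
    step = np.diff(smoothed)

    # Shift the index back to account for the smoothing window
    return int(np.argmin(step)) + smooth // 2 + 1
```

Real recordings are noisier than this sketch assumes, so the result should be checked against a plot of power versus channel.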
1026

11-
Running
12-
-------
27+
## Running
28+
1329
```
1430
python -m ecephys_spike_sorting.modules.depth_estimation --input_json <path to input json> --output_json <path to output json>
1531
```
@@ -19,12 +35,12 @@ Two arguments must be included:
1935

2036
See the `_schemas.py` file for detailed information about the contents of the input JSON.
2137

22-
Input data
23-
----------
38+
## Input data
39+
2440
- **AP band and LFP band .dat or .bin files** : int16 binary files written by [Open Ephys](https://github.com/open-ephys/plugin-GUI), [SpikeGLX](https://github.com/billkarsh/spikeglx), or the `extract_from_npx` module.
2541

2642

27-
Output data
28-
-----------
43+
## Output data
44+
2945
- **probe_info.json** : contains information about each channel, as well as the surface channel for the probe
3046
- **probe_depth.png** : image showing the estimated surface channel location

ecephys_spike_sorting/modules/extract_from_npx/README.md

Lines changed: 12 additions & 10 deletions
@@ -1,17 +1,19 @@
1-
Extract from NPX
2-
==============
1+
# Extract from NPX (*deprecated*)
2+
33
Converts continuous data from raw NPX/NPX2 format (75% compression ratio) to .dat files required for spike sorting and other downstream analysis.
44

55
Reads event times from the NPX/NPX2 file and writes them as .npy files.
66

77
Converts the settings.xml file for an experiment into a JSON file with parameters such as sample rate and bit volts for each channel.
88

9-
Dependencies
10-
-------------
9+
**Note:** The NPX format is no longer used by Open Ephys (or any other software), so this module can safely be skipped.
10+
11+
## Dependencies
12+
1113
The NpxExtractor executable (Windows only) can be found in the `NpxExtractor\Release` folder.
1214

13-
Running
14-
-------
15+
## Running
16+
1517
```
1618
python -m ecephys_spike_sorting.modules.extract_from_npx --input_json <path to input json> --output_json <path to output json>
1719
```
@@ -21,14 +23,14 @@ Two arguments must be included:
2123

2224
See the `_schemas.py` file for detailed information about the contents of the input JSON.
2325

24-
Input data
25-
----------
26+
## Input data
27+
2628
- **NPX file** : Written by Open Ephys (https://github.com/open-ephys/plugin-GUI). Contains all of the data recorded from one or more Neuropixels probes.
2729
- **settings.xml** : Written by Open Ephys. Contains information about the signal chain that was used for the experiment.
2830

2931

30-
Output data
31-
-----------
32+
## Output data
33+
3234
- **continuous.dat** : Continuous data (1 file each for LFP and AP band)
3335
- **lfp_timestamps.npy** : Timestamps for LFP samples
3436
- **ap_timestamps.npy** : Timestamps for AP samples
Lines changed: 28 additions & 10 deletions
@@ -1,16 +1,34 @@
1-
Kilosort Helper
2-
==============
1+
# Kilosort Helper
2+
33
Python wrapper for Matlab-based spike sorting with Kilosort.
44

55
This module auto-generates the channel map, configuration file, and master file for Kilosort, and runs everything via the Matlab engine for Python.
66

7-
Dependencies
8-
------------
7+
### SpikeInterface implementation
8+
9+
SpikeInterface makes it much easier to run the spike sorting step, which only requires a single line of code. We recommend running Kilosort in a [Docker container](https://spikeinterface.readthedocs.io/en/latest/modules/sorters.html#running-sorters-in-docker-singularity-containers) to avoid the need for a Matlab license or complex installation procedures.
10+
11+
After you've installed Docker, you can run Kilosort on a pre-loaded and pre-processed `Recording` object by running:
12+
13+
```python
14+
import spikeinterface.full as si
15+
16+
sorting = si.run_sorter(sorter_name='kilosort2_5',
17+
recording=recording,
18+
output_folder="/tmp/kilosort",
19+
docker_image=True)
20+
21+
```
22+
23+
More information can be found in the documentation for the [Sorters module](https://spikeinterface.readthedocs.io/en/latest/modules/sorters.html).
24+
25+
## Dependencies
26+
927
Kilosort [v1](https://github.com/cortex-lab/Kilosort), [v2, v2.5, or v3](https://github.com/MouseLand/kilosort) - requires Matlab >=R2016b with Signal Processing and Parallel Computing Toolboxes, Visual Studio Community 2013, and a CUDA-compatible GPU
1028
[Matlab Engine API for Python](https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html) - this may restrict the Python version you're able to use
1129

12-
Running
13-
-------
30+
## Running
31+
1432
```
1533
python -m ecephys_spike_sorting.modules.kilosort_helper --input_json <path to input json> --output_json <path to output json>
1634
```
@@ -20,10 +38,10 @@ Two arguments must be included:
2038

2139
See the `_schemas.py` file for detailed information about the contents of the input JSON.
2240

23-
Input data
24-
----------
41+
## Input data
42+
2543
- **AP band .dat or .bin file** : int16 binary files written by [Open Ephys](https://github.com/open-ephys/plugin-GUI), [SpikeGLX](https://github.com/billkarsh/spikeglx), or the `extract_from_npx` module.
2644

27-
Output data
28-
-----------
45+
## Output data
46+
2947
- **Kilosort output files** : .npy files containing spike times, cluster labels, templates, etc.
Lines changed: 35 additions & 8 deletions
@@ -1,5 +1,5 @@
1-
Kilosort Post-Processing
2-
==============
1+
# Kilosort Post-Processing
2+
33
Clean up Kilosort outputs by removing putative double-counted spikes.
44

55
Kilosort occasionally fits a spike template to the residual of another spike. See [this discussion](https://github.com/MouseLand/Kilosort2/issues/29) for more information.
@@ -8,8 +8,35 @@ This module aims to correct for this by removing spikes from the same unit or ne
88

99
We are not currently taking spike amplitude into account when removing spikes; the module simply deletes whichever spike of an overlapping pair occurs later in time.
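
As a rough illustration of the within-unit case, the following NumPy sketch marks the later spike of each same-unit pair that falls inside a short window. It is not the module's code: the function name, the default sample rate, and the window length are assumptions, and it does not handle overlaps between neighboring units.

```python
import numpy as np


def within_unit_duplicate_mask(spike_times, spike_clusters,
                               sample_rate=30000.0, overlap_window_s=0.00016):
    """Return a boolean mask of spikes to keep (hypothetical helper).

    spike_times    : 1D integer array of spike times in samples
    spike_clusters : 1D array of cluster IDs, same length as spike_times
    """
    keep = np.ones(spike_times.size, dtype=bool)
    for unit in np.unique(spike_clusters):
        # Indices of this unit's spikes, sorted by time
        idx = np.flatnonzero(spike_clusters == unit)
        idx = idx[np.argsort(spike_times[idx], kind="stable")]

        # Interval between consecutive spikes, in seconds
        intervals = np.diff(spike_times[idx]) / sample_rate

        # Drop the later spike of every pair closer than the window
        keep[idx[1:][intervals < overlap_window_s]] = False
    return keep
```

The mask can then be applied to every per-spike array (times, clusters, amplitudes, PC features) so that they stay aligned.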
1010

11-
Running
12-
-------
11+
### SpikeInterface implementation
12+
13+
There is not currently a function for removing putative double-counted spikes with SpikeInterface. Instead, you can use the `export_to_phy()` method to save the data in a format that can be loaded by this module:
14+
15+
```python
16+
import spikeinterface.full as si
17+
18+
from spikeinterface.postprocessing import (compute_spike_amplitudes,
19+
compute_principal_components)
20+
21+
from spikeinterface.exporters import export_to_phy
22+
23+
# extract waveforms (sparse waveforms make the export to Phy faster)
24+
we = si.extract_waveforms(recording=recording, sorting=sorting, folder='waveforms')
25+
26+
# compute some metrics needed for this module:
27+
_ = compute_spike_amplitudes(waveform_extractor=we)
28+
_ = compute_principal_components(waveform_extractor=we,
29+
n_components=3,
30+
mode='by_channel_global')
31+
32+
# save the data in a specified location
33+
export_to_phy(waveform_extractor=we,
34+
output_folder='path/to/phy_folder')
35+
36+
```
37+
38+
## Running
39+
1340
```
1441
python -m ecephys_spike_sorting.modules.kilosort_postprocessing --input_json <path to input json> --output_json <path to output json>
1542
```
@@ -19,10 +46,10 @@ Two arguments must be included:
1946

2047
See the `_schemas.py` file for detailed information about the contents of the input JSON.
2148

22-
Input data
23-
----------
49+
## Input data
50+
2451
- **Kilosort output files** : .npy files containing spike times, cluster labels, templates, etc.
2552

26-
Output data
27-
-----------
53+
## Output data
54+
2855
- **Updated Kilosort output files** : overwrites .npy files for spike times, cluster labels, amplitudes, and PC features. The original outputs can be extracted from the `rez.mat` file if necessary.

ecephys_spike_sorting/modules/mean_waveforms/README.md

Lines changed: 29 additions & 8 deletions
@@ -1,5 +1,5 @@
1-
Mean Waveforms
2-
==============
1+
# Mean Waveforms
2+
33
Extracts mean waveforms from raw data, given spike times and cluster IDs.
44

55
Computes waveforms separately for individual epochs, as well as for the entire experiment. If no epochs are specified, waveforms are selected randomly from the entire recording. Waveform standard deviation is currently computed, but not saved.
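
The core operation can be sketched in a few lines of NumPy. This is a simplified illustration rather than the module's implementation: the function name, the snippet length, and the per-unit spike limit are assumptions, and epochs are ignored.

```python
import numpy as np


def compute_mean_waveforms(data, spike_times, spike_clusters,
                           n_before=20, n_after=40, max_spikes=100, seed=0):
    """Average raw-data snippets around each unit's spikes (hypothetical helper).

    data           : array of shape (n_samples, n_channels)
    spike_times    : 1D integer array of spike times in samples
    spike_clusters : 1D array of cluster IDs, same length as spike_times
    Returns a dict mapping cluster ID to an (n_before + n_after, n_channels) array.
    """
    rng = np.random.default_rng(seed)
    offsets = np.arange(-n_before, n_after)
    mean_waveforms = {}
    for unit in np.unique(spike_clusters):
        times = spike_times[spike_clusters == unit]

        # Skip spikes too close to the edges of the recording
        times = times[(times >= n_before) & (times + n_after <= data.shape[0])]
        if times.size == 0:
            continue

        # Randomly subsample units with many spikes
        if times.size > max_spikes:
            times = rng.choice(times, size=max_spikes, replace=False)

        # Snippets have shape (n_spikes, n_timepoints, n_channels)
        snippets = data[times[:, None] + offsets[None, :]]
        mean_waveforms[int(unit)] = snippets.mean(axis=0)
    return mean_waveforms
```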
@@ -20,9 +20,30 @@ Metrics are computed for every waveform, and include features of the 1D peak-cha
2020

2121
Source: [Jia et al. (2019) "High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification." _J Neurophys_ **121**: 1831-1847](https://doi.org/10.1152/jn.00680.2018)
2222

23+
### SpikeInterface implementation
24+
25+
SpikeInterface uses a `WaveformExtractor` object to pull spike waveforms out of the raw data and compute metrics on their shape.
26+
27+
Extracting the mean waveforms from a sorting and computing a variety of waveform metrics only requires two lines of code:
28+
29+
```python
30+
import spikeinterface.full as si
31+
32+
from spikeinterface.postprocessing import compute_template_metrics
33+
34+
waveform_extractor = si.extract_waveforms(recording=recording,
35+
sorting=sorting,
36+
folder='waveforms')
37+
38+
_ = compute_template_metrics(waveform_extractor)
39+
40+
```
41+
42+
More information can be found in the documentation for the [Postprocessing module](https://spikeinterface.readthedocs.io/en/latest/modules/postprocessing.html).
43+
44+
45+
## Running
2346

24-
Running
25-
-------
2647
```
2748
python -m ecephys_spike_sorting.modules.mean_waveforms --input_json <path to input json> --output_json <path to output json>
2849
```
@@ -32,13 +53,13 @@ Two arguments must be included:
3253

3354
See the `_schemas.py` file for detailed information about the contents of the input JSON.
3455

35-
Input data
36-
----------
56+
## Input data
57+
3758
- **AP band .dat or .bin file** : int16 binary files written by [Open Ephys](https://github.com/open-ephys/plugin-GUI), [SpikeGLX](https://github.com/billkarsh/spikeglx), or the `extract_from_npx` module.
3859
- **Kilosort outputs** : includes spike times, spike clusters, cluster quality, etc.
3960

4061

41-
Output data
42-
-----------
62+
## Output data
63+
4364
- **mean_waveforms.npy** : numpy file containing mean waveforms for clusters across all epochs
4465
- **waveform_metrics.csv** : CSV file containing metrics for each waveform
