
converting spikeGLX LFP data to NWB #48

Closed
@bendichter

Description


It looks like for SpikeGLX data there are two ways I could implement exporting LFP data to NWB. You could

a. fetch the data from the source spikeGLX files, or
b. fetch the data from the DataJoint database.

I am currently converting from the data in the DataJoint database, but I wanted to run this by the DJ team because I think there are a few potential benefits to the other approach.

When DJ pulls data from spikeGLX, it makes several changes to the representation of the data:

```python
# Only store LFP for every 9th channel; due to high channel density,
# close-by channels exhibit highly similar LFP
_skip_channel_counts = 9

def make(self, key):
    acq_software, probe_sn = (EphysRecording
                              * ProbeInsertion & key).fetch1('acq_software', 'probe')

    electrode_keys, lfp = [], []

    if acq_software == 'SpikeGLX':
        spikeglx_meta_filepath = get_spikeglx_meta_filepath(key)
        spikeglx_recording = spikeglx.SpikeGLX(spikeglx_meta_filepath.parent)

        lfp_channel_ind = spikeglx_recording.lfmeta.recording_channels[
            -1::-self._skip_channel_counts]

        # Extract LFP data at specified channels and convert to uV
        lfp = spikeglx_recording.lf_timeseries[:, lfp_channel_ind]  # (sample x channel)
        lfp = (lfp * spikeglx_recording.get_channel_bit_volts('lf')[lfp_channel_ind]).T  # (channel x sample)

        self.insert1(dict(key,
                          lfp_sampling_rate=spikeglx_recording.lfmeta.meta['imSampRate'],
                          lfp_time_stamps=(np.arange(lfp.shape[1])
                                           / spikeglx_recording.lfmeta.meta['imSampRate']),
                          lfp_mean=lfp.mean(axis=0)))
```

  1. Only every 9th channel is read and stored.
  2. The conversion factor is applied to the data, transforming it from int16 to float and putting it in units of volts.
  3. Timestamps are created that sample the data uniformly based on the sampling rate.
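To make the first two changes concrete, here is a small sketch of what the `[-1::-9]` slice selects and of the dtype inflation the scaling causes. The channel count and bit-volts value below are illustrative assumptions (not taken from the pipeline), chosen only to show the mechanics:

```python
import numpy as np

# Hypothetical recording with 385 saved channels; 10 samples of int16,
# as in a raw SpikeGLX .lf.bin file. Values are illustrative only.
n_channels, n_samples = 385, 10
recording_channels = np.arange(n_channels)

# The pipeline's slice: start at the last channel, step backwards by 9
skip = 9
lfp_channel_ind = recording_channels[-1::-skip]
print(lfp_channel_ind[:5])   # channels 384, 375, 366, 357, 348
print(len(lfp_channel_ind))  # 43 of the 385 channels survive

# Multiplying by a float bit-volts factor promotes int16 to float64,
# quadrupling the bytes per sample
raw = np.zeros((n_samples, len(lfp_channel_ind)), dtype=np.int16)
scaled = raw * 4.59e-6  # hypothetical volts-per-bit factor
print(raw.dtype, scaled.dtype)  # int16 float64
```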

If I continue reading data from the DJ pipeline, all of these changes will be reflected in NWB. Best practice is generally to fetch data from as close to the source as possible, and using the spikeGLX files as the source would have several advantages here:

  1. preservation of data. I generally would not recommend sub-sampling the data to every 9th channel. I understand that low-pass filtered data generally has little high spatial frequency content, so the choice is sensible here, but I still prefer to preserve as much data as possible when converting to NWB rather than make assumptions about what the user will want.
  2. data types. The conversion from ints to floats inflates the size of the data. In NWB, it is better to store data as ints with a conversion factor, just as spikeGLX does, but that is no longer possible once the conversion to floats has been done. In general, if you are going to represent measurement data in your database, I would recommend an NWB-like strategy: store measurements with their data, conversion factor, conversion offset, and units of measurement, which gives you the option to store the data more efficiently and convert on the fly.
  3. sampling rate. For data with a perfectly regular sampling rate (up to our ability to measure timing), we generally indicate that by specifying a starting time and a sampling rate rather than storing timestamps, similar to how spikeGLX does it. Storing timestamps when the other representation is available has two downsides: (1) the data is bigger than it needs to be, and (2) you lose the opportunity to express that this timeseries is perfectly sampled, which can be leveraged in analysis, e.g. to index efficiently into the timeseries.
  4. scalability. Right now the data is represented by channel, and the entire session of data is returned for each channel, with no ability to select a sub-time-region. For a very long session (think continuous recording for a month), even a single channel's worth of LFP could overload RAM. Converting directly from the spikeGLX format would let us index efficiently into the data not only by channel but also by time region, making the conversion program robust to very long sessions.
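Points 2 and 3 can be sketched together. This is a minimal illustration of the representation argued for above, not code from the DJ pipeline or pynwb; the rate and conversion values are hypothetical:

```python
import numpy as np

# NWB-style representation: keep raw int16 samples plus a conversion
# factor, and describe regular sampling with (starting_time, rate)
# instead of a stored timestamps array.
rate = 2500.0          # LF-band sampling rate in Hz; illustrative value
conversion = 4.59e-6   # volts per bit; illustrative value
starting_time = 0.0

data_int16 = np.array([[100, -200, 300]], dtype=np.int16)  # (channel x sample)

# Convert on the fly only when values in volts are actually needed
data_volts = data_int16 * conversion

# A perfectly regular series needs no stored timestamps:
# the time of sample i is recovered arithmetically...
def sample_time(i):
    return starting_time + i / rate

# ...and conversely, a time maps to a sample index without searching,
# which is what makes time-region reads cheap on very long sessions
def sample_index(t):
    return int(round((t - starting_time) * rate))

print(sample_time(2500))   # 1.0 s
print(sample_index(1.0))   # sample 2500
```

The same arithmetic is what lets a reader slice out, say, one minute of one channel without ever materializing the full session in RAM.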

However, the task is to output data from the DJ pipeline, not from spikeGLX, and I could see how it might be preferable to convert data from DJ despite these disadvantages. What are your thoughts, @CBroz1 and @kabilar?
