
On an HPC, saving to zarr is much faster than saving to binary #3630

Open
@chrishalcrow

Hello,

I think that saving to binary isn't parallelising when I run it on Eddie, the University of Edinburgh HPC. Here's the code:

import spikeinterface.full as si

# there's some code to get the recording paths, and `get_recording_from` returns an Open Ephys recording object
recording = si.concatenate_recordings([get_recording_from(rec_path) for rec_path in rec_paths])
recording_filtered = si.common_reference(si.bandpass_filter(recording, freq_min=300, freq_max=6000))

si.set_global_job_kwargs(n_jobs=8)
recording_filtered.save_to_zarr(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_zarr")
recording_filtered.save_to_folder(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_binary")

This takes a recording of roughly an hour and saves it twice, once to zarr and once to binary. The output is:

write_zarr_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_zarr_recording: 100%|██████████| 4203/4203 [10:30<00:00,  6.67it/s]

write_binary_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_binary_recording: 100%|██████████| 4203/4203 [1:21:15<00:00,  1.16s/it]

So write_binary_recording takes roughly 8x as long (1:21:15 vs 10:30), which is what I'd expect if it were running on a single core rather than 8. But weirdly, the log reports n_jobs=8 in both cases. Passing n_jobs directly to save_to_folder doesn't seem to help.
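
To make the comparison concrete, here is a small sketch that recomputes the numbers from the two progress bars above. It only uses the figures printed in the log (4203 chunks, 1 s per chunk, 10:30 and 1:21:15 wall time); the variable names and the `to_seconds` helper are mine, not part of spikeinterface.

```python
# Figures copied from the two progress bars in the log output.
n_chunks = 4203           # chunks written by each method
chunk_duration_s = 1.0    # chunk_duration=1.00s
n_jobs = 8                # n_jobs reported in both cases


def to_seconds(hours, minutes, seconds):
    """Convert an h:m:s wall time into seconds (hypothetical helper)."""
    return hours * 3600 + minutes * 60 + seconds


zarr_s = to_seconds(0, 10, 30)     # write_zarr_recording: 10:30
binary_s = to_seconds(1, 21, 15)   # write_binary_recording: 1:21:15

recording_s = n_chunks * chunk_duration_s   # total length of the recording
slowdown = binary_s / zarr_s                # how much slower binary is
zarr_rate = n_chunks / zarr_s               # chunks per second (it/s)
binary_time_per_chunk = binary_s / n_chunks # seconds per chunk (s/it)

print(f"recording length: {recording_s / 60:.1f} min")
print(f"zarr:   {zarr_s} s, {zarr_rate:.2f} it/s")
print(f"binary: {binary_s} s, {binary_time_per_chunk:.2f} s/it")
print(f"binary / zarr: {slowdown:.2f}x with n_jobs={n_jobs}")
```

The ratio comes out close to n_jobs, which is why it looks like the binary writer is effectively running serially. That is only an inference from the timings, though; I haven't confirmed how many workers are actually active.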

On my personal computer, there is no real difference in the timings between these methods.

Any ideas? Are the parallelisation schemes different for the different file formats?


Labels: bug, performance
