Hello,
I think that saving to binary isn't parallelising when I run it on the Uni of Edinburgh HPC, Eddie. Here's the code:
```python
import spikeinterface.full as si

# There's some code to get the recording paths, and get_recording_from
# returns an Open Ephys recording object
recording = si.concatenate_recordings([get_recording_from(rec_path) for rec_path in rec_paths])
recording_filtered = si.common_reference(si.bandpass_filter(recording, freq_min=300, freq_max=6000))

si.set_global_job_kwargs(n_jobs=8)

recording_filtered.save_to_zarr(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_zarr")
recording_filtered.save_to_folder(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_binary")
```
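For reference, a quick sanity check like the one below is how I'd confirm the global setting is actually picked up (a minimal sketch; I'm assuming `get_global_job_kwargs` is re-exported by `spikeinterface.full`, as it is in recent versions):

```python
import spikeinterface.full as si

si.set_global_job_kwargs(n_jobs=8)

# Assumption: get_global_job_kwargs (from spikeinterface.core) is exposed on
# spikeinterface.full. This prints the job kwargs the save calls will inherit.
print(si.get_global_job_kwargs())
```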
This takes an ~hour-long recording and saves it twice. The output is:
```
write_zarr_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_zarr_recording: 100%|██████████| 4203/4203 [10:30<00:00, 6.67it/s]
write_binary_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_binary_recording: 100%|██████████| 4203/4203 [1:21:15<00:00, 1.16s/it]
```
So `write_binary_recording` takes roughly 8x as long as `write_zarr_recording`, which suggests it isn't actually being parallelised. But weirdly, the output reports `n_jobs=8` in both cases. Passing `n_jobs` directly to `save_to_folder` doesn't seem to help either (see the sketch below).
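To be concrete, this is the kind of call I mean (a sketch of the variant I tried):

```python
# Same save, but with n_jobs passed explicitly instead of relying on the
# global setting; as I understand it, extra keyword arguments to save_to_folder
# are forwarded as job kwargs.
recording_filtered.save_to_folder(
    folder="/exports/eddie/scratch/chalcrow/harry_project/temp_binary",
    n_jobs=8,
)
```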
On my personal computer, there is no real difference in timing between the two methods.
Any ideas? Are the parallelisation schemes different for the different file formats?