Hello,
I think that saving to binary isn't parallelising when I run it on the Uni of Edinburgh HPC, Eddie. Here's the code:
```python
import spikeinterface.full as si

# There's some code to get the recording paths, and get_recording_from
# returns an Open Ephys recording object
recording = si.concatenate_recordings([get_recording_from(rec_path) for rec_path in rec_paths])
recording_filtered = si.common_reference(si.bandpass_filter(recording, freq_min=300, freq_max=6000))

si.set_global_job_kwargs(n_jobs=8)

recording_filtered.save_to_zarr(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_zarr")
recording_filtered.save_to_folder(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_binary")
```
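For reference, a quick sanity check like the one below is how I'd confirm the global setting is actually picked up (a minimal sketch; I'm assuming `get_global_job_kwargs` is re-exported by `spikeinterface.full`, as it is in recent versions):

```python
import spikeinterface.full as si

si.set_global_job_kwargs(n_jobs=8)

# Assumption: get_global_job_kwargs (from spikeinterface.core) is exposed on
# spikeinterface.full. This prints the job kwargs the save calls will inherit.
print(si.get_global_job_kwargs())
```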
This takes an ~hour-long recording and saves it twice. The output is:
```
write_zarr_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_zarr_recording: 100%|██████████| 4203/4203 [10:30<00:00, 6.67it/s]
write_binary_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_binary_recording: 100%|██████████| 4203/4203 [1:21:15<00:00, 1.16s/it]
```
So `write_binary_recording` takes roughly 8x as long as `write_zarr_recording`, which suggests it isn't actually being parallelised. But weirdly, the output reports `n_jobs=8` in both cases. Passing `n_jobs` directly to `save_to_folder` doesn't seem to help either (see the sketch below).
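To be concrete, this is the kind of call I mean (a sketch of the variant I tried):

```python
# Same save, but with n_jobs passed explicitly instead of relying on the
# global setting; as I understand it, extra keyword arguments to save_to_folder
# are forwarded as job kwargs.
recording_filtered.save_to_folder(
    folder="/exports/eddie/scratch/chalcrow/harry_project/temp_binary",
    n_jobs=8,
)
```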
On my personal computer, there is no real difference in timing between the two methods.
Any ideas? Are the parallelisation schemes different for the different file formats?