
Unexpected need to protect lines inside script for multiprocessing #2122

Closed
@vigji

Description

Not an urgent matter, as I found a way around the issue; I am reporting it here to make sure I am not misunderstanding something, and in case someone else runs into it. Also, this behavior appeared in the last month or so, so maybe it is still unreported?

I am working on a Windows machine (Anaconda, Python 3.10) and have had trouble using multiprocessing in every scenario I tried. A script like this:

from pathlib import Path
from datetime import datetime
from spikeinterface.core import load_extractor

data_path = Path(r"...\test-data")
test_data = load_extractor(data_path)

temp_path = data_path.parent / f"test_dataset_resaved_{datetime.now().strftime('%Y%m%d-%H%M%S')}"
test_data.save(folder=temp_path, n_jobs=-1)

would result in the following error (full traceback below):

RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...
write_binary_recording with n_jobs = 20 and chunk_size = 30000
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\spawn.py", line 116, in spawn_main
  exitcode = _main(fd, parent_sentinel)
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\spawn.py", line 125, in _main
  prepare(preparation_data)
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\spawn.py", line 236, in prepare
  _fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
  main_content = runpy.run_path(main_path,
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\runpy.py", line 289, in run_path
  return _run_module_code(code, init_globals, run_name,
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\runpy.py", line 96, in _run_module_code
  _run_code(code, mod_globals, init_globals,
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\runpy.py", line 86, in _run_code
  exec(code, run_globals)
File "c:\Users\SNeurobiology\code\ephys-preprocessing\benchmarking\debug_compression.py", line 10, in <module>
  test_data.save(folder=temp_path, n_jobs=-1)
File "C:\Users\SNeurobiology\code\spikeinterface\src\spikeinterface\core\base.py", line 760, in save
  loaded_extractor = self.save_to_folder(**kwargs)
File "C:\Users\SNeurobiology\code\spikeinterface\src\spikeinterface\core\base.py", line 838, in save_to_folder
  cached = self._save(folder=folder, verbose=verbose, **save_kwargs)
File "C:\Users\SNeurobiology\code\spikeinterface\src\spikeinterface\core\baserecording.py", line 462, in _save
  write_binary_recording(self, file_paths=file_paths, dtype=dtype, **job_kwargs)
File "C:\Users\SNeurobiology\code\spikeinterface\src\spikeinterface\core\core_tools.py", line 314, in write_binary_recording
  executor.run()
File "C:\Users\SNeurobiology\code\spikeinterface\src\spikeinterface\core\job_tools.py", line 391, in run
  results = executor.map(function_wrapper, all_chunks)
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\concurrent\futures\process.py", line 766, in map
  results = super().map(partial(_process_chunk, fn),
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\concurrent\futures\_base.py", line 610, in map
  fs = [self.submit(fn, *args) for args in zip(*iterables)]
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\concurrent\futures\_base.py", line 610, in <listcomp>
  fs = [self.submit(fn, *args) for args in zip(*iterables)]
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\concurrent\futures\process.py", line 737, in submit
  self._adjust_process_count()
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\concurrent\futures\process.py", line 697, in _adjust_process_count
  self._spawn_process()
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\concurrent\futures\process.py", line 714, in _spawn_process
  p.start()
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\process.py", line 121, in start
  self._popen = self._Popen(self)
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\context.py", line 336, in _Popen
  return Popen(process_obj)
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
  prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
  _check_not_importing_main()
File "C:\Users\SNeurobiology\miniconda3\envs\ephys-env\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
  raise RuntimeError('''
RuntimeError:
      An attempt has been made to start a new process before the
      current process has finished its bootstrapping phase.

      This probably means that you are not using fork to start your
      child processes and you have forgotten to use the proper idiom
      in the main module:

          if __name__ == '__main__':
              freeze_support()
              ...

      The "freeze_support()" line can be omitted if the program
      is not going to be frozen to produce an executable.

The workaround was to refactor my script as follows:

from pathlib import Path
from datetime import datetime
from spikeinterface.core import load_extractor

data_path = Path(r"...\test-data")
test_data = load_extractor(data_path)

temp_path = data_path.parent / f"test_dataset_resaved_{datetime.now().strftime('%Y%m%d-%H%M%S')}"

if __name__ == "__main__":
    test_data.save(folder=temp_path, n_jobs=-1)

And now it works. Still, I can't understand why I would need to protect my script from import, although the inner workings of multiprocessing are not super clear to me and maybe I am missing something obvious.
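(From what I can gather from the traceback and the Python multiprocessing programming guidelines, on Windows the default start method is "spawn", so every worker process re-imports the main script, which is the runpy.run_path call visible above. Any unguarded top-level code, including the save() call itself, then re-executes inside each worker before it has finished starting up, and that is exactly what _check_not_importing_main complains about. If that reading is right, a fully guarded variant would move all of the work, not just the save() call, out of module level; just a sketch, with the same placeholder path as above:)

from pathlib import Path
from datetime import datetime

from spikeinterface.core import load_extractor


def main():
    # All of the work lives inside a function, so nothing heavy runs
    # when spawned workers re-import this module on Windows.
    data_path = Path(r"...\test-data")  # placeholder path
    test_data = load_extractor(data_path)

    temp_path = data_path.parent / f"test_dataset_resaved_{datetime.now().strftime('%Y%m%d-%H%M%S')}"
    test_data.save(folder=temp_path, n_jobs=-1)


if __name__ == "__main__":
    # Only the process run directly (not one re-imported by spawn) calls main().
    main()

(Compared to my workaround above, this also keeps load_extractor from re-running at import time in every worker, which the guarded save() alone does not prevent.)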

This started after pulling the latest version of the package; when I was playing with it approximately one month ago, I do not remember encountering the same issue, even though I was definitely using multiprocessing already.

Thank you so very much for the amazing package!


    Labels

    concurrency (Related to parallel processing), question (General question regarding SI)
