Optimizing job_kwargs / providing default job_kwargs #2232

Open
@zm711

GitHub Issues

Here is a non-exhaustive list of issues that were either directly related to job_kwargs (n_jobs being the most common culprit) or that show the potential benefit of additional guardrails in spikeinterface. I haven't directly linked any PRs in this section.

#1063
#2026
#2029
#2217
#2202
#1922
#1845

Discussion Issues

To keep this issue manageable, I'm only including two topics: how to optimize kwargs, and n_jobs specifically.

Optimizing kwargs

It has come up on other occasions (the Cambridge Neurotech talk, for example) that people were unsure how to optimize the kwargs themselves. For example, they know they can change n_jobs to a different number, but they don't know how to pick an appropriate one. Or how does chunk_size really affect things? Is the default meant for small or big datasets, or do I need to set it based on my RAM, etc.? Part of this can be explained by documentation, but the fact that people are still asking means either 1) the docs are unclear or 2) that part of the docs is hard to find.

  1. Should this be explained better or made more visible in the docs (again, moved out of core and given its own section)?
  2. Would it be beneficial to create a job_kwarg optimizer, as suggested in one of the issues (#2026, "A Hint : parallelization issue : BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending."), so that these values change dynamically based on the OS/computer?
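To make the optimizer idea a bit more concrete, here is a minimal sketch of how a chunk_size could be derived from a RAM budget. Everything in it is an assumption for illustration: `suggest_chunk_size`, `memory_fraction`, and `max_chunk_size` are hypothetical names and not part of spikeinterface, and the estimate only counts the raw chunk each worker holds, while real processing steps typically need additional copies on top of that.

```python
def suggest_chunk_size(available_bytes, n_jobs, num_channels,
                       bytes_per_sample=2, memory_fraction=0.5,
                       max_chunk_size=300_000):
    """Hypothetical helper: samples per chunk so that n_jobs raw chunks
    fit inside a fraction of the available RAM.

    available_bytes  : RAM we are willing to consider (caller supplies this)
    n_jobs           : number of worker processes, each holding one chunk
    num_channels     : channels in the recording
    bytes_per_sample : 2 for int16 data (assumed default)
    memory_fraction  : share of available_bytes the chunks may use (assumed)
    max_chunk_size   : upper cap so chunks do not grow without bound (assumed)
    """
    if n_jobs < 1 or num_channels < 1:
        raise ValueError("n_jobs and num_channels must be >= 1")

    # RAM budget for a single worker
    budget_per_worker = available_bytes * memory_fraction / n_jobs
    # One sample across all channels
    bytes_per_frame = num_channels * bytes_per_sample

    chunk_size = int(budget_per_worker // bytes_per_frame)
    # Never return 0, never exceed the cap
    return max(1, min(chunk_size, max_chunk_size))


if __name__ == "__main__":
    # 8 GiB available, 4 workers, 384-channel int16 recording
    print(suggest_chunk_size(8 * 1024**3, n_jobs=4, num_channels=384))
```

The point of the sketch is only that chunk_size and n_jobs are coupled through memory: doubling n_jobs halves the budget each worker can use, which is the kind of relationship a default or an optimizer would have to account for.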

n_jobs

The default is n_jobs=-1, which means all available (logical) cores. As we began to discuss in #2218, it might be nice to change this default to something that gives the OS a little breathing room during multiprocessing. Heberto pointed out to me that both Intel and AMD do in fact have the logical processor concept (I still need to test my Mac, but I think it does not). I'm not sure whether that actually influences this. If we set n_jobs=0.9, as @alejoe91 suggested, it should still leave at least one logical processor for OS tasks, so I think it would be safer, but maybe it is better to leave a whole physical core free. That I'm not sure of. Unfortunately, os does not currently provide a way to distinguish logical from physical cores (os.cpu_count() reports logical cores), so deciding the cutoff based on physical cores would require adding psutil to core.
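As a rough illustration of the fractional n_jobs idea, the sketch below resolves a requested value into a worker count. It is not spikeinterface's implementation: `resolve_n_jobs` and `count_cores` are hypothetical names, the handling of negative values follows the joblib-style convention (-1 means all cores, -2 all but one) as an assumption, and the physical-core lookup only works if psutil happens to be installed.

```python
import os


def count_cores(logical=True):
    """Return the number of logical cores, or physical cores if psutil
    is available. Returns None when physical cores cannot be determined."""
    if logical:
        return os.cpu_count() or 1
    try:
        import psutil  # optional dependency, not in the stdlib
    except ImportError:
        return None
    return psutil.cpu_count(logical=False)


def resolve_n_jobs(n_jobs, total_cores=None):
    """Hypothetical resolver: turn n_jobs into a concrete worker count.

    - float in (0, 1): fraction of cores, rounded down
    - negative int   : counted back from the total (-1 -> all cores)
    - positive int   : used as-is, capped at the total
    """
    if total_cores is None:
        total_cores = count_cores(logical=True)

    if isinstance(n_jobs, float) and 0 < n_jobs < 1:
        resolved = int(total_cores * n_jobs)
    elif n_jobs < 0:
        resolved = total_cores + 1 + n_jobs
    else:
        resolved = int(n_jobs)

    # Always at least one worker, never more than the machine has
    return max(1, min(resolved, total_cores))


if __name__ == "__main__":
    print("logical cores:", count_cores())
    print("physical cores:", count_cores(logical=False))
    print("n_jobs=0.9 resolves to:", resolve_n_jobs(0.9))
```

Because the fraction is rounded down, n_jobs=0.9 leaves at least one core free on any machine with two or more cores. Passing the physical core count as `total_cores` would give the "leave a whole physical core" variant instead.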

progress_bar

This is very small, but the tqdm progress bar is not working on Windows, similar to what was seen in #2122.

Labels: core (Changes to core module), discussion (General discussions and community feedback)