Use joblib for parallelism in regress_out (#1695)
* Use joblib for parallelism in regress_out

* release note

* fix link in release notes

* Add todo for resource test
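The change swaps a hand-managed `multiprocessing.Pool` for joblib's `Parallel`/`delayed` idiom. Below is a minimal sketch of the two call shapes side by side. It assumes joblib is installed, and `process_task` is a hypothetical stand-in for scanpy's `_regress_out_chunk`. Both versions return results in task order. With `n_jobs=1`, joblib runs the tasks as a plain loop in the calling process, which is why a single `Parallel` call can replace both branches of the old `if`/`else`.

```python
import multiprocessing

from joblib import Parallel, delayed


def process_task(task):
    """Hypothetical stand-in for a per-chunk worker: sums a tuple of numbers."""
    return sum(task)


def run_with_pool(tasks, n_jobs):
    # Old shape: explicit pool lifecycle, plus a separate sequential branch.
    if n_jobs > 1 and len(tasks) > 1:
        with multiprocessing.Pool(n_jobs) as pool:
            return pool.map(process_task, tasks)
    return list(map(process_task, tasks))


def run_with_joblib(tasks, n_jobs):
    # New shape: one call covers both cases; n_jobs=1 runs sequentially in-process.
    return Parallel(n_jobs=n_jobs)(delayed(process_task)(task) for task in tasks)


if __name__ == "__main__":
    tasks = [(1, 2), (3, 4), (5, 6)]
    print(run_with_joblib(tasks, n_jobs=2))  # [3, 7, 11]
```

The old branch also needed its own teardown (`pool.close()`). joblib manages worker startup and reuse itself, so the call site shrinks to one line.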
ivirshup authored Mar 3, 2021
1 parent ae3f8b0 commit d69aa18
Showing 2 changed files with 4 additions and 8 deletions.
docs/release-notes/1.7.2.rst: 1 change (1 addition, 0 deletions)
@@ -7,6 +7,7 @@
 
 - Fix :func:`scanpy.pl.paga_path` `TypeError` with recent versions of anndata :pr:`1047` :smaller:`P Angerer`
 - :func:`scanpy.logging.print_versions` now works when `python<3.8` :pr:`1691` :smaller:`I Virshup`
+- :func:`scanpy.pp.regress_out` now uses `joblib` as the parallel backend, and should stop oversubscribing threads :pr:`1694` :smaller:`I Virshup`
 
 .. rubric:: Deprecations
 
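"Oversubscribing threads" here means several worker processes each starting a native BLAS/OpenMP thread pool sized to the whole machine. The arithmetic below is a sketch. It assumes, per joblib's documentation on avoiding over-subscription, that the default `loky` backend caps each worker at `cpu_count() // n_jobs` native threads, while a bare `multiprocessing.Pool` applies no such cap. The helper names are hypothetical, and the floor of one thread per worker is an assumption of this sketch.

```python
def threads_without_cap(n_cpus: int, n_jobs: int) -> int:
    """Total native threads if every worker's BLAS uses all CPUs (no limit applied)."""
    return n_jobs * n_cpus


def threads_with_loky_cap(n_cpus: int, n_jobs: int) -> int:
    """Total native threads if each worker is capped at n_cpus // n_jobs (assumed floor of 1)."""
    per_worker = max(n_cpus // n_jobs, 1)
    return n_jobs * per_worker


if __name__ == "__main__":
    # On an 8-core host: uncapped totals grow with n_jobs; capped totals stay at 8.
    for n_jobs in (1, 4, 8):
        print(n_jobs, threads_without_cap(8, n_jobs), threads_with_loky_cap(8, n_jobs))
```

With 4 workers on 8 cores, the uncapped case asks for 32 threads competing for 8 cores. The capped case stays at 8.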
scanpy/preprocessing/_simple.py: 11 changes (3 additions, 8 deletions)
@@ -662,15 +662,10 @@ def regress_out(
             regres = regressors
         tasks.append(tuple((data_chunk, regres, variable_is_categorical)))
 
-    if n_jobs > 1 and n_chunks > 1:
-        import multiprocessing
+    from joblib import Parallel, delayed
 
-        pool = multiprocessing.Pool(n_jobs)
-        res = pool.map_async(_regress_out_chunk, tasks).get(9999999)
-        pool.close()
-
-    else:
-        res = list(map(_regress_out_chunk, tasks))
+    # TODO: figure out how to test that this doesn't oversubscribe resources
+    res = Parallel(n_jobs=n_jobs)(delayed(_regress_out_chunk)(task) for task in tasks)
 
     # res is a list of vectors (each corresponding to a regressed gene column).
     # The transpose is needed to get the matrix in the shape needed
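The hunk keeps the existing task layout. The matrix is split into column (gene) chunks, each task bundles a chunk with its regressors, and the per-chunk results are reassembled afterwards. Below is a minimal sketch of that layout under stated assumptions. Ordinary least squares via `numpy.linalg.lstsq` is swapped in for scanpy's per-gene model fit, and `regress_chunk` and `regress_out_sketch` are hypothetical names, not scanpy API.

```python
import numpy as np
from joblib import Parallel, delayed


def regress_chunk(task):
    """Hypothetical worker: return OLS residuals of each column of `chunk` on `design`."""
    chunk, design = task
    coef, *_ = np.linalg.lstsq(design, chunk, rcond=None)
    return chunk - design @ coef


def regress_out_sketch(X, regressors, n_chunks=4, n_jobs=1):
    # Prepend an intercept column so the residuals are also mean-centred.
    design = np.column_stack([np.ones(X.shape[0]), regressors])
    # Split genes (columns) into chunks; the last chunks may be smaller.
    tasks = [(chunk, design) for chunk in np.array_split(X, n_chunks, axis=1)]
    res = Parallel(n_jobs=n_jobs)(delayed(regress_chunk)(task) for task in tasks)
    # Parallel preserves task order, so stacking restores the original column order.
    return np.hstack(res)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    covariate = rng.normal(size=50)
    residuals = regress_out_sketch(X, covariate, n_chunks=3, n_jobs=2)
    print(residuals.shape)  # (50, 10)
```

Here each worker returns a 2-D block, and `np.hstack` joins the blocks. Per the comment in the hunk, scanpy's own code instead collects per-gene vectors and transposes them into shape.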