
Uncontrolled Multi CPU Threading in FastSurfer (even when setting value for --threads) #371

Open
LeHenschel opened this issue Sep 5, 2023 · 2 comments
Labels
enhancement New feature or request needs-fix A reproducible bug that needs to be fixed

Comments

@LeHenschel
Member

Description

The number of CPU threads used is not controllable via the --threads option for the FastSurfer segmentation modules. In the FastSurfer surface pipeline, thread usage is only controllable when --threads is set to 1.

Setting the environment variable OMP_NUM_THREADS in run_fastsurfer.sh instead of recon-surf.sh may solve the issue for --threads 1. Other values (threads > 1) are, however, not guaranteed to keep CPU usage to the requested thread count (neither in the segmentation nor the surface module). The root cause is NumPy's multi-threading:

By default, NumPy uses all available threads in functions compiled against multi-threading-capable C libraries (OpenBLAS, MKL, ...). This can cause issues in two ways: a) CPU overload when running in parallel, and b) slowdown of functions for small matrices/operations (basically unnecessary overhead). There is no option to change this in NumPy itself, mainly because a catch-all solution for all the different C libraries is difficult (see e.g. numpy/numpy#16990, numpy/numpy#11826).

Short term solution

Set all potentially relevant environment variables to a specific value before (!) numpy is imported. This is a simple solution, with the drawback that all relevant variables (https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy) have to be known and set (and the list might change over time).
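A minimal sketch of this workaround (the variable list below covers the common BLAS/OpenMP backends per the linked Stack Overflow thread, but is not guaranteed to be exhaustive):

```python
import os

# These must be set BEFORE numpy (or anything that imports numpy) is
# loaded, because the BLAS backends read them once at initialization.
n_threads = "1"  # e.g. the value passed via --threads
os.environ["OMP_NUM_THREADS"] = n_threads         # OpenMP (used by several backends)
os.environ["OPENBLAS_NUM_THREADS"] = n_threads    # OpenBLAS
os.environ["MKL_NUM_THREADS"] = n_threads         # Intel MKL
os.environ["VECLIB_MAXIMUM_THREADS"] = n_threads  # Apple Accelerate
os.environ["NUMEXPR_NUM_THREADS"] = n_threads     # numexpr

import numpy as np  # only now is it safe to import numpy
```

Because the variables are read at library load time, this only works if no other module has imported numpy earlier in the process.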

Permanent fix

The current recommendation (per this discussion on the numpy GitHub: numpy/numpy#11826) is to use the threadpoolctl package to wrap all relevant functions. This way, user-specified thread counts can actually be honored, rather than limiting everything to 1. This would require several changes in LaPy and FastSurfer.
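A sketch of what wrapping a function with threadpoolctl could look like (assumes the threadpoolctl package is installed; the function and array here are illustrative placeholders, not FastSurfer code):

```python
import numpy as np
from threadpoolctl import threadpool_limits

def compute(a, n_threads):
    # Limit all detected BLAS/OpenMP thread pools for the duration of
    # this block only; outside the context, the defaults apply again.
    with threadpool_limits(limits=n_threads):
        return a @ a

result = compute(np.ones((64, 64)), n_threads=2)
```

The advantage over the environment-variable approach is that the limit can be chosen at call time from a user-supplied --threads value, instead of being fixed before import.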

@LeHenschel LeHenschel added needs-fix A reproducible bug that needs to be fixed enhancement New feature or request labels Sep 5, 2023
@dkuegler
Member

I think multi-CPU management is still an open issue.
I thought limiting CPU availability via Singularity (or Docker) might actually be the best option, as documented in https://docs.sylabs.io/guides/main/user-guide/cgroups.html
That page also describes another way to limit CPU usage: through systemd-run, which should be available in Ubuntu 22.04 by default (https://docs.sylabs.io/guides/main/user-guide/cgroups.html#applying-resource-limits-with-external-tools, https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html)

I am not really sure whether this solves the issue, but it is worth having a look at...
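A sketch of the systemd-run approach from the linked docs (the image name and FastSurfer arguments are placeholders; CPUQuota=400% corresponds to roughly four CPUs' worth of time):

```shell
# Run the container in a transient systemd scope with a CPU quota.
# 400% = at most 4 CPUs of time; adjust to match the intended --threads.
systemd-run --user --scope -p CPUQuota=400% \
    singularity exec fastsurfer.sif ./run_fastsurfer.sh --threads 4
```

Note this caps total CPU time rather than pinning specific cores, so threads may still spread across all cores while staying under the quota.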

@dkuegler
Member

What should also be mentioned here: right now, the default value for --threads in run_fastsurfer.sh is 1, which means that both inference and segstats.py get significantly slower. I am not too sure about N4; it might be that N4 is currently also "circumventing" the thread limitation.
Generally, this means that you need to manually specify a reasonable value for --threads to get close to the one-minute segmentation target.
