
Uncontrolled Multi CPU Threading in FastSurfer (even when setting value for --threads) #371

Open
LeHenschel opened this issue Sep 5, 2023 · 2 comments
Labels
enhancement New feature or request needs-fix A reproducible bug that needs to be fixed

Comments

@LeHenschel
Member

Description

The number of CPU threads used is not controllable via the --threads option for the FastSurfer segmentation modules. In the FastSurfer surface pipeline, thread usage is only controllable when --threads is set to 1.

Setting the environment variable OMP_NUM_THREADS in run_fastsurfer.sh instead of recon-surf.sh may solve the issue for --threads 1. Other values (threads > 1) are, however, not guaranteed to keep CPU usage to the requested thread count (neither in the segmentation nor the surface module). The root cause is NumPy's multi-threading:

By default, NumPy uses all available threads in functions compiled against multi-threading-capable C libraries (OpenBLAS, MKL, ...). This can cause issues in two ways: a) CPU overload when running in parallel, and b) slowdown of functions for small matrices/operations (basically unnecessary overhead). There is no option to change this in NumPy itself, mainly because a catch-all solution for all the different C libraries is difficult (see e.g. numpy/numpy#16990, numpy/numpy#11826).

Short term solution

Set all potentially relevant environment variables to a specific value before (!) numpy is imported. This is a simple solution, with the drawback that all relevant variables (https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy) have to be known and set (and the list might change over time).
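A minimal sketch of this workaround (the variable list below covers the common BLAS/OpenMP backends per the linked Stack Overflow thread, but is not guaranteed to be exhaustive):

```python
import os

# These must be set BEFORE numpy (or anything that imports numpy) is
# loaded, because the BLAS backends read them once at initialization.
n_threads = "1"  # e.g. the value passed via --threads
os.environ["OMP_NUM_THREADS"] = n_threads         # OpenMP (used by several backends)
os.environ["OPENBLAS_NUM_THREADS"] = n_threads    # OpenBLAS
os.environ["MKL_NUM_THREADS"] = n_threads         # Intel MKL
os.environ["VECLIB_MAXIMUM_THREADS"] = n_threads  # Apple Accelerate
os.environ["NUMEXPR_NUM_THREADS"] = n_threads     # numexpr

import numpy as np  # only now is it safe to import numpy
```

Because the variables are read at library load time, this only works if no other module has imported numpy earlier in the process.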

Permanent fix

The current recommendation (per this discussion on the numpy GitHub: numpy/numpy#11826) is to use the threadpoolctl package to wrap all relevant functions. This way, user-specified thread counts can actually be honored, rather than limiting everything to 1. This would require several changes in LaPy and FastSurfer.
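A sketch of what wrapping a function with threadpoolctl could look like (assumes the threadpoolctl package is installed; the function and array here are illustrative placeholders, not FastSurfer code):

```python
import numpy as np
from threadpoolctl import threadpool_limits

def compute(a, n_threads):
    # Limit all detected BLAS/OpenMP thread pools for the duration of
    # this block only; outside the context, the defaults apply again.
    with threadpool_limits(limits=n_threads):
        return a @ a

result = compute(np.ones((64, 64)), n_threads=2)
```

The advantage over the environment-variable approach is that the limit can be chosen at call time from a user-supplied --threads value, instead of being fixed before import.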

@LeHenschel LeHenschel added needs-fix A reproducible bug that needs to be fixed enhancement New feature or request labels Sep 5, 2023
@dkuegler
Member

I think multi-CPU management is still an open issue.
I thought limiting CPU availability via Singularity (or Docker) might actually be the best option, as documented in https://docs.sylabs.io/guides/main/user-guide/cgroups.html
That page also describes another way to limit CPU usage: through systemd-run, which should be available in Ubuntu 22.04 by default (https://docs.sylabs.io/guides/main/user-guide/cgroups.html#applying-resource-limits-with-external-tools, https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html)

I am not really sure whether this solves the issue, but it is worth having a look at...
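A sketch of the systemd-run approach from the linked docs (the image name and FastSurfer arguments are placeholders; CPUQuota=400% corresponds to roughly four CPUs' worth of time):

```shell
# Run the container in a transient systemd scope with a CPU quota.
# 400% = at most 4 CPUs of time; adjust to match the intended --threads.
systemd-run --user --scope -p CPUQuota=400% \
    singularity exec fastsurfer.sif ./run_fastsurfer.sh --threads 4
```

Note this caps total CPU time rather than pinning specific cores, so threads may still spread across all cores while staying under the quota.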

@dkuegler
Member

What should also be mentioned here: right now, the default value for --threads in run_fastsurfer.sh is 1, which means that both inference and segstats.py get significantly slower. I am not too sure about N4; it might be that N4 is currently also "circumventing" the thread limitation.
Generally, this means that you need to manually specify a reasonable value for --threads to get close to the one-minute segmentation target.
