Description
CPU thread usage is not controllable via the --threads argument for the FastSurfer segmentation modules. In the FastSurfer surface pipeline, thread usage is only controllable when --threads is set to 1.
Setting the environment variable OMP_NUM_THREADS in run_fastsurfer.sh instead of recon-surf.sh may also solve the issue for --threads 1. Other values (threads > 1), however, are not guaranteed to keep CPU usage at the requested thread count (in neither the segmentation nor the surface module). The root cause is numpy's multi-threading:
In its default state, numpy uses all available threads for every function compiled against a multi-threading-capable C library (OpenBLAS, MKL, ...). This causes problems in two ways: a) CPU overload when running in parallel, and b) slowdown of functions on small matrices/operations (essentially unnecessary overhead). There is no option to change this in numpy itself, mainly because a catch-all solution for all the different C libraries is difficult; see e.g. numpy/numpy#16990 and numpy/numpy#11826.
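For illustration, the thread pools numpy has loaded and their default sizes can be inspected with the threadpoolctl package (assuming it is installed, e.g. via pip install threadpoolctl):

```python
import numpy as np  # importing numpy loads its BLAS backend
from threadpoolctl import threadpool_info

# Each entry describes one detected thread pool (e.g. OpenBLAS, MKL, OpenMP)
# and reports how many threads it will use by default.
for pool in threadpool_info():
    print(pool["user_api"], pool["internal_api"], pool["num_threads"])
```

On a machine where nothing limits the pools, num_threads typically equals the number of available cores, which is exactly the behavior described above.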
Short-term solution
Set all potentially relevant environment variables to a specific value before (!) numpy is imported. This is a simple solution, with the drawback that all relevant variables (https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy) have to be known and changed (and the list may change over time).
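A minimal sketch of this approach, with the variable list taken from the Stack Overflow answer linked above (it is not guaranteed to be complete for every BLAS backend):

```python
import os

# These must be set before numpy (or anything that imports numpy) is
# imported, because the C libraries read them once at load time.
n_threads = "1"  # e.g. the value passed via --threads
for var in (
    "OMP_NUM_THREADS",         # OpenMP (used by OpenBLAS, MKL, ...)
    "OPENBLAS_NUM_THREADS",    # OpenBLAS
    "MKL_NUM_THREADS",         # Intel MKL
    "VECLIB_MAXIMUM_THREADS",  # macOS Accelerate / vecLib
    "NUMEXPR_NUM_THREADS",     # numexpr
):
    os.environ[var] = n_threads

import numpy as np  # only now is it safe to import numpy
```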
Permanent fix
The current recommendation (per this discussion on the numpy GitHub: numpy/numpy#11826) is to use the threadpoolctl package to wrap all relevant functions. This way, user-specified thread counts can actually be honored, rather than limiting everything to 1. This would require several changes in LaPy and FastSurfer.
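A minimal sketch of what such a wrapper could look like; threadpool_limits is threadpoolctl's documented entry point, while heavy_numpy_function is a hypothetical stand-in, not an existing FastSurfer or LaPy function:

```python
import numpy as np
from threadpoolctl import threadpool_limits


def heavy_numpy_function(data: np.ndarray, threads: int) -> np.ndarray:
    """Hypothetical stand-in for a numpy-heavy FastSurfer/LaPy function."""
    # Cap the BLAS thread pools to the user-specified count for the duration
    # of this block only; the previous limits are restored on exit.
    with threadpool_limits(limits=threads, user_api="blas"):
        return data @ data.T


result = heavy_numpy_function(np.random.rand(512, 512), threads=4)
```

Because the limit only applies inside the context manager, user-specified thread counts can be enforced per function instead of pinning the whole process to a single thread.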
What should also be mentioned here: right now, the default value for --threads in run_fastsurfer.sh is 1, which means that both inference and segstats.py get significantly slower. I am not too sure about N4; it might be that N4 is currently also "circumventing" the thread limitation.

Generally, this means that you need to manually specify a reasonable value for --threads to get close to the one-minute segmentation target.