The prediction takes 17+hours on MacBook with M1 chip #379

Closed
gmeng92 opened this issue Oct 9, 2023 · 6 comments
Labels
question Further information is requested

Comments

@gmeng92

gmeng92 commented Oct 9, 2023

Question/Support Request

I am using your pretrained model to do segmentation only, on a single T1 image. The image size is (256, 256, 256) voxels.
The model was run using the Docker container on my MacBook. According to the log file, the pretrained model performed a three-way inference (coronal, sagittal, axial), and the accumulated running time was >94000 seconds. Can you help me figure out why it takes so long to run the pretrained model on one image? I thought the high resolution might be one cause, but it shouldn't take that long. Thanks!

Screenshots

...

Here is the log file of the experiment; the subject ID was omitted for PHI protection.

Log file for segmentation FastSurferCNN/run_prediction.py
Wed Sep 27 14:08:25 UTC 2023
python3.8 /fastsurfer/FastSurferCNN/run_prediction.py --t1 /data/sub-/resampled_t1.nii.gz --asegdkt_segfile /output/sub-/mri/aparc.DKTatlas+aseg.deep.mgz --conformed_name /output/sub-/mri/orig.mgz --brainmask_name /output/sub-/mri/mask.mgz --aseg_name /output/sub-/mri/aseg.auto_noCCseg.mgz --sid sub- --seg_log /output/sub-/scripts/deep-seg.log --vox_size min --batch_size 1 --viewagg_device auto --device cpu
[INFO: run_prediction.py: 392]: Checking or downloading default checkpoints ...
[INFO: common.py: 102]: Using device: cpu
[INFO: run_prediction.py: 133]: Running view aggregation on cpu
[INFO: inference.py: 110]: Loading checkpoint /fastsurfer/checkpoints/aparc_vinn_coronal_v2.0.0.pkl
[INFO: inference.py: 110]: Loading checkpoint /fastsurfer/checkpoints/aparc_vinn_sagittal_v2.0.0.pkl
[INFO: inference.py: 110]: Loading checkpoint /fastsurfer/checkpoints/aparc_vinn_axial_v2.0.0.pkl
[INFO: common.py: 479]: Single subject with absolute file path for input.
[INFO: common.py: 492]: No subjects directory specified, but the parent directory of the output file /output/sub-/mri/aparc.DKTatlas+aseg.deep.mgz is 'mri', so we are assuming this is the 'mri' folder in the subject directory.
[INFO: common.py: 534]: Analyzing single subject /data/sub-/resampled_t1.nii.gz
[INFO: common.py: 612]: Output will be stored in Subjects Directory: /output
[INFO: run_prediction.py: 193]: Successfully loaded image from /data/sub-/resampled_t1.nii.gz.
[INFO: run_prediction.py: 267]: Output image directory 001.mgz does not exist. Creating it now...
[INFO: run_prediction.py: 279]: Successfully saved image as /output/sub-/mri/orig/001.mgz.
[INFO: run_prediction.py: 206]: Conforming image
[INFO: run_prediction.py: 279]: Successfully saved image as /output/sub-/mri/orig.mgz.
[INFO: run_prediction.py: 243]: Run coronal prediction
[INFO: dataset.py: 56]: Loading Coronal with input voxelsize (1.0, 1.0)
[INFO: inference.py: 234]: Inference on 256 batches for coronal successful
[INFO: inference.py: 272]: Coronal inference on /data/sub-/resampled_t1.nii.gz finished in 36577.5203 seconds
[INFO: run_prediction.py: 243]: Run sagittal prediction
[INFO: dataset.py: 47]: Loading Sagittal with input voxelsize (1.0, 1.0)
[INFO: inference.py: 234]: Inference on 256 batches for sagittal successful
[INFO: inference.py: 272]: Sagittal inference on /data/sub-/resampled_t1.nii.gz finished in 28889.3878 seconds
[INFO: run_prediction.py: 243]: Run axial prediction
[INFO: dataset.py: 52]: Loading Axial with input voxelsize (1.0, 1.0)
[INFO: inference.py: 234]: Inference on 256 batches for axial successful
[INFO: inference.py: 272]: Axial inference on /data/sub-/resampled_t1.nii.gz finished in 30984.1990 seconds
[INFO: run_prediction.py: 279]: Successfully saved image as /output/sub-/mri/aparc.DKTatlas+aseg.deep.mgz.
[INFO: run_prediction.py: 430]: Creating brainmask based on segmentation...
[INFO: run_prediction.py: 279]: Successfully saved image as /output/sub-/mri/mask.mgz.
[INFO: run_prediction.py: 447]: Creating aseg based on segmentation...
[INFO: run_prediction.py: 279]: Successfully saved image as /output/sub-/mri/aseg.auto_noCCseg.mgz.
[INFO: run_prediction.py: 464]: Running volume-based QC check on segmentation...
python3.8 /fastsurfer/recon_surf/N4_bias_correct.py --in /output/sub-/mri/orig.mgz --out /output/sub-/mri/orig_nu.mgz --mask /output/sub-/mri/mask.mgz --threads 1
python3.8 /fastsurfer/FastSurferCNN/segstats.py --segfile /output/sub-/mri/aparc.DKTatlas+aseg.deep.mgz --segstatsfile /output/sub-/stats/aseg+DKT.stats --normfile /output/sub-/mri/orig_nu.mgz --empty --excludeid 0 --ids 2 4 5 7 8 10 11 12 13 14 15 16 17 18 24 26 28 31 41 43 44 46 47 49 50 51 52 53 54 58 60 63 77 251 252 253 254 255 1002 1003 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1034 1035 2002 2003 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2034 2035 --lut /fastsurfer/FastSurferCNN/config/FreeSurferColorLUT.txt --threads 1
Partial volume stats for 100 labels written to /output/sub-/stats/aseg+DKT.stats.
Calculation took 85.32 seconds using up to 1 threads.
python3.8 /fastsurfer/CerebNet/run_prediction.py --t1 /data/sub-/resampled_t1.nii.gz --asegdkt_segfile /output/sub-/mri/aparc.DKTatlas+aseg.deep.mgz --conformed_name /output/sub-/mri/orig.mgz --cereb_segfile /output/sub-/mri/cerebellum.CerebNet.nii.gz --seg_log /output/sub-/scripts/deep-seg.log --batch_size 1 --viewagg_device auto --device cpu --async_io --threads 1 --norm_name /output/sub-/mri/orig_nu.mgz --cereb_statsfile /output/sub-/stats/cerebellum.CerebNet.stats
[INFO: run_prediction.py: 123]: Checking or downloading default checkpoints ...
[INFO: common.py: 479]: Single subject with absolute file path for input.
[INFO: common.py: 492]: No subjects directory specified, but the parent directory of the output file /output/sub-/mri/cerebellum.CerebNet.nii.gz is 'mri', so we are assuming this is the 'mri' folder in the subject directory.
[INFO: common.py: 534]: Analyzing single subject /data/sub-/resampled_t1.nii.gz
[INFO: common.py: 102]: Using device: cpu
[INFO: common.py: 102]: Using viewagg_device: cpu
[INFO: inference.py: 400]: 23-09-28_17:06:39
[INFO: inference.py: 329]: Saving CerebNet cerebellum segmentation at /output/sub-/mri/cerebellum.CerebNet.nii.gz
[INFO: inference.py: 474]: Subject 1/1 with id 'sub-' processed in 77460.94 sec.
./recon-surf.sh --sid sub- --sd /output --t1 /output/sub-/mri/orig.mgz --asegdkt_segfile /output/sub-***/mri/aparc.DKTatlas+aseg.deep.mgz --parallel --threads 1 --py python3.8

Environment

  • FastSurfer Version:
  • FreeSurfer Version: ...
  • OS: macOS Ventura 13.5.2
  • CPU: Apple M1
  • GPU: N/A

Execution

Run Command:

docker run -v /Users//Projects/**:/data \
    -v /Users//Projects//fastsurfer_analysis:/output \
    --rm --user $(id -u):$(id -g) deepmi/fastsurfer:latest \
    --t1 /data/sub-/resampled_t1.nii.gz \
    --sid sub-** --sd /output \
    --parallel \
    --device cpu

@gmeng92 gmeng92 added the question Further information is requested label Oct 9, 2023
@LeHenschel
Member

We do not officially support macOS. However, we do have some recommendations in https://github.com/Deep-MI/FastSurfer/blob/dev/INSTALL.md#macos.

@gmeng92
Author

gmeng92 commented Oct 9, 2023

We do not officially support macOS. However, we do have some recommendations in https://github.com/Deep-MI/FastSurfer/blob/dev/INSTALL.md#macos.

Thanks! I will try the bash install and check the speed.

@dkuegler
Member

Please also note that, as far as we know, Docker does not forward the M1 chip's AI capabilities, so you end up using just the CPU.
17 hours still sounds like too much, but the real speedup comes with a native installation.

@dkuegler
Member

dkuegler commented Oct 10, 2023

For documentation's sake: the current default behavior of FastSurfer is to run on only 1 thread when running on the CPU (the idea being to run one image per core for multiprocessing). Since the M1 cores are individually not super fast, this will be very slow.

Solution:
Add --threads 8. This is currently not the default because there have been reports of inconsistent performance of the FreeSurfer-based surface pipeline in multithreading configurations (#371).
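
As a rough illustration only, the Docker run command from the original report could pass this flag through like so (paths remain redacted as in the report; the value 8 follows the suggestion above and should be adapted to the number of available cores):

docker run -v /Users//Projects/**:/data \
    -v /Users//Projects//fastsurfer_analysis:/output \
    --rm --user $(id -u):$(id -g) deepmi/fastsurfer:latest \
    --t1 /data/sub-/resampled_t1.nii.gz \
    --sid sub-** --sd /output \
    --parallel \
    --device cpu \
    --threads 8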

@gmeng92
Author

gmeng92 commented Oct 10, 2023

Thanks for the additional comments and tips. I have used the bash install and ran with the Mac acceleration (export PYTORCH_ENABLE_MPS_FALLBACK=1). It takes 187 minutes running on 200 subjects in the same setting for the segmentation-only task. While python3 -m pip install -r requirements.mac.txt raises errors for some of the packages, manually reinstalling those packages (like scikit-image, h5py) solved the problem on my end. Here is an example of my bash code in case someone finds it helpful.

input_directory="dir/to/your/mri/data"
for subfolder in "$input_directory"/sub-*/; do
    subfolder_name=$(basename "$subfolder")
    python3 run_prediction.py --t1 "$subfolder"/t1.nii.gz \
        --sd output/dir \
        --sid "$subfolder_name" \
        --seg_log dir/of/the/log/file/"$subfolder_name"/temp_exp.log
    echo "Processed subfolder: $subfolder_name"
done
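
For context, a minimal sketch of the setup steps described above, assuming they are run from a clone of https://github.com/Deep-MI/FastSurfer; the package names and the MPS fallback variable are the ones mentioned in this comment, and the reinstall command is just one possible way to do it, not an official recipe:

# install the macOS requirements; some packages may fail and need a manual reinstall
python3 -m pip install -r requirements.mac.txt
python3 -m pip install --force-reinstall scikit-image h5py
# let PyTorch fall back to the CPU for operators the MPS backend does not support
export PYTORCH_ENABLE_MPS_FALLBACK=1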

And I will close this issue here. Thanks!

@gmeng92 gmeng92 closed this as completed Oct 10, 2023
@m-reuter
Member

Just my 2 cents: I think the very slow speeds were probably due to emulation of the Intel Docker images on the M chip. But when I built an ARM image it was still pretty slow, so a native install is really the best option at the moment for Macs with Apple silicon.
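
For anyone unsure whether they are hitting this emulation path, the image architecture can be checked with the standard Docker CLI (a generic Docker command, not FastSurfer-specific):

# prints 'arm64' for a native Apple-silicon image, 'amd64' for an Intel image that will run under emulation
docker image inspect --format '{{.Architecture}}' deepmi/fastsurfer:latest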
