[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates #506
Conversation
Force-pushed 76fa413 to 1af1fc3
Force-pushed c3252f2 to 6ead409
Quick update: Re-running from scratch to verify.
```python
                            iterfield='hemi',
                            name='autorecon_surfs')
autorecon_surfs.interface.num_threads = omp_nthreads
autorecon_surfs.inputs.hemi = ['lh', 'rh']
```
On a 6-core machine, I'm getting both 5-thread jobs in this mapnode running simultaneously. @oesteban I think you're more familiar with the resource manager than I am. Do you see something wrong here?
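To make the expectation concrete, here is a minimal, purely illustrative sketch (not nipype's actual scheduler code) of the thread-budget check MultiProc should be applying. Under a 6-thread budget, two 5-thread jobs should not both be admitted:

```python
# Illustrative sketch of a greedy thread-budget scheduler, assuming
# MultiProc admits a job only if its num_threads fits the free budget.
def schedule(jobs, total_threads):
    """Greedily start jobs whose thread request fits the remaining budget."""
    running = []
    for threads in jobs:
        if threads <= total_threads - sum(running):
            running.append(threads)
    return running

# One 5-thread job fits a 6-thread budget; the second must wait.
print(schedule([5, 5], total_threads=6))  # -> [5]
# Two 3-thread jobs, by contrast, can run side by side.
print(schedule([3, 3], total_threads=6))  # -> [3, 3]
```

The bug under discussion is that MapNode sub-jobs appear to bypass this check, so both 5-thread hemispheres start at once.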
Doesn't `omp_nthreads` default to `nthreads - 1`?
Yes. But we shouldn't be able to run two 5-thread jobs on a 6-thread queue... I think?
Yes that is correct. I wonder if this is because MultiProc is not handling MapNodes correctly.
Yeah, I think the possibilities are:

- MapNodes are handled incorrectly by MultiProc
- MapNodes need to have `num_threads` specified differently from other nodes
- I've screwed something subtle up here
Might be worth opening an issue at nipy/nipype and pinging Cameron.
Force-pushed 89840e8 to 3afde62
Switched to an iterable to avoid the MapNode race-condition issue (nipy/nipype#2016). (May revert: nipy/nipype#2019.)
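The distinction can be sketched in plain Python (names are illustrative, not nipype's internals): a MapNode iterates over `iterfield` values inside a single node, while iterables expand the node into one independent job per value, each of which the scheduler can budget separately.

```python
# Hypothetical sketch of nipype-style iterable expansion: each value of
# the iterated field becomes its own job in the expanded graph.
def expand_iterables(node_name, field, values):
    """Return one job spec per parameter value, mimicking iterables."""
    return [{'name': f'{node_name}.{field}_{v}', field: v} for v in values]

jobs = expand_iterables('autorecon_surfs', 'hemi', ['lh', 'rh'])
# Two separate jobs, so a 5-thread request is checked once per hemisphere.
```

Because each expanded job is a first-class node, its `num_threads` is visible to the resource manager, sidestepping the MapNode accounting problem.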
This looks great - let me know when it will be ready for review.
Well, I thought this was ready to go, and was just running one last full recon... It turns out that there are more cross-hemisphere dependencies (which only show up when you run the hemispheres in sequence). I think we'll need one extra node to handle these dependencies. I'm splitting the post-skullstrip recon bit into its own workflow, just to keep the logic separate. This will be easier once nipy/nipype#2019 goes in.
Force-pushed f11d19e to 1ec21e6
@chrisfilo This is ready for review. Let's wait on nipype 0.13.1 to merge. If either nipy/nipype#2019 or nipy/nipype#2020 goes in, I'll be able to make a couple of edits that make it more concise. Relatedly, with the nipype release(s), we may want to revisit our dependency strategy, as we discussed at one point in January/February.
Force-pushed 1ec21e6 to 3e95e10
Final commit is for the sake of comparing the hierarchical graph to the simple graph, to see which we'd rather have in the docs.
LGTM. Have you noticed any influence on performance?
I've really only tested it on a machine without enough cores to get workflow-level parallelism, to make sure that it doesn't depend on these per-hemisphere stages moving in lockstep. I can try to get timing info from a 16-core machine tonight. Any preference on the graph for the docs? I have one more update to make to the docs before this is fully ready.
I prefer the simpler one (but not strongly).
Force-pushed 77fea25 to 3aa3690
On a 16-core machine, running identical commands, I see effectively no difference in total fmriprep runtime between 0.4.3 and this branch; both take right around 6 hours to run an anatomical + 1 resting-state run (240 TRs), sampled only to T1w and fsaverage, so skipping MNI. I have start and end times for the entire pipeline instances, but I forgot to save detailed logs, so my terminal history is all I have. I can verify that we did get parallel execution in both cases, but in the 0.4.3 branch, that's because we were using the `-parallel` flag.
Thanks for looking into this. If you could sync with master so the tests start passing, I'm happy to merge.
Force-pushed 684e68c to a86766a
Whoops. Rebased this with a docs-only commit on the end. Full tests running now.
Move all parcellation stats and balabels to autorecon3
Explicitly build ribbon volume in autorecon3
Force-pushed a86766a to a52a040
This PR eschews the `-parallel` flag in FreeSurfer, preferring the directives that are volume- or hemisphere-specific. Thus, we run:

- `-autorecon1` - single thread (`-parallel` has no effect, but `-openmp` does in places)
- `-autorecon2-volonly` - single thread
- `-autorecon-hemi {lh,rh}` - replace `-parallel` with nipype parallelism
- `-all` - run stats and any other post-hemisphere-specific processing

Also takes advantage of #494 to get rid of the separate `get_surfaces` node.
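The staged invocations above can be sketched as command lines. This is an illustrative reconstruction from the PR description, not code from the PR itself; the subject label and thread count are placeholders:

```python
# Sketch of the coarse recon-all stages this PR runs. The serial stages
# run one after another; the two -autorecon-hemi stages are the ones
# nipype parallelism can run side by side.
subject = 'sub-01'   # hypothetical subject label
omp_nthreads = 4     # hypothetical per-process OpenMP thread budget

serial = [
    ['recon-all', '-s', subject, '-autorecon1', '-openmp', str(omp_nthreads)],
    ['recon-all', '-s', subject, '-autorecon2-volonly'],
]
parallel = [
    ['recon-all', '-s', subject, '-autorecon-hemi', hemi,
     '-openmp', str(omp_nthreads)]
    for hemi in ('lh', 'rh')
]
final = [['recon-all', '-s', subject, '-all']]
```

The key resource-use property is that each chunk has a predictable thread footprint, so the scheduler's estimates are much closer to reality than one monolithic `recon-all -all -parallel` job.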