
[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates #506


Merged: 13 commits merged into nipreps:master from ref/recon_subprocs on May 17, 2017

Conversation

@effigies (Member) commented May 3, 2017:

This PR eschews FreeSurfer's -parallel flag, preferring directives that are volume- or hemisphere-specific. Thus, we run:

  • -autorecon1 - Single thread (-parallel has no effect, but -openmp does in places)
  • Inject externally skull-stripped image
  • -autorecon2-volonly - Single thread
  • -autorecon-hemi {lh,rh} - Replace -parallel with nipype parallelism
  • -all - Run stats and any other post-hemisphere-specific processing

Also takes advantage of #494 to get rid of the separate get_surfaces node.
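
A minimal sketch of how these stages could look as nipype nodes (the node names and the omp_nthreads plumbing are illustrative assumptions, not the exact code in this PR, and assume a nipype recent enough to know these directives):

import nipype.pipeline.engine as pe
from nipype.interfaces.freesurfer import ReconAll

omp_nthreads = 4  # threads to give each ReconAll process (assumed value)

# 1. Volumetric preprocessing: -parallel has no effect, but -openmp does in places
autorecon1 = pe.Node(ReconAll(directive='autorecon1', openmp=omp_nthreads),
                     name='autorecon1')

# (the externally skull-stripped image is injected between stages 1 and 2)

# 2. Remaining volume-only steps, single-threaded
autorecon2_vol = pe.Node(ReconAll(directive='autorecon2-volonly'),
                         name='autorecon2_vol')

# 3. Per-hemisphere surface streams; nipype-level parallelism replaces -parallel
autorecon_surfs = pe.MapNode(ReconAll(directive='autorecon-hemi'),
                             iterfield='hemi', name='autorecon_surfs')
autorecon_surfs.inputs.hemi = ['lh', 'rh']

# 4. Stats and any other post-hemisphere-specific processing
autorecon3 = pe.Node(ReconAll(directive='all'), name='autorecon3')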

@effigies force-pushed the ref/recon_subprocs branch 8 times, most recently from 76fa413 to 1af1fc3 on May 10, 2017 13:42
@effigies changed the title from "[WIP] RF: Split recon-all into coarse chunks to improve resource use estimates" to "[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates" on May 10, 2017
@effigies changed the title from "[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates" to "[WIP] RF: Split recon-all into coarse chunks to improve resource use estimates" on May 11, 2017
@effigies force-pushed the ref/recon_subprocs branch from c3252f2 to 6ead409 on May 11, 2017 15:57
@effigies (Member Author):

Quick update: the -autorecon-hemi pipelines have a couple of statistics steps that presume both hemispheres were already done. Nothing depends on them, so I just let them run in the -autorecon3 step.

Re-running from scratch to verify.

autorecon_surfs = pe.MapNode(              # leading lines reconstructed from
    ReconAll(directive='autorecon-hemi'),  # context: the thread below concerns
    iterfield='hemi',                      # a MapNode over -autorecon-hemi
    name='autorecon_surfs')
autorecon_surfs.interface.num_threads = omp_nthreads
autorecon_surfs.inputs.hemi = ['lh', 'rh']
@effigies (Member Author):

On a 6-core machine, I'm getting both 5-thread jobs in this MapNode running simultaneously. @oesteban I think you're more familiar with the resource manager than I am. Do you see something wrong here?
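
For reference, the scheduling expectation at issue (a sketch; the actual fmriprep invocation differs):

# MultiProc is given a total thread budget via n_procs; a node claiming
# num_threads=5 should block a second 5-thread node when n_procs=6.
# 'workflow' stands in for the assembled fmriprep workflow.
workflow.run(plugin='MultiProc', plugin_args={'n_procs': 6})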

Contributor:

Doesn't omp_nthreads default to nthreads - 1?

@effigies (Member Author):

Yes. But we shouldn't be able to run two 5-thread jobs on a 6-thread queue... I think?

Contributor:

Yes, that is correct. I wonder if this is because MultiProc is not handling MapNodes correctly.

@effigies (Member Author):

Yeah, I think the possibilities are:

  1. MapNodes are handled incorrectly by MultiProc
  2. MapNodes need to have num_threads specified differently from other nodes
  3. I've screwed something subtle up here

Contributor:

Might be worth opening an issue at nipy/nipype and pinging Cameron.

@effigies (Member Author) commented May 14, 2017:

Race conditions involving mris_anatomical_stats are now avoided by removing the -parcstats* and -balabels steps from -autorecon-hemi and letting them run in -autorecon3.

Switched to an iterable to avoid the MapNode issue (nipy/nipype#2016). (May revert: nipy/nipype#2019)

I can verify that only one node runs at a time if omp-nthreads > nthreads / 2, and that both run if omp-nthreads == nthreads / 2.
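
A minimal sketch of the iterable version (reusing names from the snippets above; the exact code may differ):

# A plain Node with iterables expands into one subgraph per hemisphere,
# sidestepping MultiProc's MapNode thread accounting (nipy/nipype#2016)
autorecon_surfs = pe.Node(ReconAll(directive='autorecon-hemi'),
                          name='autorecon_surfs')
autorecon_surfs.iterables = ('hemi', ['lh', 'rh'])
autorecon_surfs.interface.num_threads = omp_nthreads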

@chrisgorgo (Contributor):

This looks great. Let me know when it's ready for review.

@effigies (Member Author) commented May 14, 2017:

Well, I thought this was ready to go, and was just running one last full recon... It turns out there are more cross-hemisphere dependencies (ones that only show up when the hemispheres run in sequence). I think we'll need one extra node to handle these dependencies. I'm splitting the post-skullstrip portion of recon-all into its own workflow, just to keep the logic separate.

This will be easier once nipy/nipype#2019 goes in, since MapNodes handle syncing naturally without JoinNodes.
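
Since iterables expand into independent subgraphs, syncing them back up for cross-hemisphere steps takes a JoinNode; a hypothetical sketch continuing the code above (node and field names invented for illustration):

import nipype.interfaces.utility as niu

# Collect per-hemisphere results before any step that needs both sides
join_surfs = pe.JoinNode(niu.Merge(1), joinsource='autorecon_surfs',
                         joinfield='in1', name='join_surfs')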

@effigies force-pushed the ref/recon_subprocs branch 2 times, most recently from f11d19e to 1ec21e6 on May 15, 2017 13:58
@effigies (Member Author):

@chrisfilo This is ready for review. Let's wait on nipype 0.13.1 to merge. If either nipy/nipype#2019 or nipy/nipype#2020 goes in, I'll be able to make a couple of edits that make this more concise.

Relatedly, with the nipype release(s), we may want to revisit our dependency strategy, as we discussed at one point in January/February.

@effigies force-pushed the ref/recon_subprocs branch from 1ec21e6 to 3e95e10 on May 16, 2017 13:15
@effigies (Member Author):

The final commit is for the sake of comparing the hierarchical graph to the simple graph, to see which we'd rather have in the docs.

@effigies (Member Author):

Here are our doc options:

Simple: [simple workflow graph image]

Colored: [colored hierarchical workflow graph image]

@effigies changed the title from "[WIP] RF: Split recon-all into coarse chunks to improve resource use estimates" to "[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates" on May 16, 2017
@effigies requested a review from chrisgorgo on May 16, 2017 17:58
@chrisgorgo (Contributor):

LGTM. Have you noticed any influence on performance?

@effigies (Member Author):

I've really only tested it on a machine without enough cores to get workflow-level parallelism, to make sure it doesn't depend on the per-hemisphere stages moving in lockstep. I can try to get timing info from a 16-core machine tonight.

Any preference on the graph for the docs? I have one more update to make to the docs before this is fully ready.

@chrisgorgo (Contributor):

I prefer the simpler one (but not strongly).

@effigies force-pushed the ref/recon_subprocs branch from 77fea25 to 3aa3690 on May 16, 2017 20:58
@effigies (Member Author) commented May 17, 2017:

On a 16-core machine, running identical commands, I see effectively no difference in total fmriprep runtime between 0.4.3 and this branch: both take right around 6 hours total to run an anatomical plus one resting-state run (240 TRs), sampled only to T1w and fsaverage, so skipping MNI.

I only have start and end times for the full pipeline runs; I forgot to save detailed logs, so my terminal history is all I have. I can verify that we did get parallel execution in both cases, but in the 0.4.3 branch that's because we were using the -parallel flag, so we were using 2n threads even though we only claimed n.

@chrisgorgo (Contributor):

Thanks for looking into this. If you could sync with master so the tests start passing, I'm happy to merge.

@effigies force-pushed the ref/recon_subprocs branch 2 times, most recently from 684e68c to a86766a on May 17, 2017 16:13
@effigies (Member Author):

Whoops. Rebased this with a docs-only commit on the end. Full tests running now.

@effigies force-pushed the ref/recon_subprocs branch from a86766a to a52a040 on May 17, 2017 17:09
@effigies merged commit 980615f into nipreps:master on May 17, 2017
@effigies deleted the ref/recon_subprocs branch on May 17, 2017 19:24