
[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates #506


Merged: 13 commits merged into nipreps:master from ref/recon_subprocs on May 17, 2017

Conversation

@effigies (Member) commented May 3, 2017:

This PR eschews FreeSurfer's -parallel flag, preferring directives that are volume- or hemisphere-specific. Thus, we run:

  • -autorecon1 - Single thread (-parallel has no effect, but -openmp does in places)
  • Inject externally skull-stripped image
  • -autorecon2-volonly - Single thread
  • -autorecon-hemi {lh,rh} - Replace -parallel with nipype parallelism
  • -all - Run stats and any other post-hemisphere-specific processing

Also takes advantage of #494 to get rid of the separate get_surfaces node.
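
A minimal sketch of how these stages could look as nipype nodes (the node names and the omp_nthreads plumbing are illustrative assumptions, not the exact code in this PR, and assume a nipype recent enough to know these directives):

import nipype.pipeline.engine as pe
from nipype.interfaces.freesurfer import ReconAll

omp_nthreads = 4  # threads to give each ReconAll process (assumed value)

# 1. Volumetric preprocessing: -parallel has no effect, but -openmp does in places
autorecon1 = pe.Node(ReconAll(directive='autorecon1', openmp=omp_nthreads),
                     name='autorecon1')

# (the externally skull-stripped image is injected between stages 1 and 2)

# 2. Remaining volume-only steps, single-threaded
autorecon2_vol = pe.Node(ReconAll(directive='autorecon2-volonly'),
                         name='autorecon2_vol')

# 3. Per-hemisphere surface streams; nipype-level parallelism replaces -parallel
autorecon_surfs = pe.MapNode(ReconAll(directive='autorecon-hemi'),
                             iterfield='hemi', name='autorecon_surfs')
autorecon_surfs.inputs.hemi = ['lh', 'rh']

# 4. Stats and any other post-hemisphere-specific processing
autorecon3 = pe.Node(ReconAll(directive='all'), name='autorecon3')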

@effigies force-pushed the ref/recon_subprocs branch 8 times, most recently from 76fa413 to 1af1fc3 on May 10, 2017 13:42
@effigies changed the title from "[WIP] RF: Split recon-all into coarse chunks to improve resource use estimates" to "[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates" on May 10, 2017
@effigies changed the title from "[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates" to "[WIP] RF: Split recon-all into coarse chunks to improve resource use estimates" on May 11, 2017
@effigies force-pushed the ref/recon_subprocs branch from c3252f2 to 6ead409 on May 11, 2017 15:57
@effigies (Member Author):

Quick update: the -autorecon-hemi pipelines have a couple of statistics steps that presume both hemispheres were already done. Nothing depends on them, so I just let them run in the -autorecon3 step.

Re-running from scratch to verify.

autorecon_surfs = pe.MapNode(              # leading lines reconstructed from
    ReconAll(directive='autorecon-hemi'),  # context: the thread below concerns
    iterfield='hemi',                      # a MapNode over -autorecon-hemi
    name='autorecon_surfs')
autorecon_surfs.interface.num_threads = omp_nthreads
autorecon_surfs.inputs.hemi = ['lh', 'rh']
@effigies (Member Author):

On a 6-core machine, I'm getting both 5-thread jobs in this MapNode running simultaneously. @oesteban I think you're more familiar with the resource manager than I am. Do you see something wrong here?
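
For reference, the scheduling expectation at issue (a sketch; the actual fmriprep invocation differs):

# MultiProc is given a total thread budget via n_procs; a node claiming
# num_threads=5 should block a second 5-thread node when n_procs=6.
# 'workflow' stands in for the assembled fmriprep workflow.
workflow.run(plugin='MultiProc', plugin_args={'n_procs': 6})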

Contributor:

Doesn't omp_nthreads default to nthreads - 1?

@effigies (Member Author):

Yes. But we shouldn't be able to run two 5-thread jobs on a 6-thread queue... I think?

Contributor:

Yes, that is correct. I wonder if this is because MultiProc is not handling MapNodes correctly.

@effigies (Member Author):

Yeah, I think the possibilities are:

  1. MapNodes are handled incorrectly by MultiProc
  2. MapNodes need to have num_threads specified differently from other nodes
  3. I've screwed something subtle up here

Contributor:

Might be worth opening an issue at nipy/nipype and pinging Cameron.

@effigies (Member Author) commented May 14, 2017:

Race conditions involving mris_anatomical_stats are now avoided by removing the -parcstats* and -balabels steps from -autorecon-hemi and letting them run in -autorecon3.

Switched to an iterable to avoid the MapNode issue (nipy/nipype#2016). (May revert: nipy/nipype#2019)

I can verify that only one node runs at a time if omp-nthreads > nthreads / 2, and that both run if omp-nthreads == nthreads / 2.
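
A minimal sketch of the iterable version (reusing names from the snippets above; the exact code may differ):

# A plain Node with iterables expands into one subgraph per hemisphere,
# sidestepping MultiProc's MapNode thread accounting (nipy/nipype#2016)
autorecon_surfs = pe.Node(ReconAll(directive='autorecon-hemi'),
                          name='autorecon_surfs')
autorecon_surfs.iterables = ('hemi', ['lh', 'rh'])
autorecon_surfs.interface.num_threads = omp_nthreads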

@chrisgorgo (Contributor):

This looks great. Let me know when it's ready for review.

@effigies (Member Author) commented May 14, 2017:

Well, I thought this was ready to go, and was just running one last full recon... It turns out there are more cross-hemisphere dependencies (ones that only show up when the hemispheres run in sequence). I think we'll need one extra node to handle these dependencies. I'm splitting the post-skullstrip portion of recon-all into its own workflow, just to keep the logic separate.

This will be easier once nipy/nipype#2019 goes in, since MapNodes handle syncing naturally without JoinNodes.
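
Since iterables expand into independent subgraphs, syncing them back up for cross-hemisphere steps takes a JoinNode; a hypothetical sketch continuing the code above (node and field names invented for illustration):

import nipype.interfaces.utility as niu

# Collect per-hemisphere results before any step that needs both sides
join_surfs = pe.JoinNode(niu.Merge(1), joinsource='autorecon_surfs',
                         joinfield='in1', name='join_surfs')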

@effigies force-pushed the ref/recon_subprocs branch 2 times, most recently from f11d19e to 1ec21e6 on May 15, 2017 13:58
@effigies (Member Author):

@chrisfilo This is ready for review. Let's wait on nipype 0.13.1 to merge. If either nipy/nipype#2019 or nipy/nipype#2020 goes in, I'll be able to make a couple of edits that make this more concise.

Relatedly, with the nipype release(s), we may want to revisit our dependency strategy, as we discussed at one point in January/February.

@effigies force-pushed the ref/recon_subprocs branch from 1ec21e6 to 3e95e10 on May 16, 2017 13:15
@effigies (Member Author):

The final commit is for the sake of comparing the hierarchical graph to the simple graph, to see which we'd rather have in the docs.

@effigies (Member Author):

Here are our doc options:

Simple: [simple workflow graph image]

Colored: [colored hierarchical workflow graph image]

@effigies changed the title from "[WIP] RF: Split recon-all into coarse chunks to improve resource use estimates" to "[RTM] RF: Split recon-all into coarse chunks to improve resource use estimates" on May 16, 2017
@effigies requested a review from chrisgorgo on May 16, 2017 17:58
@chrisgorgo (Contributor):

LGTM. Have you noticed any influence on performance?

@effigies (Member Author):

I've really only tested it on a machine without enough cores to get workflow-level parallelism, to make sure it doesn't depend on the per-hemisphere stages moving in lockstep. I can try to get timing info from a 16-core machine tonight.

Any preference on the graph for the docs? I have one more update to make to the docs before this is fully ready.

@chrisgorgo (Contributor):

I prefer the simpler one (but not strongly).

@effigies force-pushed the ref/recon_subprocs branch from 77fea25 to 3aa3690 on May 16, 2017 20:58
@effigies (Member Author) commented May 17, 2017:

On a 16-core machine, running identical commands, I see effectively no difference in total fmriprep runtime between 0.4.3 and this branch: both take right around 6 hours total to run an anatomical plus one resting-state run (240 TRs), sampled only to T1w and fsaverage, so skipping MNI.

I only have start and end times for the full pipeline runs; I forgot to save detailed logs, so my terminal history is all I have. I can verify that we did get parallel execution in both cases, but in the 0.4.3 branch that's because we were using the -parallel flag, so we were using 2n threads even though we only claimed n.

@chrisgorgo (Contributor):

Thanks for looking into this. If you could sync with master so the tests start passing, I'm happy to merge.

@effigies force-pushed the ref/recon_subprocs branch 2 times, most recently from 684e68c to a86766a on May 17, 2017 16:13
@effigies (Member Author):

Whoops. Rebased this with a docs-only commit on the end. Full tests running now.

@effigies force-pushed the ref/recon_subprocs branch from a86766a to a52a040 on May 17, 2017 17:09
@effigies merged commit 980615f into nipreps:master on May 17, 2017
@effigies deleted the ref/recon_subprocs branch on May 17, 2017 19:24