[WIP,ENH] Revision to the resource profiler #2193

oesteban · 2017-09-21T23:57:34Z

This PR revises the resource profiler with the following objectives:

- Increase robustness (and make sure it does not crash nipype)
- Extend profiling to all interfaces (including pure Python)

The increase in robustness is expected to come from:

1. Reducing (or removing altogether, if possible) the logger callback that registers the estimations of memory and CPUs. This could be achieved by making interfaces responsible for keeping track of their resources, and then collecting all results after execution of the node.
2. Centralizing profiler imports, like the config or logger object, so that the applicability of the profiler is checked only once.

This first commit just creates one new module, nipype.utils.profiler, and moves the related functions there.

Important: this PR removes the old filemanip logger and replaces it with a more generic utils logger. Documentation has been updated accordingly.

oesteban · 2017-09-22T16:46:13Z

Hi @satra, @effigies,

Before I get further with this PR, I'd love your feedback on these issues:

- Renaming the filemanip logger to utils and attaching events from nipype.utils.filemanip, nipype.utils.provenance and nipype.utils.profiler to it.
- Making interfaces responsible for monitoring themselves, in a separate process, so that nipype is robust against errors in monitoring (see tear-up and tear-down). Additionally, all interfaces can now measure their performance (not only command-line interfaces and niu.Function).
- Since the interface logs itself and interfaces are meant to work on shared filesystems, a new .prof file containing the traces of memory and CPUs is generated in the interface base directory. That file is then read back and the peaks are saved in the runtime object, which allows removing the profiler callback and log. WDYT?
- I had to simplify the run interface logic, trying to keep as few lines as possible within try ... except.
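
The trace-and-collect idea described above (a .prof file of samples, reduced to peaks afterwards) could look roughly like the sketch below. The file layout (one `timestamp,cpu_percent,rss_mb` row per sample) and the name `read_peaks` are assumptions made for illustration only; they are not the format or API this PR writes.

```python
import csv
import io


def read_peaks(stream):
    """Return peak CPU and memory from a trace of comma-separated samples.

    Hypothetical layout: timestamp, cpu_percent, rss_mb (one sample per row).
    """
    peak_cpu = 0.0
    peak_mem = 0.0
    for row in csv.reader(stream):
        if len(row) != 3:
            continue  # skip blank or truncated rows (e.g. an interrupted write)
        try:
            cpu, mem = float(row[1]), float(row[2])
        except ValueError:
            continue  # skip rows that do not hold numbers
        peak_cpu = max(peak_cpu, cpu)
        peak_mem = max(peak_mem, mem)
    return {'cpu_percent': peak_cpu, 'rss_mb': peak_mem}


# In-memory stand-in for a trace file written during the run.
trace = io.StringIO("0.0,12.5,101.0\n1.0,98.0,250.5\n2.0,40.0,180.0\n")
peaks = read_peaks(trace)
```

Skipping malformed rows keeps the reader tolerant of a trace that was cut short, which matches the goal of monitoring never breaking the run.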

effigies · 2017-09-22T19:08:06Z

This all seems reasonable on its face. I haven't really looked into the profiler before, so I don't have strong opinions of cases to consider. I'll try to have a more detailed look and might have more to say...

satra · 2017-09-25T11:30:46Z

setup.py

@@ -148,6 +148,7 @@ def main():
        entry_points='''
           [console_scripts]
           nipypecli=nipype.scripts.cli:cli
+           nipype_mprof=nipype.utils.profiler:main


can we put this under nipypecli? it would be nice to have a single cli.

I think we can even remove this executable.

satra · 2017-09-25T11:40:06Z

@oesteban - in principle this looks reasonable, here are my questions:

- is the profiler stable across os x and linux? i see that duration is now being returned from the profiler, so that would mean that we need to have the profiler on to get this. i'm fine with that, but that would mean that the profiler should be on by default and be robust across platforms (i.e. not require sudo on some).
- has this been tested with linux cgroups limiting threads? does this have an impact on performance, since this is now running in a separate process, as opposed to the previous case where it was a nested subprocess?

oesteban · 2017-09-25T16:11:41Z

Thanks for the comments :)

I am currently finishing this and will start testing all those things soon.

Regarding the duration: I am still working on making the run function more robust, but in principle the duration will be stored the same way it was even before the resource profiler was included.

I'm more concerned about the cgroups. As you say, this Popen will fork a new subprocess, but I guess this is new to the eyes of the scheduler (and thus will take one thread out). I'll find out how cgroups work here.

satra · 2017-09-25T16:13:35Z

@oesteban - it may still be listed under the parent process, but the question is how much of the resources does that thread take up. possibly not much, but just worth checking.

satra · 2017-09-25T16:13:49Z

@oesteban - you can use pstree to check.

codecov-io · 2017-09-25T21:31:57Z

Codecov Report

Merging #2193 into master will increase coverage by 0.02%.
The diff coverage is 91.52%.

@@            Coverage Diff            @@
##           master   #2193      +/-   ##
=========================================
+ Coverage   72.27%   72.3%   +0.02%     
=========================================
  Files        1174    1175       +1     
  Lines       58738   58685      -53     
  Branches     8454    8443      -11     
=========================================
- Hits        42453   42432      -21     
+ Misses      14924   14885      -39     
- Partials     1361    1368       +7

Flag	Coverage Δ
#smoketests	`72.3% <91.52%> (+0.02%)`	⬆️
#unittests	`69.99% <86.44%> (+0.04%)`	⬆️

Impacted Files	Coverage Δ
nipype/interfaces/afni/tests/test_auto_Notes.py	`85.71% <ø> (ø)`	⬆️
...interfaces/ants/tests/test_auto_AntsJointFusion.py	`85.71% <ø> (ø)`	⬆️
nipype/interfaces/afni/tests/test_auto_ECM.py	`85.71% <ø> (ø)`	⬆️
...ipype/interfaces/brainsuite/tests/test_auto_Tca.py	`85.71% <ø> (ø)`	⬆️
...ipype/interfaces/afni/tests/test_auto_ABoverlap.py	`85.71% <ø> (ø)`	⬆️
...faces/camino/tests/test_auto_ComputeTensorTrace.py	`85.71% <ø> (ø)`	⬆️
...faces/camino/tests/test_auto_TrackBedpostxDeter.py	`85.71% <ø> (ø)`	⬆️
...terfaces/freesurfer/tests/test_auto_MRIsCombine.py	`85.71% <ø> (ø)`	⬆️
nipype/interfaces/afni/tests/test_auto_Axialize.py	`85.71% <ø> (ø)`	⬆️
nipype/interfaces/afni/tests/test_auto_Means.py	`85.71% <ø> (ø)`	⬆️
... and 773 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f422f0...03c8d2e.

oesteban · 2017-09-26T00:16:29Z

This is nearing a review-able status. Main changes:

- Switched monitoring to a thread, which is started here: https://github.com/oesteban/nipype/blob/3f34711e33129415a108b0d3e93c8e9b2e8ac66e/nipype/interfaces/base.py#L1090-L1091 and stopped here: https://github.com/oesteban/nipype/blob/3f34711e33129415a108b0d3e93c8e9b2e8ac66e/nipype/interfaces/base.py#L1132. The Monitor is implemented here: https://github.com/oesteban/nipype/blob/3f34711e33129415a108b0d3e93c8e9b2e8ac66e/nipype/utils/profiler.py#L30-L54.
- Revised documentation and options in depth, trying to make everything more consistent.
- MultiProc no longer blocks if one asks for excessive resources. A warning is issued, unless the new option plugin_args['raise_insufficient'] is True (in which case an error is raised). This has been documented.

TODO:
- Remove the former monitor completely.
- Test this individually.
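
The warn-or-raise behaviour described for excessive resource requests can be sketched as below. Only the option name `raise_insufficient` comes from the comment above; the function `check_resources` and its other parameters are hypothetical and do not mirror the plugin's actual code.

```python
import warnings


def check_resources(needed_gb, needed_threads, memory_gb, processors,
                    raise_insufficient=False):
    """Warn, or raise if requested, when a node asks for more than exists.

    All names are hypothetical except ``raise_insufficient``.
    """
    if needed_gb <= memory_gb and needed_threads <= processors:
        return True
    msg = ('Node requests %.2f GB and %d threads, but only %.2f GB and '
           '%d threads are available.'
           % (needed_gb, needed_threads, memory_gb, processors))
    if raise_insufficient:
        raise RuntimeError(msg)
    warnings.warn(msg)  # default behaviour: report the problem and carry on
    return False
```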

I would really appreciate comments from @satra about the new approach using Event, and a general look with fresh eyes (@effigies?) would be very useful as well.
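
For readers unfamiliar with the pattern, a thread that samples until a `threading.Event` is set might look like the following. This is a simplified stand-in written for Python 3, not the Monitor linked above: the `sampler` and `interval` parameters are hypothetical, and `time.process_time` merely stands in for real memory and CPU readings.

```python
import threading
import time


class Monitor(threading.Thread):
    """Sample a callable at a fixed interval until stopped.

    Simplified illustration; ``sampler`` and ``interval`` are hypothetical.
    """

    def __init__(self, sampler, interval=0.01):
        super().__init__(daemon=True)
        self._sampler = sampler
        self._interval = interval
        self._stop_event = threading.Event()
        self.samples = []

    def run(self):
        # Event.wait returns True once stop() sets the event, ending the loop;
        # until then it times out every `interval` seconds and we take a sample.
        while not self._stop_event.wait(self._interval):
            try:
                self.samples.append(self._sampler())
            except Exception:
                # A failing sampler must not take the monitored work down.
                break

    def stop(self):
        self._stop_event.set()
        self.join()


monitor = Monitor(time.process_time)
monitor.start()
sum(i * i for i in range(200000))  # stands in for the interface doing its work
monitor.stop()
```

Waiting on the event instead of calling `time.sleep` lets `stop()` interrupt the pause immediately, so tear-down does not have to wait for a full sampling interval.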

effigies

This looks reasonable. Some comments.

effigies · 2017-09-26T00:35:47Z

doc/users/config_file.rst

+*utils_level*
+	How detailed the logs regarding nipype utils like file operations 
+	(for example overwriting warning) or the resource profiler should be 
+	(possible values: ``INFO`` and ``DEBUG``; default value:


Commas around "like file operations (...) or the resource profiler".

effigies · 2017-09-26T00:36:31Z

doc/users/config_file.rst

@@ -146,6 +147,13 @@ Execution
    crashfiles allow portability across machines and shorter load time.
    (possible values: ``pklz`` and ``txt``; default value: ``pklz``)

+*resource_monitor*
+    Enables monitoring the resources occupation.


Indicate this is a boolean, note default value.

effigies · 2017-09-26T00:37:38Z

doc/users/plugins.rst

@@ -74,6 +74,13 @@ Optional arguments::
  n_procs :  Number of processes to launch in parallel, if not set number of
  processors/threads will be automatically detected

+  memory_gb : Total memory available to be shared by all simultaneous tasks
+  currently running, if not set it will be automatically estimated.


Maybe "automatically set to 90% of system RAM"?

effigies · 2017-09-26T00:40:31Z

nipype/utils/logger.py

-        self._fmlogger.setLevel(logging.getLevelName(config.get('logging',
-                                                                'filemanip_level')))
+        self._utlogger.setLevel(logging.getLevelName(config.get('logging',
+                                                                'utils_level')))


As it is, people who've set filemanip_level are just going to get less logging and no explanation.

What about checking for filemanip_level and, if set, set utils_level and provide a deprecation notice?

Will work on this
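
A minimal sketch of the suggested fallback, assuming a plain `configparser` object rather than nipype's own config class. The helper name `utils_level` and the `INFO` default are assumptions; `FutureWarning` is used because it is displayed by default, unlike `DeprecationWarning`.

```python
import configparser
import warnings


def utils_level(cfg):
    """Return the level for the utils logger, honouring the old option name."""
    if cfg.has_option('logging', 'filemanip_level'):
        warnings.warn(
            '"filemanip_level" is deprecated; use "utils_level" instead.',
            FutureWarning)
        return cfg.get('logging', 'filemanip_level')
    # Assumed default; the real default lives in nipype's configuration.
    return cfg.get('logging', 'utils_level', fallback='INFO')


cfg = configparser.ConfigParser()
cfg.read_string("[logging]\nfilemanip_level = DEBUG\n")
```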

effigies · 2017-09-26T00:49:36Z

nipype/interfaces/base.py

 __docformat__ = 'restructuredtext'

+if sys.version_info < (3, 3):
+    setattr(sp, 'DEVNULL', os.devnull)


What's the purpose of this? I don't see that it's used anywhere. If you do want to do this, you can't use paths. You need to pass a file descriptor, such as open(os.devnull, 'rb+').

Leftover from the previous implementation with a process.
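
For reference, a fallback that follows the reviewer's point (hand `subprocess` an open file rather than a path string) could be written as below. This is a generic sketch, not code from this PR.

```python
import os
import subprocess as sp
import sys

try:
    DEVNULL = sp.DEVNULL  # available from Python 3.3
except AttributeError:
    DEVNULL = open(os.devnull, 'rb+')  # a real file object, not a path string

# Discard the child's output either way.
retcode = sp.call([sys.executable, '-c', 'print("discarded")'],
                  stdout=DEVNULL, stderr=DEVNULL)
```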

effigies · 2017-09-26T01:33:55Z

nipype/pipeline/plugins/multiproc.py


        free_memory_gb = self.memory_gb - busy_memory_gb
        free_processors = self.processors - busy_processors

        # Check all jobs without dependency not run
-        jobids = np.flatnonzero((self.proc_done == False) & \
+        jobids = np.flatnonzero((self.proc_done == False) &


~self.proc_done?

self.proc_done is not a numpy array

I was wrong. Changing to ~self.proc_done
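
The equivalence only holds for boolean NumPy arrays; on a plain Python bool, `~True` evaluates to `-2`. A small check, where the second mask `depends_on_none` is a hypothetical stand-in for the part of the expression not shown in the hunk:

```python
import numpy as np

proc_done = np.array([True, False, False], dtype=bool)
depends_on_none = np.array([True, True, False], dtype=bool)  # hypothetical mask

# Element-wise comparison with False, as in the removed line.
old_style = np.flatnonzero((proc_done == False) & depends_on_none)  # noqa: E712
# Logical negation of a boolean array, as suggested in the review.
new_style = np.flatnonzero(~proc_done & depends_on_none)
```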

effigies · 2017-09-26T01:35:49Z

nipype/pipeline/plugins/multiproc.py

-                                             (self.procs[jobid].overwrite == None and
-                                              not self.procs[jobid]._interface.always_run))):
+                        if hash_exists and not self.procs[jobid].overwrite and \
+                           not self.procs[jobid]._interface.always_run:


This isn't equivalent to what it replaced. Is that intentional?
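
The hunk shows only the tail of the removed condition, so the sketch below compares just the two visible sub-expressions; whether the complete conditions differ depends on lines that are not shown here. The function names are hypothetical.

```python
def old_fragment(overwrite, always_run):
    # Visible tail of the removed condition (== None written as `is None`).
    return overwrite is None and not always_run


def new_fragment(overwrite, always_run):
    # Visible part of the added condition, without the hash_exists term.
    return not overwrite and not always_run


# Collect every (overwrite, always_run) pair on which the fragments disagree.
differences = [(ow, ar)
               for ow in (None, False, True)
               for ar in (False, True)
               if old_fragment(ow, ar) != new_fragment(ow, ar)]
```

The general point is that `x == None` is true only for `None`, while `not x` is also true for `False`.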

effigies · 2017-09-26T01:38:37Z

nipype/utils/draw_gantt_chart.py

-
-            #if it is a start node, add to unifinished nodes
-            if 'start' in node:
-                node['start'] = parser.parse(node['start'])


You can remove the parser import from this file.

effigies · 2017-09-26T01:41:12Z

nipype/pipeline/plugins/multiproc.py

@@ -251,14 +218,15 @@ def _send_procs_to_workers(self, updatehash=False, graph=None):
                        key=lambda item: (self.procs[item]._interface.estimated_memory_gb,
                                          self.procs[item]._interface.num_threads))

-        if str2bool(config.get('execution', 'profile_runtime')):
+        resource_monitor = str2bool(config.get('execution', 'resource_monitor', 'false'))


May want to check for profile_runtime, use value and raise deprecation warning.

Added a deprecation on the config itself

effigies · 2017-09-26T01:41:43Z

nipype/utils/profiler.py

+
+proflogger = logging.getLogger('utils')
+
+resource_monitor = str2bool(config.get('execution', 'resource_monitor'))


Or raise deprecation warning here...

get now raises warning (it is not DeprecationWarning bc they are filtered by default)

oesteban added 5 commits September 21, 2017 16:47

fix tests

32c2f39

Python 2 compatibility

0e2c581

add nipype_mprof

5a8e7fe

implement monitor in a parallel process

7d953cc

oesteban added 5 commits September 22, 2017 13:16

set profiling outputs to runtime object, read it from node execution

306c4ec

revise profiler callback

8a903f0

Merge remote-tracking branch 'upstream/master' into enh/ReviseResourceProfiler

02fdbda

robuster constructor

e3982d7

remove unused import

48f87af

satra reviewed Sep 25, 2017

various fixes

46dde32

oesteban added 5 commits September 25, 2017 15:24

cleaning up code

9d70a2f

remove comment

1fabd25

interface.base cleanup

ecedfcf

update new config settings

2d35959

make naming consistent across tests

3f34711

implement raise_insufficient

99ded42

effigies reviewed Sep 26, 2017

oesteban added 3 commits September 25, 2017 22:42

fix test

b0d25bd

fix test (amend previous commit)

2a37693

address review comments

10d0f39

oesteban added 23 commits September 26, 2017 00:50

fix typo

62a6593

fixes to the tear-up section of interfaces

d6401f3

fix NoSuchProcess exception

ce3f08a

making monitor robuster

ffb7509

Merge remote-tracking branch 'upstream/master' into enh/ReviseResourceProfiler

7b7846b

first functional prototype

c9b474b

Merge remote-tracking branch 'upstream/master' into enh/ReviseResourceProfiler

117924c

add warning to old filemanip logger

cf1f15b

do not search for filemanip_level in config

4b7ab93

fix CommandLine interface doctest

c7a1992

update specs

8d02397

fix tests

c789b17

fix location of use_resources

a9824f1

fix attribute error when input spec is not standard

30d79e9

re-include filemanip logger into config documentation

49d4843

minor additions to resource_monitor option

ff94a4b

fix resource_monitor tests

55acde0

run build 2 (the shortest) with the resource monitor on

7cd02ee

fix unbound variable

a42ef60

collect resource_monitor info after run

10865f1

reduce resource_monitor_frequency on tests (and we test it works)

e0e341b

store a new trace before exit

06c9f20

run resource_monitor only for level2 of fmri_spm_nested, switch python versions

03c8d2e

oesteban closed this Sep 27, 2017

oesteban deleted the enh/ReviseResourceProfiler branch September 27, 2017 20:55

oesteban mentioned this pull request Sep 27, 2017

[ENH] New ResourceMonitor (replaces resource profiler) #2200

Merged

5 tasks

