-
Notifications
You must be signed in to change notification settings - Fork 532
[WIP,ENH] Revision to the resource profiler #2193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR revises the resource profiler with the following objectives: - Increase robustness (and making sure it does not crash nipype) - Extend profiling to all interfaces (including pure python) The increase of robustness will be expected from: 1. Trying to reduce (or remove at all if possible) the logger callback to register the estimations of memory and cpus. This could be achieved by making interfaces responsible or keeping track of their resources to then collect all results after execution of the node. 2. Centralize profiler imports, like the config or logger object so that the applicability of the profiler is checked only once. This first commit just creates one new module nipype.utils.profiler, and moves the related functions in there.
Before I get further with this PR, I'd love your feedback on these issues:
|
This all seems reasonable on its face. I haven't really looked into the profiler before, so I don't have strong opinions of cases to consider. I'll try to have a more detailed look and might have more to say... |
setup.py
Outdated
@@ -148,6 +148,7 @@ def main(): | |||
entry_points=''' | |||
[console_scripts] | |||
nipypecli=nipype.scripts.cli:cli | |||
nipype_mprof=nipype.utils.profiler:main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put this under nipypecli
? it would be nice to have a single cli.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can even remove this executable.
@oesteban - in principle this looks reasonable, here are my questions:
|
Thanks for the comments :) I am currently finishing this and will start testing all those things soon. Regarding the duration: I am still working on making the run function robuster, but in principle the storing the duration will be the same way it was even before the resource profiler was included. I'm more concerned about the cgroups, as you say this Popen will fork a new subprocess, but I guess this is new to the eyes of the scheduler (and thus will take one thread out). I'll find out how cgroups work here. |
@oesteban - it may still be listed under the parent process, but the question is how much of the resources does that thread take up. possibly not much, but just worth checking. |
@oesteban - you can use pstree to check. |
Codecov Report
@@ Coverage Diff @@
## master #2193 +/- ##
=========================================
+ Coverage 72.27% 72.3% +0.02%
=========================================
Files 1174 1175 +1
Lines 58738 58685 -53
Branches 8454 8443 -11
=========================================
- Hits 42453 42432 -21
+ Misses 14924 14885 -39
- Partials 1361 1368 +7
Continue to review full report at Codecov.
|
This is nearing a review-able status. Main changes:
I would really appreciate comments from @satra about the new approach using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks reasonable. Some comments.
doc/users/config_file.rst
Outdated
*utils_level* | ||
How detailed the logs regarding nipype utils like file operations | ||
(for example overwriting warning) or the resource profiler should be | ||
(possible values: ``INFO`` and ``DEBUG``; default value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commas around "like file operations (...) or the resource profiler".
doc/users/config_file.rst
Outdated
@@ -146,6 +147,13 @@ Execution | |||
crashfiles allow portability across machines and shorter load time. | |||
(possible values: ``pklz`` and ``txt``; default value: ``pklz``) | |||
|
|||
*resource_monitor* | |||
Enables monitoring the resources occupation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indicate this is a boolean, note default value.
doc/users/plugins.rst
Outdated
@@ -74,6 +74,13 @@ Optional arguments:: | |||
n_procs : Number of processes to launch in parallel, if not set number of | |||
processors/threads will be automatically detected | |||
|
|||
memory_gb : Total memory available to be shared by all simultaneous tasks | |||
currently running, if not set it will be automatically estimated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "automatically set to 90% of system RAM"?
self._fmlogger.setLevel(logging.getLevelName(config.get('logging', | ||
'filemanip_level'))) | ||
self._utlogger.setLevel(logging.getLevelName(config.get('logging', | ||
'utils_level'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is, people who've set filemanip_level
are just going to get less logging and no explanation.
What about checking for filemanip_level
and, if set, set utils_level
and provide a deprecation notice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will work on this
nipype/interfaces/base.py
Outdated
__docformat__ = 'restructuredtext' | ||
|
||
if sys.version_info < (3, 3): | ||
setattr(sp, 'DEVNULL', os.devnull) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this? I don't see that it's used anywhere. If you do want to do this, you can't use paths. You need to pass a file descriptor, such as open(os.devnull, 'rb+')
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover from the previous implementation with a process.
|
||
free_memory_gb = self.memory_gb - busy_memory_gb | ||
free_processors = self.processors - busy_processors | ||
|
||
# Check all jobs without dependency not run | ||
jobids = np.flatnonzero((self.proc_done == False) & \ | ||
jobids = np.flatnonzero((self.proc_done == False) & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~self.proc_done
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.proc_done
is not a numpy array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong. Changing to ~self.proc_done
nipype/pipeline/plugins/multiproc.py
Outdated
(self.procs[jobid].overwrite == None and | ||
not self.procs[jobid]._interface.always_run))): | ||
if hash_exists and not self.procs[jobid].overwrite and \ | ||
not self.procs[jobid]._interface.always_run: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't equivalent to what it replaced. Is that intentional?
|
||
#if it is a start node, add to unifinished nodes | ||
if 'start' in node: | ||
node['start'] = parser.parse(node['start']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the parser
import from this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -251,14 +218,15 @@ def _send_procs_to_workers(self, updatehash=False, graph=None): | |||
key=lambda item: (self.procs[item]._interface.estimated_memory_gb, | |||
self.procs[item]._interface.num_threads)) | |||
|
|||
if str2bool(config.get('execution', 'profile_runtime')): | |||
resource_monitor = str2bool(config.get('execution', 'resource_monitor', 'false')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to check for profile_runtime
, use value and raise deprecation warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a deprecation on the config itself
nipype/utils/profiler.py
Outdated
|
||
proflogger = logging.getLogger('utils') | ||
|
||
resource_monitor = str2bool(config.get('execution', 'resource_monitor')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or raise deprecation warning here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get now raises warning (it is not DeprecationWarning
bc they are filtered by default)
This PR revises the resource profiler with the following objectives:
The increase of robustness will be expected from:
Trying to reduce (or remove at all if possible) the logger callback
to register the estimations of memory and cpus. This could be achieved
by making interfaces responsible or keeping track of their resources
to then collect all results after execution of the node.
Centralize profiler imports, like the config or logger object
so that the applicability of the profiler is checked only once.
This first commit just creates one new module nipype.utils.profiler, and
moves the related functions in there.
Important: this PRs removes the old
filemanip
logger and replaces it by a more genericutils
. Documentation has been updated accordingly.