ENH: Add support for reduced SGEgraph on re-run #1002


Merged
merged 1 commit into nipy:master from hjmjohnson:ImproveSGEGraphSubmissionClusterLoad on Apr 9, 2015

Conversation

hjmjohnson
Contributor

During the re-running of a workflow with SGEgraph, we can greatly
reduce the dependencies by determining which nodes have completed
prior to generating the submit-graph.

Job numbers need to be based on job IDs, not on job names. Job names are not unique.

BUG: Job names must be valid bash variables.

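To make the mechanism concrete, here is a minimal sketch of the two ideas, assuming nipype's result_<name>.pklz cache-file convention; bash_safe_name, node_is_cached, and the depends_on mapping are illustrative names, not the code in this PR:

import os
import re

def bash_safe_name(name):
    # Job names become bash variable names in the generated submit
    # script, so replace anything that is not a letter, digit, or
    # underscore, and never let the name start with a digit.
    safe = re.sub(r'[^A-Za-z0-9_]', '_', name)
    if not safe or safe[0].isdigit():
        safe = '_' + safe
    return safe

def node_is_cached(node):
    # A node counts as complete when the result file from a previous
    # run is present in its output directory (assumed convention).
    return os.path.exists(
        os.path.join(node.output_dir(), 'result_%s.pklz' % node.name))

def prune_dependencies(nodes, depends_on):
    # depends_on maps an integer job ID to the IDs it waits for.
    # Cached jobs are not submitted at all, and dependencies on them
    # are dropped, so the scheduler never sees the finished subgraph.
    done = set(idx for idx, node in enumerate(nodes) if node_is_cached(node))
    return {idx: [d for d in deps if d not in done]
            for idx, deps in depends_on.items() if idx not in done}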
@satra
Member

satra commented Nov 22, 2014

@hjmjohnson - how do you know that a node upstream is not going to change its output? Even when a node looks done, it may get new inputs. Specifically, since at this point we are not hashing environments, libraries may change and binaries may change, so upstream outputs from a node may change.

Or are you simply saying that in the expanded graph you will eliminate any nodes that won't change, starting from the root of the graph? I think this is the sort of thing that should be done (as a configurable option) in the base class before jobs are executed.
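For illustration, a root-down pruning pass over the expanded graph might look like the sketch below, assuming a networkx DiGraph like the one nipype builds; is_unchanged stands in for a real hash comparison and is purely hypothetical:

import networkx as nx

def prune_from_root(graph, is_unchanged):
    # Walk the expanded graph in topological order. A node may be
    # skipped only if its own cached hash matches AND every one of
    # its predecessors was also skipped, so any changed upstream node
    # forces the entire downstream subgraph to rerun.
    skippable = set()
    for node in nx.topological_sort(graph):
        if is_unchanged(node) and all(
                p in skippable for p in graph.predecessors(node)):
            skippable.add(node)
    return skippable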

@hjmjohnson
Contributor Author

@satra I don't have a deep enough understanding of how nipype works. These changes appear to let my large workflow of 7000 jobs run efficiently on our cluster. Without them, I was overloading the cluster environment with a tremendous number of jobs that only ran to determine whether they needed to run at all, which incurred huge performance and scheduling penalties.

Regarding the configuration option: I'd be supportive of that, but I don't know how to implement it.
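For what it's worth, one possible user-facing shape for such an option, sketched with a hypothetical plugin_args key (dont_resubmit_completed_jobs is an assumed name, not something this PR defines):

import nipype.pipeline.engine as pe

wf = pe.Workflow(name='big_study')
# ... add nodes and connections here ...

# 'dont_resubmit_completed_jobs' is a hypothetical plugin argument
# used only to illustrate where such a switch could live.
wf.run(plugin='SGEGraph',
       plugin_args={'dont_resubmit_completed_jobs': True})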

batch_dir, _ = os.path.split(pyfiles[0])
submitjobsfile = os.path.join(batch_dir, 'submit_jobs.sh')

cache_doneness_per_node = dict()
if True:  # A future parameter for controlling this behavior could be added here
A project member commented on this diff:
This is where we can add the config option.
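A sketch of how that gate might look, assuming the plugin keeps its arguments in a self._plugin_args dict (an assumed attribute name) and the same hypothetical option name as above:

# Replace the hard-coded 'if True:' with a plugin-argument check.
# Both the attribute and the key name are assumptions for illustration.
check_doneness = self._plugin_args.get('dont_resubmit_completed_jobs',
                                       False)
cache_doneness_per_node = dict()
if check_doneness:
    for idx, node in enumerate(nodes):
        cache_doneness_per_node[idx] = node_is_cached(node)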

@chrisgorgo chrisgorgo added this to the April 2015 Sprint milestone Apr 9, 2015
@chrisgorgo chrisgorgo merged commit 34c6674 into nipy:master Apr 9, 2015
@hjmjohnson hjmjohnson deleted the ImproveSGEGraphSubmissionClusterLoad branch August 12, 2015 18:24