ENH: Add support for reduced SGEgraph on re-run #1002


Merged
merged 1 commit into nipy:master from hjmjohnson:ImproveSGEGraphSubmissionClusterLoad on Apr 9, 2015

Conversation

hjmjohnson
Contributor

During the re-running of a workflow with SGEgraph, we can greatly
reduce the dependencies by determining which nodes have completed
prior to generating the submit-graph.

Job numbers need to be based on job IDs, not on job names. Job names are not unique.

BUG: Job names must be valid bash variables.

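To make the mechanism concrete, here is a minimal sketch of the two ideas, assuming nipype's result_<name>.pklz cache-file convention; bash_safe_name, node_is_cached, and the depends_on mapping are illustrative names, not the code in this PR:

import os
import re

def bash_safe_name(name):
    # Job names become bash variable names in the generated submit
    # script, so replace anything that is not a letter, digit, or
    # underscore, and never let the name start with a digit.
    safe = re.sub(r'[^A-Za-z0-9_]', '_', name)
    if not safe or safe[0].isdigit():
        safe = '_' + safe
    return safe

def node_is_cached(node):
    # A node counts as complete when the result file from a previous
    # run is present in its output directory (assumed convention).
    return os.path.exists(
        os.path.join(node.output_dir(), 'result_%s.pklz' % node.name))

def prune_dependencies(nodes, depends_on):
    # depends_on maps an integer job ID to the IDs it waits for.
    # Cached jobs are not submitted at all, and dependencies on them
    # are dropped, so the scheduler never sees the finished subgraph.
    done = set(idx for idx, node in enumerate(nodes) if node_is_cached(node))
    return {idx: [d for d in deps if d not in done]
            for idx, deps in depends_on.items() if idx not in done}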
@satra
Member

satra commented Nov 22, 2014

@hjmjohnson - how do you know that a node upstream is not going to change its output? Even when a node looks done, it may get new inputs. Specifically, since at this point we are not hashing environments, libraries may change and binaries may change, so upstream outputs from a node may change.

Or are you simply saying that in the expanded graph you will eliminate any nodes that won't change, starting from the root of the graph? I think this is the sort of thing that should be done (as a configurable option) in the base class before jobs are executed.
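For illustration, a root-down pruning pass over the expanded graph might look like the sketch below, assuming a networkx DiGraph like the one nipype builds; is_unchanged stands in for a real hash comparison and is purely hypothetical:

import networkx as nx

def prune_from_root(graph, is_unchanged):
    # Walk the expanded graph in topological order. A node may be
    # skipped only if its own cached hash matches AND every one of
    # its predecessors was also skipped, so any changed upstream node
    # forces the entire downstream subgraph to rerun.
    skippable = set()
    for node in nx.topological_sort(graph):
        if is_unchanged(node) and all(
                p in skippable for p in graph.predecessors(node)):
            skippable.add(node)
    return skippable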

@hjmjohnson
Contributor Author

@satra I don't have a deep enough understanding of how nipype works. These changes appear to let my large workflow of 7000 jobs run efficiently on our cluster. Without them, I was overloading the cluster environment with a tremendous number of jobs that only ran to determine whether they needed to run at all, which incurred huge performance and scheduling penalties.

Regarding the configuration option: I'd be supportive of that, but I don't know how to implement it.
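For what it's worth, one possible user-facing shape for such an option, sketched with a hypothetical plugin_args key (dont_resubmit_completed_jobs is an assumed name, not something this PR defines):

import nipype.pipeline.engine as pe

wf = pe.Workflow(name='big_study')
# ... add nodes and connections here ...

# 'dont_resubmit_completed_jobs' is a hypothetical plugin argument
# used only to illustrate where such a switch could live.
wf.run(plugin='SGEGraph',
       plugin_args={'dont_resubmit_completed_jobs': True})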

batch_dir, _ = os.path.split(pyfiles[0])
submitjobsfile = os.path.join(batch_dir, 'submit_jobs.sh')

cache_doneness_per_node = dict()
if True:  # A future parameter for controlling this behavior could be added here
A project member commented on this diff:
This is where we can add the config option.
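A sketch of how that gate might look, assuming the plugin keeps its arguments in a self._plugin_args dict (an assumed attribute name) and the same hypothetical option name as above:

# Replace the hard-coded 'if True:' with a plugin-argument check.
# Both the attribute and the key name are assumptions for illustration.
check_doneness = self._plugin_args.get('dont_resubmit_completed_jobs',
                                       False)
cache_doneness_per_node = dict()
if check_doneness:
    for idx, node in enumerate(nodes):
        cache_doneness_per_node[idx] = node_is_cached(node)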

@chrisgorgo chrisgorgo added this to the April 2015 Sprint milestone Apr 9, 2015
@chrisgorgo chrisgorgo merged commit 34c6674 into nipy:master Apr 9, 2015
@hjmjohnson hjmjohnson deleted the ImproveSGEGraphSubmissionClusterLoad branch August 12, 2015 18:24