Update Wrapper to Support Pegasus 5.0.0 #73
Merged
82bffd1
Initial Refactor of codebase to adapt changes for Pegasus 5.0.0
lichtefeld 2ff43d3
Fix remaining unit test errors - Bringing the tests to use YAML parse…
lichtefeld 41eac98
Change to automatically locating the home dir
lichtefeld 0370cc8
Only output one submit script in example workflow. We generate one wi…
lichtefeld d62365d
Update associated documentation. Remove old NAS related restriction.
lichtefeld f82e46b
Fix precommit issues
lichtefeld 908a8a0
Update how we add properties
lichtefeld da5dc1b
Fix precommit & test errors
lichtefeld 3c179a4
Don't provide a default partition in saga cluster site configuration.
lichtefeld 53895dd
Remove now unused resources
lichtefeld ff17129
Update requirements
lichtefeld 9f29a66
Edit readme to reference using a shared nas location as the root dir …
lichtefeld 6b85225
Allow modifying the home_dir location via a parameter, otherwise defa…
lichtefeld 2455bed
Update docstrings
lichtefeld
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,3 @@ | ||
<!-- | ||
[![Build status](https://ci.appveyor.com/api/projects/status/3jhdnwreqoni1492/branch/master?svg=true)](https://ci.appveyor.com/project/isi-vista/vista-pegasus-wrapper/branch/master) | ||
--> | ||
[![Build status](https://travis-ci.com/isi-vista/vista-pegasus-wrapper.svg?branch=master)](https://travis-ci.com/isi-vista/vista-pegasus-wrapper?branch=master) | ||
[![codecov](https://codecov.io/gh/isi-vista/vista-pegasus-wrapper/branch/master/graph/badge.svg)](https://codecov.io/gh/isi-vista/vista-pegasus-wrapper) | ||
|
||
|
@@ -28,25 +25,27 @@ This library simplifies the process of writing a profile which can be converted | |
Using [WorkflowBuilder from `workflow.py`](pegasus_wrapper/workflow.py) develop a function to generate a `Workflow.dax`. | ||
See [example_workflow](pegasus_wrapper/scripts/example_workflow_builder.py) for an extremely simple workflow which we will use to demonstrate the process. | ||
To see the example workflow add a `root.params` file to the parameters directory with the following: | ||
*Note the Directory should be in your $Home and not a NFS like /nas/gaia/ as the submission will fail for an NFS reason* | ||
``` | ||
example_root_dir: "path/to/output/dir/" | ||
conda_environment: "pegasus-wrapper" | ||
conda_base_path: "path/to/conda" | ||
``` | ||
run `python -m pegasus_wrapper.scripts.example_workflow_builder parameters/root.params` from this project's root folder. | ||
|
||
The log output will give you the output location of the `Test.dax`. Assuming you are logged into a submit node with an active Pegasus install: | ||
|
||
``` | ||
cd "path/to/output/dir" | ||
pegasus-plan --conf pegasus.conf --dax Test.dax --dir "path/to/output/dir" --relative-dir exampleRun-001 | ||
pegasus-run "path/to/output/dir/"exampleRun-001 | ||
./submit.sh | ||
``` | ||
The example workflow submits **ONLY** to `scavenge`. In an actual workflow we would recommend parameterizing it. | ||
|
||
Our current system places `ckpt` files to indicate that a job has finished in the event the DAX needs to be generated again to fix a bug after an issue was found. This system is non-comprehensive as it currently requires manual control. When submitting a new job using previous handles use a new relative dir in the plan and run. | ||
|
||
A [Nuke Checkpoints](scripts/nuke_checkpoints.py) script is provided for ease of removing checkpoint files. To use, pass a directory location as the launch parameter and the script will remove checkpoint files from the directory and all sub-directories. | ||
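
The behavior described above (remove checkpoint files from a directory and all sub-directories) can be sketched roughly as follows. This is not the code in `scripts/nuke_checkpoints.py`. The function name `remove_checkpoints` and the assumption that checkpoint file names end in `ckpt` are illustrative only, so check the actual marker name your workflow writes before using anything like this.

```python
from pathlib import Path
from typing import List

# Assumption: checkpoint markers are files whose names end in "ckpt".
# Adjust the suffix to match what your workflow actually writes.
CHECKPOINT_SUFFIX = "ckpt"


def remove_checkpoints(root: Path, suffix: str = CHECKPOINT_SUFFIX) -> List[Path]:
    """Delete every matching checkpoint file under root, recursively.

    Returns the list of paths that were removed (hypothetical helper).
    """
    removed = []
    for path in sorted(root.rglob(f"*{suffix}")):
        # Skip directories that happen to match the pattern.
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Returning the removed paths makes it easy to log what was deleted before regenerating the DAX.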
|
||
It is recommended to host a workflow under a shared directory on the NAS, e.g. `/nas/gaia`, rather than a user's `/nas/user/xyz` home directory, due to space limitations on user home directories. | ||
|
||
# FAQ | ||
## How can I exclude some nodes? | ||
|
||
|
@@ -57,11 +56,6 @@ Use run_on_single_node parameter when you initialize a workflow (or a Slurm reso | |
* Note you cannot use this option with the **exclude_list** option. | ||
* Note you cannot specify more than one node using this option. | ||
|
||
## What are valid root directories for the workflow? | ||
|
||
Currently the root directory should be in your home directory and not on a NAS like `/nas/gaia/`, as the submission will fail for an NFS reason. | ||
The experiment directory can be (and ought to be) on such a drive, though. | ||
|
||
# Common Errors | ||
|
||
## Mismatching partition selection and max walltime | ||
|
@@ -72,7 +66,8 @@ Partitions each have a max walltime associated with them. See the saga cluster w | |
|
||
If you change code while a pipeline is running, the jobs will pick up the changes. This could be helpful if you notice an error and fix it before that code runs, but can also lead to some unexpected behavior. | ||
|
||
## `No module named 'Pegasus'` | ||
## `No module named 'Pegasus'` (Version 4.9.3) | ||
*This is believed to have been fixed for Pegasus Version 5. If this arises, please open an issue.* | ||
|
||
This is a weird one that pops up usually when first getting set up with Pegasus. First, if you see this please contact one of the maintainers (currently @joecummings or @spigo900). To fix this, install the following packages with these commands in this exact order - they are dependent on each other. | ||
Can remove this fix. Comment you added is enough. |
||
1. `pip install git+https://github.com/pegasus-isi/pegasus/#egg=pegasus-wms.common&subdirectory=packages/pegasus-common` | ||
|
@@ -85,6 +80,10 @@ This is a weird one that pops up usually when first getting set up with Pegasus. | |
|
||
A new node obtained with `srun` does not load the Spack modules you usually have set up in your runtime scripts. You need to manually install these if you want to work with TensorFlow or anything requiring CUDA. | ||
|
||
# Updating from wrapper script to use Pegasus5.0.0 from Pegasus4.9.3 | ||
|
||
No changes should be needed for any project using the previous version of the wrapper which supported Pegasus4.9.3. | ||
|
||
# Contributing | ||
|
||
Run `make precommit` before committing. | ||
|
Are there any restrictions on valid root dirs? Should the README include a statement that root dirs can now be on shared /nas/gaia or /nas/user/xyz ?
There are no longer any restrictions on valid root dirs. There used to be an NFS-related bug when trying to mount a shared NAS as the root dir for Pegasus, but it has been resolved.
@danielnapierski I've added a note to the readme that the `root_dir` can be set to any location on the NAS, and that a shared drive, e.g. `/nas/gaia`, may be preferred due to space limitations on the user home dirs. However, if unspecified, the script will default to a user's `/nas/home/xyz/` dir.
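
The fallback described in this comment (use an explicitly supplied location, otherwise default to the user's home directory) might look something like the sketch below. The function `resolve_home_dir` is hypothetical, the `home_dir` key is taken from the commit message above, and a plain mapping stands in for whatever parameter object the wrapper actually uses.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_home_dir(params: Mapping[str, str], key: str = "home_dir") -> Path:
    """Return the configured directory, or the user's home directory if unset.

    Hypothetical helper; `key` is an assumed parameter name.
    """
    value: Optional[str] = params.get(key)
    if value:
        # An explicit setting wins, e.g. a shared NAS location.
        return Path(value).expanduser()
    # Otherwise locate the home directory automatically.
    return Path.home()
```

For example, `resolve_home_dir({"home_dir": "/nas/gaia/shared"})` would return the shared path, while an empty mapping falls back to `Path.home()`.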