-
Notifications
You must be signed in to change notification settings - Fork 31
SPANK plugin #125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
SPANK plugin #125
Conversation
Co-authored-by: Norbert Eicker <n.eicker@fz-juelich.de>
- This saves them from premature killing by psmgmt
Co-authored-by: Norbert Eicker <n.eicker@fz-juelich.de>
Co-authored-by: Norbert Eicker <n.eicker@fz-juelich.de>
- Requires Slurm 23.11 or later or a backport of this function
run exactly once on every proc
610f2bc to
a72d3e0
Compare
|
Modified CI testing to use different container names for the srun/rshlaunch vs. SPANK jobs. While this didn't make any difference for GitHub Actions, using the same names prevented running these jobs simultaneously when running the CI locally through |
a72d3e0 to
cada056
Compare
rountree
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go.
| if (result == -1) { | ||
| sdprintf(1, "ERROR: spindleExitBE returned and error on location %s\n", args.location); | ||
| return -1; | ||
| if (!args.location) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commpath branch gets rid of location. Changing args.location to args.commpath should work, but I'd want to test that out. If your PR goes in first, I can update this in the commpath branch.
| j += env_value_len; | ||
| #if defined(CUSTOM_GETENV) && defined(CUSTOM_GETENV_FREE) | ||
| free(env_value); | ||
| if(env_value != env_value_str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand how this fixes the underlying problem, but it did take a while to figure out the code paths (which is an issue of the original function, not this fix). Would like the function to be more obviously correct, but this PR probably isn't the place for that.
|
I've found a bug in this when running inside an allocation on fewer nodes than were allocated. Hold off on merging until I can fix it. |
This pull request updates the Slurm SPANK plugin. It includes the commits from #52, for which the original description was:
The following additional changes are made:
parse_location_implwould segfault if a customgetenvwas used because memory would befreed which was not allocated throughmalloc.spindle_args_twas not initialized in the SPANK plugin.isBEProcfunction is adapted to also support selecting a single process per node to send the shutdown message.srun --spindleis used but the Spindle level isoff.RM=slurm-pluginKnown issue: Support for sessions in the SPANK plugin is not implemented and will be addressed in a future PR.