Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🔧 slurm batch system improvements #62

Merged
merged 5 commits into from
Jul 9, 2024
Merged

Conversation

juanesarango
Copy link
Contributor

Improvements:

  • Use bash instead of sh
  • SLURM doesn't inherit the ENV variables by default. For most jobs we need to pass along the conventional $TMP, $TMPDIR and/or $TMP_DIR vars.
  • LSF had a nice feature of printing the job metrics at the end of each job. Slurm doesn't. But it has an utility seff <jobid> to do so. Here I add an extra job that is run after the main job finishes and stores the metrics info in a new head_job.slurm file.

Type of PR:

  • 🐛   Bug fix
  • 🚀   New feature
  • 🔧   Code refactor
  • ⚠️   Breaking change that would cause existing functionality to change
  • 📘   I have updated the documentation accordingly
  • ✅   I have added tests to cover my changes

@ddomenico
Copy link
Contributor

Big fan of storing the metrics like this 👍 do you have an example of a job run with this?

src = join(rundir, f"head_job.{j}")
dst = join(root, f"{j}.{index}")
open(src, "w").close()
utils.force_symlink(src, dst)

with open(join(root, "in.sh"), "w") as f:
f.write(f"#!/bin/sh\nbash {root}/in.$SLURM_ARRAY_TASK_ID")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the second bash here redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is for the shebang. The other one is for execute the file.

#!/bin/bash

bash <command>

I guess it could work without it. as long as the file {root}/in.$SLURM_ARRAY_TASK_ID has the proper shebang too.

Copy link

codecov bot commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.13%. Comparing base (e030016) to head (f743e01).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #62      +/-   ##
==========================================
+ Coverage   95.12%   95.13%   +0.01%     
==========================================
  Files          19       19              
  Lines        2359     2365       +6     
==========================================
+ Hits         2244     2250       +6     
  Misses        115      115              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@juanesarango
Copy link
Contributor Author

Big fan of storing the metrics like this 👍 do you have an example of a job run with this?

Yeah, check this one:

$ cat /data1/papaemme/isabl/data/analyses/22/26/482226/head_job.slurm
Job ID: 1094711
Array Job ID: 1094711_1
Cluster: iris
User/Group: svc_papaemme_bot/svc_papaemme_bot
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:53
CPU Efficiency: 65.43% of 00:01:21 core-walltime
Job Wall-clock time: 00:01:21
Memory Utilized: 1.59 GB
Memory Efficiency: 1.59% of 100.00 GB

@juanesarango juanesarango merged commit 5507962 into master Jul 9, 2024
8 checks passed
@juanesarango juanesarango deleted the slurm-improvements branch July 9, 2024 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants