Propagation of specific Slurm variables #3606

@brandongc

The fix for issue #3422 now strips all SBATCH_* variables from the environment.

In our case, we are developing a containerized environment that propagates the submit-time environment to the run-time environment "seamlessly" via SBATCH_CONTAINER, SRUN_CONTAINER, and SALLOC_CONTAINER (https://dl.acm.org/doi/10.1145/3731599.3767355).

The current workflow is to run reframe -r from within the container environment on a login node; the container should then propagate to every job that reframe submits.

With the variables stripped, we have to pass the container explicitly in the partition configuration, something like the following:

import os

# Container image to use, taken from the submitting environment
IMAGE = os.environ["NERSC_IMAGE"]

zen3_a100_ofi = {
    "name": "zen3-a100-ofi",
    "descr": "Submit jobs through the system Slurm scheduler",
    "scheduler": "slurm",
    "launcher": "srun",
    "access": ["--qos=regular", "--constraint=gpu", f"--container={IMAGE}"],
    "environs": ["builtin", "prgenv-gnu"],
}

The resulting job scripts then carry the --container option explicitly, but the srun commands inside them do not. This can lead to divergent behavior if one keeps the staged files and inspects or reruns them manually.

Ideally, Slurm would have a better design for how these options interact, but for now it would be useful for us to have some control over which variables are removed, for example a simple exclude list or the ability to override the regular expression used for matching.
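
To make the request concrete, here is a minimal sketch of the kind of filter I have in mind. It is not ReFrame's actual implementation: the function name `filter_slurm_env`, the `strip_pattern` and `keep` parameters, and the default pattern are all hypothetical and only illustrate the two options (an exclude list and an overridable regular expression).

```python
import re

# Hypothetical default: remove every SBATCH_* variable (assumed, not taken
# from the ReFrame source).
DEFAULT_STRIP_PATTERN = r"SBATCH_.*"


def filter_slurm_env(env, strip_pattern=DEFAULT_STRIP_PATTERN, keep=()):
    """Return a copy of `env` without the variables whose full name matches
    `strip_pattern`, except for the names listed in `keep`."""
    pattern = re.compile(strip_pattern)
    keep = frozenset(keep)
    return {
        name: value
        for name, value in env.items()
        if name in keep or not pattern.fullmatch(name)
    }


if __name__ == "__main__":
    env = {
        "SBATCH_CONTAINER": "registry/image:tag",
        "SBATCH_ACCOUNT": "myaccount",
        "PATH": "/usr/bin",
    }
    # Option 1: exclude list, SBATCH_CONTAINER survives the filtering.
    print(filter_slurm_env(env, keep=["SBATCH_CONTAINER"]))
    # Option 2: override the regular expression instead.
    print(filter_slurm_env(env, strip_pattern=r"SBATCH_(?!CONTAINER$).*"))
```

Either option would let SBATCH_CONTAINER pass through while the other SBATCH_* variables are still removed; the exclude list is probably the simpler of the two to expose as a configuration option.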
