
Errors in logfile due to login node config #115

Closed
@sjpb

Description


Errors like the following appear in the slurmctld log:

Sep 17 09:29:28 alaska-control slurmctld[208988]: error: _slurm_rpc_node_registration node=alaska-login-0: Invalid argument

This happens because each partition's section of slurm.conf defines a default node with hardware details, e.g.:

NodeName=DEFAULT State=UNKNOWN \
    RealMemory=106897 \
    Sockets=2 \
    CoresPerSocket=15 \
    ThreadsPerCore=2

but no new DEFAULT is written for login nodes, so they inherit the DEFAULT values from the last compute partition. If the login nodes' hardware doesn't match those values, there is a mismatch when they register.
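A minimal sketch of the problematic ordering, assuming a single hypothetical partition `small` with hypothetical compute nodes `alaska-compute-[0-1]`; the hardware values are copied from the example above and the login node name from the log line:

```
# Partition "small" (hypothetical): this DEFAULT applies to every
# NodeName line that follows it in the file
NodeName=DEFAULT State=UNKNOWN RealMemory=106897 Sockets=2 CoresPerSocket=15 ThreadsPerCore=2
NodeName=alaska-compute-[0-1]
PartitionName=small Nodes=alaska-compute-[0-1]

# Login-only node written last: it picks up the DEFAULT above, so if its
# real hardware differs, slurmctld logs a registration error for it
NodeName=alaska-login-0
```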

This can't be fixed by adding a NodeName=DEFAULT line before the login node definition.

It can be fixed by putting the login-node definitions BEFORE the first NodeName=DEFAULT definition. Suggested:

# LOGIN-ONLY NODES
# Define slurmd nodes not in partitions for configless login-only nodes:
<templating>
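For illustration, the rendered slurm.conf with that reordering might look roughly like this. This is a sketch only: the partition `small` and the nodes `alaska-compute-[0-1]` are hypothetical, and the actual template output is not shown in this issue:

```
# LOGIN-ONLY NODES
# Written before any NodeName=DEFAULT line, so no partition defaults apply
NodeName=alaska-login-0

# Partition "small" (hypothetical): its DEFAULT only affects lines below it
NodeName=DEFAULT State=UNKNOWN RealMemory=106897 Sockets=2 CoresPerSocket=15 ThreadsPerCore=2
NodeName=alaska-compute-[0-1]
PartitionName=small Nodes=alaska-compute-[0-1]
```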
