Closed
Description
Errors like the following appear in the slurmctld log:
Sep 17 09:29:28 alaska-control slurmctld[208988]: error: _slurm_rpc_node_registration node=alaska-login-0: Invalid argument
This is because the partition definitions set a default node with hardware details, e.g.:
NodeName=DEFAULT State=UNKNOWN \
RealMemory=106897 \
Sockets=2 \
CoresPerSocket=15 \
ThreadsPerCore=2
but we don't write a new DEFAULT for login nodes, so they pick up the defaults of the last compute partition. If a login node's hardware doesn't match those values, there is a mismatch on registration.
This can't be fixed by adding a NodeName=DEFAULT before the login node definition.
It can be fixed by putting the login-node definitions BEFORE the first DEFAULT definition. Suggest:
# LOGIN-ONLY NODES
# Define slurmd nodes not in partitions for configless login-only nodes:
<templating>
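For illustration, a rough sketch of how the resulting slurm.conf ordering might look. Only alaska-login-0 and the DEFAULT values come from the report above; the bare login-node line, the compute node names, and the partition line are assumptions, not what the templating actually emits:

```
# LOGIN-ONLY NODES
# Define slurmd nodes not in partitions for configless login-only nodes:
# (assumed form: no hardware details, and no DEFAULT defined yet at this point)
NodeName=alaska-login-0

# COMPUTE NODES
# First DEFAULT in the file; it only affects the node lines that follow it.
NodeName=DEFAULT State=UNKNOWN \
    RealMemory=106897 \
    Sockets=2 \
    CoresPerSocket=15 \
    ThreadsPerCore=2
# Hypothetical compute nodes and partition, for illustration only:
NodeName=alaska-compute-[0-1]
PartitionName=compute Nodes=alaska-compute-[0-1] Default=YES State=UP
```

With this ordering the login node is defined before any DEFAULT exists, so it should not pick up the compute partition's memory and CPU layout.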