Description
In #12319 we fixed a very old bug where, if an allocation failed and the scheduler could not find a placement, the reschedule tracker was dropped. While working with @pkazmierczak on #24869, we discovered this bug was not 100% fixed: in the case where the node is down and the allocation is marked lost, we're somehow not propagating the reschedule tracker.
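Throughout the reproduction below, the check for the bug is whether the replacement allocation's RescheduleTracker survives. A minimal way to inspect it for a given allocation ID (a rough sketch assuming jq is installed; ALLOC_ID is whichever allocation you want to check):
$ nomad operator api "/v1/allocation/${ALLOC_ID}" | \
    jq -r 'if .RescheduleTracker == null then "tracker missing" else "tracker present" end'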
Reproduction
To demonstrate both the working and the broken behavior, I'm deploying to a 1 server + 1 client cluster (current tip of main, aka 1.9.6-dev) with the following jobspec. The jobspec disables restarts and includes a constraint block that lets us control whether or not placement succeeds.
jobspec
job "example" {
group "group" {
reschedule {
attempts = 30
interval = "24h"
unlimited = false
}
restart {
attempts = 0
mode = "fail"
}
constraint {
attribute = "${meta.example}"
operator = "="
value = "1"
}
task "task" {
driver = "docker"
config {
image = "busybox:1"
command = "httpd"
args = ["-vv", "-f", "-p", "8001", "-h", "/local"]
}
resources {
cpu = 100
memory = 100
}
}
}
}
Apply the following node metadata to the node:
$ nomad node status
ID        Node Pool  DC        Name     Class      Drain  Eligibility  Status
e6e43a5a  default    philly-1  client0  multipass  false  eligible     ready
$ nomad node meta apply --node-id e6e43a5a example=1
Run the job.
Normal Rescheduling
Kill the task (via docker kill) to force a reschedule.
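The docker driver names containers <task>-<alloc-id> by default, so with only this job running on the client, something along these lines finds and kills the container:
$ docker ps --filter name=task- --format '{{.Names}}'
$ docker kill "$(docker ps --filter name=task- --quiet)"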
$ nomad alloc status 4d64f58c
...
Recent Events:
Time                       Type            Description
2025-01-22T15:13:20-05:00  Not Restarting  Policy allows no restarts
Wait for the allocation to be rescheduled and see that the replacement has a reschedule tracker.
$ nomad job status example
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
1914d5a9  e6e43a5a  group       0        run      running  3s ago     2s ago
4d64f58c  e6e43a5a  group       0        stop     failed   1m14s ago  3s ago
$ nomad operator api "/v1/allocation/1914d5a9-3610-75a9-025d-729a9dbed06c" | jq .RescheduleTracker
{
  "Events": [
    {
      "Delay": 30000000000,
      "PrevAllocID": "4d64f58c-96cc-8465-82ba-e48241dbdba6",
      "PrevNodeID": "e6e43a5a-9ddb-d65a-521a-cde19f093656",
      "RescheduleTime": 1737576830218453000
    }
  ],
  "LastReschedule": "ok"
}
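The single event's PrevAllocID points back at the allocation we killed; a quick way to pull just that lineage (again assuming jq):
$ nomad operator api "/v1/allocation/1914d5a9-3610-75a9-025d-729a9dbed06c" | \
    jq -r '.RescheduleTracker.Events[].PrevAllocID'
4d64f58c-96cc-8465-82ba-e48241dbdba6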
Failed Rescheduling with Correct Behavior
Now we'll change the node metadata so the node no longer satisfies the constraint:
$ nomad node meta apply --node-id e6e43a5a example=2
Kill the task again to force a reschedule, and wait for the blocked eval:
$ nomad eval list
ID        Priority  Triggered By   Job ID   Namespace  Node ID   Status    Placement Failures
5db8c171  50        queued-allocs  example  default    <none>    blocked   N/A - In Progress
1b751548  50        alloc-failure  example  default    <none>    complete  true
...
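Rather than eyeballing the list, one way to wait for the blocked eval programmatically (a sketch that assumes jq and that the -json output uses the Evaluation API's JobID and Status field names):
$ until nomad eval list -json | jq -e 'any(.[]; .JobID == "example" and .Status == "blocked")' >/dev/null; do sleep 2; done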
Update the node metadata to unblock the eval:
$ nomad node meta apply --node-id e6e43a5a example=1
And wait for the node-update eval:
$ nomad eval list
ID        Priority  Triggered By   Job ID   Namespace  Node ID   Status    Placement Failures
6eac73f2  50        node-update    example  default    e6e43a5a  complete  false
5db8c171  50        queued-allocs  example  default    <none>    complete  false
1b751548  50        alloc-failure  example  default    <none>    complete  true
...
The replacement allocation has a reschedule tracker as we expect, which is what we fixed in #12319.
$ nomad job status example
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
1a99a69c  e6e43a5a  group       0        run      running  23s ago    13s ago
1914d5a9  e6e43a5a  group       0        stop     failed   3m54s ago  23s ago
4d64f58c  e6e43a5a  group       0        stop     failed   5m5s ago   3m54s ago
$ nomad operator api "/v1/allocation/1a99a69c-55bf-ddee-0c6d-6e54222b90bf" | jq .RescheduleTracker
{
  "Events": [
    {
      "Delay": 30000000000,
      "PrevAllocID": "4d64f58c-96cc-8465-82ba-e48241dbdba6",
      "PrevNodeID": "e6e43a5a-9ddb-d65a-521a-cde19f093656",
      "RescheduleTime": 1737576830218453000
    },
    {
      "Delay": 60000000000,
      "PrevAllocID": "1914d5a9-3610-75a9-025d-729a9dbed06c",
      "PrevNodeID": "e6e43a5a-9ddb-d65a-521a-cde19f093656",
      "RescheduleTime": 1737577040806473200
    }
  ],
  "LastReschedule": "ok"
}
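The Delay values are in nanoseconds, so the two events show 30s and then 60s delays, consistent with the default exponential reschedule delay. A quick extraction (assuming jq):
$ nomad operator api "/v1/allocation/1a99a69c-55bf-ddee-0c6d-6e54222b90bf" | \
    jq '[.RescheduleTracker.Events[].Delay / 1e9]'
[
  30,
  60
]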
Reschedule on Downed Node
Now halt the node (sudo systemctl stop nomad), and wait for it to be marked down.
$ nomad node status
ID        Node Pool  DC        Name     Class      Drain  Eligibility  Status
e6e43a5a  default    philly-1  client0  multipass  false  eligible     down
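Down detection waits on the server's heartbeat grace period, so this can take a little while; one way to poll for it rather than re-running the status command by hand (a sketch assuming jq and the -json flag on nomad node status):
$ until nomad node status -json | jq -e 'any(.[]; .Status == "down")' >/dev/null; do sleep 5; done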
Wait for the blocked evaluation:
$ nomad job status example
...
Placement Failure
Task Group "group":
* No nodes were eligible for evaluation
Allocations
ID        Node ID   Task Group  Version  Desired  Status  Created    Modified
1a99a69c  e6e43a5a  group       0        stop     lost    2m43s ago  23s ago
1914d5a9  e6e43a5a  group       0        stop     failed  6m14s ago  2m43s ago
4d64f58c  e6e43a5a  group       0        stop     failed  7m25s ago  6m14s ago
$ nomad eval list
ID        Priority  Triggered By   Job ID   Namespace  Node ID   Status    Placement Failures
17784deb  50        queued-allocs  example  default    <none>    blocked   N/A - In Progress
f34b6262  50        node-update    example  default    e6e43a5a  complete  true
...
Then restart the node and wait for the allocation to be unblocked:
$ nomad eval list
ID        Priority  Triggered By   Job ID   Namespace  Node ID   Status    Placement Failures
40652e21  50        node-update    example  default    e6e43a5a  complete  false
4e69a3fe  50        queued-allocs  example  default    <none>    complete  false
9b5ed7fd  50        node-update    example  default    e6e43a5a  complete  true
...
The allocation has been replaced but the replacement allocation doesn't have a reschedule tracker!
$ nomad job status example
...
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
3896afa8  e6e43a5a  group       0        run      running   19s ago    9s ago
1a99a69c  e6e43a5a  group       0        stop     complete  4m17s ago  14s ago
1914d5a9  e6e43a5a  group       0        stop     failed    7m48s ago  4m17s ago
4d64f58c  e6e43a5a  group       0        stop     failed    8m59s ago  7m48s ago
$ nomad operator api "/v1/allocation/3896afa8-c58b-f436-b4e9-3c5bb733f0b0" | jq .RescheduleTracker
null
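To make the comparison explicit, you can dump the tracker for the lost allocation and its replacement side by side (a rough sketch assuming jq; the IDs are the ones reported by nomad job status above):
$ for alloc in 3896afa8-c58b-f436-b4e9-3c5bb733f0b0 1a99a69c-55bf-ddee-0c6d-6e54222b90bf; do
    nomad operator api "/v1/allocation/${alloc}" | jq -c --arg id "$alloc" '{alloc: $id, tracker: .RescheduleTracker}'
  done
The replacement (3896afa8) comes back with a null tracker, as shown above.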