Conversation

@valerian-roche
Collaborator

We are regularly encountering issue #23405: the Nomad client drops tasks when they attempt to allocate the last reservable core(s). The issue is encountered when:

  • we have hosts whose allocatable compute is larger than the number of cores we grant as reservable
  • we run a mix of shared-CPU and dedicated-core tasks on the same host

Nomad starts by granting only the "reservable cores" to the share.slice cpuset. This is unexpected for our use case. Every time a new task with dedicated cores starts, Nomad moves those cores from the share.slice cpuset to the reserve.slice one. This in turn has a few consequences:

  • shared tasks cannot be scheduled on the same cores as the reserved ones; on the other hand, Nomad does not make those cores truly exclusive, so plenty of other processes can still run on them
  • shared tasks cannot use any cores beyond the ones initially marked "reservable". We do not grant all cores as "reservable" for other reasons, and we did not expect Nomad to behave this way
  • when Nomad tries to start a task that would claim the last reservable cores, the cpuset hook fails (as it would remove the last cores from the cpuset of a cgroup with active processes) and the task simply fails (not good); the sketch below illustrates the move and this failing case
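
For illustration, here is a minimal Go sketch of the core move described above. This is not Nomad's implementation; the nomad.slice paths and the moveCoresToReserve helper are assumptions for the example.

```go
// Sketch only: moving cores from the shared cpuset to the reserved one,
// assuming cgroup v2 mounted at /sys/fs/cgroup with the share.slice /
// reserve.slice layout discussed in this PR.
package main

import (
	"fmt"
	"os"
)

const (
	shareCpus   = "/sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus"
	reserveCpus = "/sys/fs/cgroup/nomad.slice/reserve.slice/cpuset.cpus"
)

// moveCoresToReserve (hypothetical helper) shrinks the shared cpuset to
// `remaining` and grows the reserved one to `reserved`. When a dedicated
// task claims the last reservable core, `remaining` is empty, and the
// kernel rejects emptying the cpuset of a cgroup that still has live
// processes: the failure described in the last bullet above.
func moveCoresToReserve(remaining, reserved string) error {
	if err := os.WriteFile(shareCpus, []byte(remaining), 0o644); err != nil {
		return fmt.Errorf("shrinking share.slice: %w", err)
	}
	if err := os.WriteFile(reserveCpus, []byte(reserved), 0o644); err != nil {
		return fmt.Errorf("growing reserve.slice: %w", err)
	}
	return nil
}

func main() {
	// Example: cores 0-3 are reservable and a dedicated task takes 2-3.
	// Passing remaining="" reproduces the last-core failure mode.
	if err := moveCoresToReserve("0-1", "2-3"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```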

We do not need this cpuset logic. We manage the cgroups of our tasks in a parallel cgroup hierarchy, and if we want to enforce "exclusive" use we can do so through that hierarchy. This commit therefore fully deactivates the cpuset hook.
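
To make "fully deactivates" concrete: conceptually, the hook's lifecycle methods become no-ops instead of editing cpusets. The sketch below is a hypothetical shape only; Nomad's real client hook interface differs in names and signatures.

```go
// Hypothetical no-op cpuset hook. Prestart would normally carve cores
// out of share.slice and Postrun would return them; after this change
// neither touches cpusets, leaving any exclusivity enforcement to our
// parallel cgroup hierarchy.
type noopCpusetHook struct{}

func (h *noopCpusetHook) Name() string    { return "cpuset" }
func (h *noopCpusetHook) Prestart() error { return nil }
func (h *noopCpusetHook) Postrun() error  { return nil }
```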

One consequence of note: with this change, shared-CPU tasks can end up running on cores dedicated to other tasks unless exclusivity is enforced explicitly.

Signed-off-by: Valerian Roche <valerian.roche@vercel.com>
valerian-roche merged commit c02b96d into vercel/main Feb 2, 2026
7 of 8 checks passed
