Root-task withholding without co-assignment #6631

@fjetter

We made an early attempt at root-task withholding to address the problem of root-task overproduction. Below are a couple of links with additional information (non-exhaustive).

We started experimenting with withholding worker assignment for root tasks, i.e. delaying worker assignment on the scheduler side; see #6560.
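To make the idea of delaying assignment on the scheduler side concrete, here is a toy model. It is only a sketch under stated assumptions: `ToyScheduler`, its attributes, and the round-robin eager policy are hypothetical and are not the actual scheduler code or the prototype in #6614. It contrasts assigning every root task immediately with holding tasks in a scheduler-side queue until a worker has a free thread.

```python
from collections import deque


class ToyScheduler:
    """Toy model only; all names here are hypothetical, not the real scheduler."""

    def __init__(self, workers, threads_per_worker, withhold):
        self.free = {w: threads_per_worker for w in workers}  # open thread slots
        self.assigned = {w: [] for w in workers}  # tasks already sent to a worker
        self.queued = deque()  # root tasks held back on the scheduler
        self.withhold = withhold
        self._cursor = 0  # round-robin position for eager assignment

    def submit_root_tasks(self, tasks):
        workers = list(self.free)
        for task in tasks:
            if self.withhold:
                # Withholding: keep the task on the scheduler for now.
                self.queued.append(task)
            else:
                # Eager: pick a worker right away, even if it is already busy.
                worker = workers[self._cursor % len(workers)]
                self._cursor += 1
                self.assigned[worker].append(task)
        self._fill_free_slots()

    def _fill_free_slots(self):
        # Hand out queued tasks only while a worker has a free thread.
        for worker in self.free:
            while self.free[worker] > 0 and self.queued:
                self.assigned[worker].append(self.queued.popleft())
                self.free[worker] -= 1

    def task_finished(self, worker, task):
        self.assigned[worker].remove(task)
        if self.withhold:
            self.free[worker] += 1
            self._fill_free_slots()


eager = ToyScheduler(["a", "b"], threads_per_worker=2, withhold=False)
eager.submit_root_tasks(range(10))

held = ToyScheduler(["a", "b"], threads_per_worker=2, withhold=True)
held.submit_root_tasks(range(10))

print(len(eager.assigned["a"]), len(held.assigned["a"]), len(held.queued))
```

In the eager case each worker is handed five root tasks up front. With withholding, each worker holds only as many root tasks as it has threads, and the remaining tasks stay queued until `task_finished` frees a slot.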

Early prototypes show very promising results and should reduce our cluster memory footprint. A prototype is available at #6614 and should be ready for curious users to try.

The current co-assignment logic has some significant shortcomings (e.g. #6597), and withholding root tasks appears to be sufficient to control our memory footprint (some experimentation on configuration is still required). We should therefore get the root-task withholding logic into a production-ready, i.e. mergeable, state and get rid of the current co-assignment logic.

This should be verified by thorough performance benchmark results. See coiled/benchmarks#191 for work on automated benchmarks.

Once this is solid, we may consider adding a more robust co-assignment logic in a follow-up step, if necessary.

Acceptance criteria

  • The prototype PR is merged and the new assignment logic is hidden behind a feature toggle
  • The feature toggle is disabled by default
  • There is a CI job with an experimental flag, running on Ubuntu with a single Python version, that has this feature toggle enabled. All failing tests are specifically marked and are allowed to be skipped on this job.
  • A follow-up ticket with an overview of all skipped tests is created
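The criteria above could be wired up roughly as follows. This is a minimal sketch, not the project's actual mechanism: the environment variable `EXPERIMENTAL_ROOT_TASK_WITHHOLDING`, the helper `withholding_enabled`, and the decorator `fails_with_withholding` are all hypothetical names, and the standard library's `unittest.skipIf` stands in for whatever test marker the CI job would really use.

```python
import os
import unittest

# Hypothetical variable name; the issue does not say how the toggle is exposed.
TOGGLE_ENV = "EXPERIMENTAL_ROOT_TASK_WITHHOLDING"


def withholding_enabled(environ=os.environ):
    """Return True only when the toggle is explicitly switched on (off by default)."""
    return environ.get(TOGGLE_ENV, "0").lower() in {"1", "true", "yes"}


def fails_with_withholding(reason):
    """Mark a test known to fail under the new logic.

    The test is skipped only when the toggle is on (the experimental CI job)
    and runs normally everywhere else.
    """
    return unittest.skipIf(
        withholding_enabled(), f"known failure with withholding: {reason}"
    )


class TestCoAssignment(unittest.TestCase):
    @fails_with_withholding("relies on co-assignment of sibling root tasks")
    def test_siblings_share_worker(self):
        self.assertTrue(True)  # placeholder body for illustration
```

Using one dedicated marker with a reason string keeps the skipped tests easy to find, which would help when compiling the overview for the follow-up ticket.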
