[Idea] Could workers sometimes know when to release keys on their own?

In https://github.com/dask/distributed/issues/5083#issuecomment-885972668 I wrote up a theory for how high scheduler load can lead to workers running out of memory: the scheduler is slow to send them `free-keys` messages, so otherwise-releasable data piles up. Is there a way to take the scheduler out of the critical path for workers releasing memory? (This idea probably overlaps a lot with, or is a subset of, #4982 and #3974. Also bear in mind that this theory is completely unproven and just something I made up.)

Could we somehow mark tasks as "safe to release", so workers know that when they've completed all the dependents of a task locally, they can release that task, since no other worker (or client) will need the data?
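To make the worker side of this concrete, here is a rough toy sketch. Nothing in it is real `distributed` code: `ToyWorker`, `mark_releasable`, and `task_finished` are hypothetical names, and it assumes the scheduler tells the worker up front which local dependents a releasable key is waiting on.

```python
class ToyWorker:
    """Hypothetical stand-in for a worker; not the distributed.Worker API."""

    def __init__(self):
        self.data = {}      # key -> result held in memory
        self.pending = {}   # releasable key -> dependents not yet finished here

    def mark_releasable(self, key, dependents):
        # Assumption: the scheduler has decided that only these local
        # dependents will ever need `key`.
        self.pending[key] = set(dependents)

    def task_finished(self, key, value, dependencies):
        self.data[key] = value
        for dep in dependencies:
            waiting = self.pending.get(dep)
            if waiting is None:
                continue  # not marked releasable; wait for the scheduler
            waiting.discard(key)
            if not waiting:
                # Every dependent has run locally, so drop the data
                # without waiting for a message from the scheduler.
                del self.pending[dep]
                self.data.pop(dep, None)


w = ToyWorker()
w.data["a"] = 1
w.mark_releasable("a", ["b", "c"])
w.task_finished("b", 2, ["a"])   # "a" is still needed by "c"
w.task_finished("c", 3, ["a"])   # now "a" can be dropped locally
```

Keys that were never marked are left alone, so the existing scheduler-driven release path would still handle them.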

We can't say this at submission time, since we haven't yet scheduled dependencies. (Though tasks with only 1 dependency we could probably eagerly mark as releasable.) But maybe when we assign a task to a worker, we could also look through its immediate dependencies. Any of those that are already assigned to that worker, have no dependents that are unscheduled or scheduled on other workers, and have not been requested by a client could be marked as releasable.
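A minimal sketch of that check, assuming a toy task graph rather than the scheduler's real state: `Task`, `link`, and `releasable_on_assign` are made-up names for illustration, and `worker=None` stands for "not yet scheduled".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Task:
    """Hypothetical task record; not the scheduler's TaskState."""
    key: str
    dependencies: List["Task"] = field(default_factory=list)
    dependents: List["Task"] = field(default_factory=list)
    worker: Optional[str] = None      # None means unscheduled
    wanted_by_client: bool = False


def link(dependency, dependent):
    dependency.dependents.append(dependent)
    dependent.dependencies.append(dependency)


def releasable_on_assign(task, worker):
    """Assign `task` to `worker`; return keys of dependencies that are now
    safe for that worker to release once their dependents finish."""
    task.worker = worker
    releasable = []
    for dep in task.dependencies:
        if dep.worker != worker:
            continue  # data lives (or will live) elsewhere
        if dep.wanted_by_client:
            continue  # a client still holds a reference
        # Every dependent must be scheduled, and on this same worker.
        if all(d.worker == worker for d in dep.dependents):
            releasable.append(dep.key)
    return releasable


a, b, c = Task("a"), Task("b"), Task("c")
link(a, b)
link(a, c)
a.worker = "w1"
first = releasable_on_assign(b, "w1")   # "c" is still unscheduled
second = releasable_on_assign(c, "w1")  # all dependents of "a" are on w1
```

In this sketch `first` is empty and `second` contains `"a"`, since `"a"` only becomes releasable once its last dependent lands on the same worker. It ignores things a real scheduler would have to handle, such as work stealing or a client requesting the key later.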

This could have a nice balanced-budget property, where in many cases the scheduler couldn't hand out new tasks to workers without also giving them some tasks to release (in the future).

cc @fjetter @crusaderky @mrocklin 

[Idea] Could workers sometimes know when to release keys on their own? #5114
