Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make decide_worker and rootish logic private #8457

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

hendrikmakait
Copy link
Member

Baby-steps toward #8190

  • Tests added / passed
  • Passes pre-commit run --all-files

Comment on lines -133 to 139
.. autosummary:: Scheduler.decide_worker_non_rootish
.. autosummary:: Scheduler._decide_worker_non_rootish

.. autosummary:: Scheduler.decide_worker_rootish_queuing_disabled
.. autosummary:: Scheduler._decide_worker_rootish_queuing_disabled

.. autosummary:: Scheduler.decide_worker_rootish_queuing_enabled
.. autosummary:: Scheduler._decide_worker_rootish_queuing_enabled

.. autosummary:: Scheduler.worker_objective
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should nuke this entire document. definitely the reference to the methods

Copy link
Member Author

@hendrikmakait hendrikmakait Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care either way. If the interested user wants to read through those implementation details, sure, have at it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...though I do see the challenge of keeping this up-to-date (which I'm not sure it currently is, and frankly, I'm also not going to read it in full and compare it to the status quo today)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a follow-up issue in case someone else has strong opinions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

github-actions bot commented Jan 12, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    29 files  ±0      29 suites  ±0   11h 37m 55s ⏱️ + 5m 27s
 4 064 tests ±0   3 960 ✅ +3     97 💤 ±0   7 ❌  - 3 
55 981 runs  ±0  53 808 ✅ +3  2 163 💤 ±0  10 ❌  - 3 

For more details on these failures, see this check.

Results for commit cc90683. ± Comparison against base commit 97dbdaa.

♻️ This comment has been updated with latest results.

@barney54321
Copy link

I'm a bit late to the party but have only just noticed this change.

I'm working on a project where we override the rootish behaviour because tasks that have resource restrictions are not considered rootish, which (at least historically) results in them not being eligible for queuing and having very different scheduling behaviour, especially with dynamic workers (such as with a multi-spec adaptive cluster). The behaviour we observed when run with our use cases without our modification was that the first worker to spawn would receive every root task in the graph, with work stealing then balancing the workers. This in our experience would often lead to large idles times and sometimes even infinite loops deadlocking the scheduler. To overcome this we override rootish to always be true, disable work stealing and attach some heuristic worker-task assignments, as we found that for our use cases the resulting behaviour would have good load balancing and minimal idle times.

Given rootish has been made private, is there a public API that we can use to achieve the same behaviour as before? I see that TaskState has an attribute _rootish that is used in _is_rootish() and is referenced in the docstrings as being a means to override the behaviour, however it is both private and set to None in the __init__.

@hendrikmakait
Copy link
Member Author

I'm a bit late to the party but have only just noticed this change.

I'm working on a project where we override the rootish behaviour because tasks that have resource restrictions are not considered rootish, which (at least historically) results in them not being eligible for queuing and having very different scheduling behaviour, especially with dynamic workers (such as with a multi-spec adaptive cluster).

This is still the case, tasks with worker restrictions are currently never considered rootish.

The behaviour we observed when run with our use cases without our modification was that the first worker to spawn would receive every root task in the graph, with work stealing then balancing the workers. This in our experience would often lead to large idles times and sometimes even infinite loops deadlocking the scheduler. To overcome this we override rootish to always be true, disable work stealing and attach some heuristic worker-task assignments, as we found that for our use cases the resulting behaviour would have good load balancing and minimal idle times.

The fact that you encounter deadlocks is alarming to me. If you can create a reproducer, I'd be interested in taking a look at that. While idle times may be expected depending on the workload and setup, deadlocks should not.

Given rootish has been made private, is there a public API that we can use to achieve the same behaviour as before? I see that TaskState has an attribute _rootish that is used in _is_rootish() and is referenced in the docstrings as being a means to override the behaviour, however it is both private and set to None in the __init__.

@barney54321: There is currently no API that I would consider public that allows you to override the scheduling behavior or rootish behavior. These APIs are rather fragile and changing them can easily result in deadlock or other undesirable behavior, so we generally do not encourage users to touch them. They are also subject to change without notice.

That being said, you could continue to override these APIs at your own risk. Historically, Dask has done a poor job at signaling which APIs are public vs. private. With this PR, we do not actively make a previously public API private. Instead, we make it explicit that these APIs have been considered private all along.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants