Edge and impossible transitions to memory #7205
Conversation
distributed/tests/test_steal.py
Outdated
await ev.wait()
s = next(si for si in Scheduler._instances if id(si) == sched_id)
# Note that the task is async, so it runs in the same thread as the scheduler
s.reschedule("x-0", stimulus_id="steal")
As pointed out in #7200 (comment), this is not how stealing works, and using Scheduler.reschedule in this way is not intended use. If this is the only way to provoke this transition, we should remove Scheduler.reschedule (or rather make it private; users should raise the Reschedule exception instead).
I'd like us to add a more realistic reproducer showing that this can actually occur before adding this transition.
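For illustration, here is a minimal sketch of the user-facing pattern referred to above: raising the Reschedule exception from inside a task rather than calling Scheduler.reschedule directly. The cluster setup and the task body are made up for the example, not taken from this PR.

```python
# Minimal sketch: a task hands itself back to the scheduler by raising
# Reschedule; the worker reports it and the scheduler reschedules the task.
from dask.distributed import Client, Reschedule


def maybe_reschedule(x):
    if x is None:
        # Hypothetical condition under which we want the scheduler to try
        # this task again; raising Reschedule is the supported way to do it.
        raise Reschedule()
    return x * 2


if __name__ == "__main__":
    client = Client()  # assumes a throwaway local cluster for illustration
    print(client.submit(maybe_reschedule, 21).result())
```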
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files ±0, 15 suites ±0, 6h 14m 13s ⏱️ +2m 57s
For more details on these failures, see this check.
Results for commit 8fb3dab. ± Comparison against base commit c137ac0.
♻️ This comment has been updated with latest results.
Commits 7dd2458 to d2944ef (compare)
Discussion on #7200: I'd like to remove the test altogether, merge the change untested, and open a follow-up for a proper investigation and coverage of all use cases of unexpected transitions to memory.
Follow-up issue:
Co-authored-by: Gabe Joseph <gjoseph92@gmail.com>
All test failures are unrelated. Ready for review and merge.
distributed/scheduler.py
Outdated
("queued", "released"): transition_queued_released, | ||
("queued", "processing"): transition_queued_processing, | ||
("processing", "released"): transition_processing_released, | ||
("processing", "memory"): transition_processing_memory, | ||
("processing", "erred"): transition_processing_erred, | ||
("no-worker", "released"): transition_no_worker_released, | ||
("no-worker", "processing"): transition_no_worker_processing, | ||
("no-worker", "memory"): transition_no_worker_memory, | ||
("no-worker", "memory"): partial(impossible_transition, "memory"), |
What's the value of having a transition function for an impossible transition, versus updating distributed/scheduler.py, line 1853 in 5a14053:
raise RuntimeError(f"Impossible transition from {start} to {finish}")
to raise a more detailed error like you have here (including the story)?
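To make that alternative concrete, here is a small self-contained sketch (not the actual Scheduler code; the class and attribute names are invented) of a (start, finish) dispatch table whose catch-all error includes the task's recent transition story.

```python
# Toy stand-in for a scheduler-style transition table: unknown (start, finish)
# pairs raise a RuntimeError that includes the task's recent "story".
from typing import Callable, Dict, List, Tuple


class MiniStateMachine:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.story: Dict[str, List[Tuple[str, str]]] = {}
        self.transitions: Dict[Tuple[str, str], Callable[[str], None]] = {
            ("processing", "memory"): self._processing_to_memory,
        }

    def _processing_to_memory(self, key: str) -> None:
        self.state[key] = "memory"

    def transition(self, key: str, finish: str) -> None:
        start = self.state.get(key, "released")
        func = self.transitions.get((start, finish))
        if func is None:
            # Instead of a bare "impossible transition" message, include the
            # task's recent story so the unexpected request can be traced.
            raise RuntimeError(
                f"Impossible transition from {start} to {finish} for {key!r}; "
                f"story: {self.story.get(key, [])}"
            )
        func(key)
        self.story.setdefault(key, []).append((start, finish))


sm = MiniStateMachine()
sm.state["x-0"] = "processing"
sm.transition("x-0", "memory")    # allowed edge
sm.transition("x-0", "released")  # raises RuntimeError including the story
```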
The idea was to prevent the error from being silenced in the future if anybody adds a (released, memory) transition. But on second thought, I'll just overhaul the above.
Nice, thanks for figuring all this out
Only failing: