Remove internal dependencies mapping in update_graph #9036
Conversation
Unit Test Results — see the test report for an extended history of previous test failures; this is useful for diagnosing flaky tests.
21 files (-6), 21 suites (-6), 8h 27m 46s ⏱️ (-2h 38m 8s)
Results for commit e2284bb. ± Comparison against base commit 01ea1eb.
This pull request removes 26 and adds 8 tests (renamed tests count towards both). This pull request skips 12 tests.
♻️ This comment has been updated with latest results.
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
distributed/scheduler.py:5016
- [nitpick] Consider using an explicit check for key existence (e.g. returning None) instead of using an empty tuple as a default, to improve code clarity and intent.
if tspec := dsk.get(k, ()):
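For illustration, the two idioms the nitpick contrasts look roughly like this. The names `dsk`, `k`, and `tspec` come from the diff line above; the toy graph contents and the `print` calls are purely illustrative, not the scheduler's actual code:

```python
dsk = {"x": ("inc", 1)}  # toy stand-in for the graph mapping; not real task spec objects
k = "x"

# Variant from the diff: walrus with an empty-tuple default. The body is skipped
# both when the key is missing and when the stored value is empty/falsy.
if tspec := dsk.get(k, ()):
    print("found", tspec)

# Variant the nitpick suggests: an explicit check separates "key is absent"
# from "value is present but empty", making the intent clearer.
tspec = dsk.get(k)
if tspec is not None:
    print("found", tspec)
```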
Force-pushed 015925f to d802834
There's a genuine error in
I can reproduce flakiness of
Force-pushed 35a9e8f to 1634935
This now builds on #9068
Force-pushed 1634935 to 8faeae6
This is a long overdue follow-up of the `Task` class changes and aims to reduce transmission overhead and memory utilization on the scheduler during graph submission.

The need to materialize the dependencies as a separate dict is entirely redundant. For backwards compatibility, `_materialize_graph` generated a materialized version of the dependencies mapping to keep the changes minimal at the time.
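Roughly, the redundant structure is a plain `key -> dependencies` dict that only restates what each task object already carries. A minimal sketch of that idea — `ToyTask` is a hypothetical stand-in, not the real dask `Task`/`GraphNode` class, and only the notion of a per-task `dependencies` attribute is taken from the description:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyTask:
    key: str
    dependencies: frozenset = field(default_factory=frozenset)


dsk = {
    "a": ToyTask("a"),
    "b": ToyTask("b", frozenset({"a"})),
    "c": ToyTask("c", frozenset({"a", "b"})),
}

# The separate mapping that used to be materialized for backwards compatibility.
# Every entry merely copies information the task objects already hold.
dependencies = {key: set(task.dependencies) for key, task in dsk.items()}
assert dependencies["c"] == {"a", "b"}
```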
The caveat was that `Scheduler._remove_done_tasks_from_dsk` was actually mutating that view, which turns out to be false behavior. The `dependencies` of a task are immutable, and every attempt to treat them differently is very likely to corrupt the state or at the very least alter the graph itself. This argument alone should already suffice to see that changes to dependencies are a bad idea.
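A toy illustration of why mutating that view is dangerous: if the mapping aliases the dependency collections of the tasks themselves rather than copying them, "pruning" a done dependency from the view also rewrites the graph. Plain dicts below, purely illustrative, not the scheduler code:

```python
tasks = {
    "a": {"deps": set()},
    "b": {"deps": {"a"}},
}

# A "dependencies view" built by aliasing instead of copying.
dependencies = {key: t["deps"] for key, t in tasks.items()}

# Pretend "a" is already in memory and prune it from the view ...
dependencies["b"].discard("a")

# ... and the graph itself has silently lost the edge.
assert tasks["b"]["deps"] == set()
```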
After closer review of the code, I believe that the entire logic around `_remove_done_tasks_from_dsk` is actually redundant and can be removed. This method was parsing the `dsk` graph for tasks that were already in `memory` or had already `erred` and removed them from `dsk` accordingly. The reasoning is sound since we do not want to recompute those tasks again. However, the transition engine already takes care of this and produces appropriate recommendations that end up doing nothing.
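Conceptually, the removed step was a pre-filter like the sketch below: drop keys whose state is already `memory` or `erred` before they reach the transition engine. The function name `prune_done_tasks` and the plain-dict inputs are hypothetical; only the state names come from the description. The PR's argument is that skipping this filter is still correct, because feeding such keys to the engine just yields no-op recommendations:

```python
def prune_done_tasks(dsk, task_states):
    """Illustrative stand-in for the removed pre-filtering step."""
    done = {"memory", "erred"}
    return {k: v for k, v in dsk.items() if task_states.get(k) not in done}


dsk = {"a": "spec-a", "b": "spec-b", "c": "spec-c"}
states = {"a": "memory", "b": "erred"}
assert prune_done_tasks(dsk, states) == {"c": "spec-c"}
```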
Ultimately, this is a performance tradeoff. The logic in `_remove_done_tasks_from_dsk` allows us to not throw already-computed keys into the transition engine, which is arguably slower than the `_remove_done_tasks_from_dsk` logic itself. Essentially, this makes repeated `persist`s faster at the cost of slowing everything else down.

In particular, `_remove_done_tasks_from_dsk` contains a call to `reverse_dict`, which walks the entire graph and all edges and constructs a dict of dependents. Contrary to dependencies, dependents are ephemeral and have to be recomputed every time. This is actually expensive, and not doing it is helpful.
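For reference, computing dependents from dependencies is a pass over every node and every edge of the graph, roughly like the simplified sketch below. `reverse_deps` is an illustrative equivalent of what a `reverse_dict`-style helper does, not the dask implementation itself:

```python
def reverse_deps(dependencies):
    """Build key -> dependents by walking every node and every edge once."""
    dependents = {key: set() for key in dependencies}
    for key, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(key)
    return dependents


deps = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
assert reverse_deps(deps)["a"] == {"b", "c"}
```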
I haven't done any measurements but I strongly suspect this is a win-win (even repeated `persist`s are likely faster).
There may be a couple of wrinkles in CI that I'll have to check on, but I'm very confident about the change itself.