fjetter commented Apr 8, 2025

This explicit culling step is redundant for a few reasons.

  1. DataFrame expressions are well behaved and generate only the tasks that are needed. Legacy Array collections, for example, already cull as a low-level optimization.
  2. The Scheduler._generate_taskstate method already walks the graph in a way that culls it automatically. This optimization step therefore exists only to cull the graph before it enters dask.order.

I believe the only path where this can even trigger is when Client.get receives a raw dictionary; in that case we have no information about what has already been done to the graph. Culling there still makes sense to avoid worst-case runtimes, but I believe it is unnecessary in all other cases.
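For readers unfamiliar with the term, here is a minimal sketch of what culling a raw dictionary graph means: starting from the requested keys, keep only the tasks that are reachable through dependencies. This is not dask's own implementation. The function name `cull_sketch` and the example graph are hypothetical, and the sketch assumes string keys and looks only at top-level task arguments.

```python
from operator import add, mul


def cull_sketch(dsk, keys):
    """Return the subset of ``dsk`` needed to compute ``keys``.

    Hypothetical helper for illustration only. Assumes keys are strings
    and that dependencies appear as top-level arguments of a task tuple.
    """
    needed = {}
    stack = list(keys)
    while stack:
        key = stack.pop()
        if key in needed:
            continue  # already visited
        task = dsk[key]
        needed[key] = task
        # A task is a tuple whose first element is callable; any string
        # argument that names another key is treated as a dependency.
        if isinstance(task, tuple) and task and callable(task[0]):
            for arg in task[1:]:
                if isinstance(arg, str) and arg in dsk:
                    stack.append(arg)
    return needed


# A raw dictionary graph with one task that the requested key never uses.
dsk = {
    "a": 1,
    "b": 2,
    "c": (add, "a", "b"),
    "d": (mul, "c", 10),
    "unused": (add, "a", 100),
}

culled = cull_sketch(dsk, ["d"])
```

With a raw dictionary nothing guarantees that entries like "unused" were removed upstream, which is why culling in that one path can still avoid ordering and scheduling work on tasks nobody asked for.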

github-actions bot commented Apr 8, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    26 files   -     1      26 suites   - 1   10h 56m 24s ⏱️ - 29m 16s
 4 106 tests ±    0   3 988 ✅  -     2    112 💤 ±  0  6 ❌ +3 
48 809 runs   - 2 670  46 614 ✅  - 2 551  2 189 💤  - 121  6 ❌ +3 

For more details on these failures, see this check.

Results for commit 4378728. ± Comparison against base commit 6691e27.

♻️ This comment has been updated with latest results.

@fjetter fjetter merged commit 9cec018 into dask:main Apr 15, 2025
29 of 32 checks passed