Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HLG layer for P2P rechunking #8751

Merged
merged 14 commits into from
Jul 18, 2024

Conversation

hendrikmakait
Copy link
Member

Closes #8676

  • Tests added / passed
  • Passes pre-commit run --all-files

Copy link
Contributor

github-actions bot commented Jul 5, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    29 files  ± 0      29 suites  ±0   11h 57m 8s ⏱️ + 5m 16s
 4 088 tests + 1   3 972 ✅ + 3    112 💤 ±0  4 ❌  - 2 
55 299 runs  +13  52 856 ✅ +15  2 439 💤 +1  4 ❌  - 3 

For more details on these failures, see this check.

Results for commit 3703897. ± Comparison against base commit ffcb271.

♻️ This comment has been updated with latest results.


from dask.array.rechunk import old_to_new

_old_to_new = old_to_new(self.chunks_input, self.chunks)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll run some tests to figure out if this is bad, from what I understand, we should only cull once, i.e., this would get called twice in total.

distributed/shuffle/tests/test_rechunk.py Show resolved Hide resolved
distributed/shuffle/_rechunk.py Outdated Show resolved Hide resolved
distributed/shuffle/_rechunk.py Show resolved Hide resolved
distributed/shuffle/_rechunk.py Outdated Show resolved Hide resolved
Comment on lines +350 to +352
def cull(
self, keys: set[Key], all_keys: Collection[Key]
) -> tuple[P2PRechunkLayer, dict]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this... it feels like we're doing the same work twice. I know that the culling API forces us to generate this dict but it is still odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'm still trying to figure out if there's a better way to avoid work here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also worried about how this will look once we move to expressions where we no longer have the culling as such a concept. However, array expressions are in such an early stage of development that this hasn't been dealt with, yet, from what I can tell.

@phofl any insights by any chance (might also be easier to hop on a call)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah call makes sense

hendrikmakait and others added 2 commits July 17, 2024 15:22
Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
@hendrikmakait hendrikmakait marked this pull request as ready for review July 18, 2024 06:57
@hendrikmakait hendrikmakait marked this pull request as draft July 18, 2024 06:58
@hendrikmakait
Copy link
Member Author

An A/B test shows no significant impact on performance:
Screenshot 2024-07-18 at 15 58 05

@hendrikmakait hendrikmakait marked this pull request as ready for review July 18, 2024 13:59
@hendrikmakait hendrikmakait merged commit 651ba5a into dask:main Jul 18, 2024
28 of 36 checks passed
@hendrikmakait hendrikmakait deleted the p2p-rechunking-hlg branch July 18, 2024 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

P2P Rechunking - Graph submission time reduction - HLG Layer for P2P rechunking
3 participants