Single-partition Dask executor for cuDF-Polars #17262
base: branch-24.12
Changes from 37 commits
Pretty sure this is lowering test coverage.
I introduced basic testing for all executors, independent of the `--executor` pytest argument, to ensure 100% coverage always.
See 9b78d8f.
Should this be something like `executor=executor or Executor`? Right now, it seems like the `executor` is always ignored.
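For context, a minimal sketch of the `executor or Executor` fallback pattern being suggested here. The function and variable names are hypothetical stand-ins, not the PR's actual code:

```python
# Hypothetical module-level default, standing in for
# cudf_polars.testing.asserts.Executor.
Executor = "cudf"

def assert_gpu_result_equal_sketch(executor=None):
    # Without the "or Executor" fallback, a caller-supplied executor
    # would be silently ignored in favor of the module default.
    chosen = executor or Executor
    return chosen

print(assert_gpu_result_equal_sketch())        # falls back to "cudf"
print(assert_gpu_result_equal_sketch("dask"))  # caller override wins
```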
This was a leftover from a previous change; I intended to remove the `executor` kwarg. I've done that now in 22678a5, but we may still want to change this depending on how the discussion in #17262 (comment) goes.
I think it's a good idea to keep parallel tests here. With that said, I wonder if it makes sense to somehow run the entire test suite with `executor="dask"` when Dask is installed? (Not sure how this would work, but all tests should technically work with a single partition.)
Yes, apparently tests do work; I just copied a couple for some initial testing because I didn't want to duplicate everything. We do have a few options if we want to test everything, e.g., parametrizing `executor: [None, "dask"]` (`None` and `"cudf"` both mean the "default" executor).
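One way the parametrization option above could look, as a sketch. The fixture, the `resolve_executor` helper, and the test body are hypothetical illustrations, not the PR's code:

```python
import pytest

# Hypothetical parametrization: None and "cudf" both select the default
# executor, while "dask" selects the Dask executor.
@pytest.fixture(params=[None, "cudf", "dask"])
def executor(request):
    return request.param

def resolve_executor(executor):
    # Hypothetical helper: None means "use the default executor".
    return "cudf" if executor is None else executor

def test_collect_with_executor(executor):
    # Each test would pass the resolved executor through to collection.
    assert resolve_executor(executor) in ("cudf", "dask")
```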
We probably don't need to test everything for this specific PR. However, I think it may make sense to go in this direction pretty soon. We will probably want to make sure that single-partition execution continues working for the
entire test suite as multi-partition support is added.
@wence- @rjzamora I have made the changes we discussed earlier today in 2b74f28. It adds a new `--executor` pytest command-line argument that has the default value of `"cudf"` (the default executor) but allows us to run with `--executor dask-experimental` (I've also renamed from `"dask"` to `"dask-experimental"` in c8ca09e, as discussed) to rerun the test suite with that executor. The caveat is that, to be as unintrusive as possible in the API, I had to add an `Executor` variable to `cudf_polars.testing.asserts`, which allows us to modify it upon pytest entry in the `pytest_configure` function in `conftest.py`. The advantage of this approach is that we don't need to force the user to always specify the `executor` to `assert_gpu_result_equal` via its API (and thus prevent things like forgetting to pass it), but the obvious downside is the need to modify the `cudf_polars.testing.asserts.Executor` module variable, which always feels like a bit of a hacky solution. I'm happy to change this to whatever you feel suits best, or if you can think of a better solution, please let me know too.
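The `conftest.py` mechanism described above can be sketched roughly as follows. This is a simplified illustration, not the PR's exact code; the `asserts` namespace below stands in for the real `cudf_polars.testing.asserts` module:

```python
import types

# Stand-in for the cudf_polars.testing.asserts module, which (per the
# discussion above) exposes a module-level Executor default.
asserts = types.SimpleNamespace(Executor="cudf")

def pytest_addoption(parser):
    # pytest hook: register the --executor command-line option.
    parser.addoption(
        "--executor",
        action="store",
        default="cudf",
        help="Executor to use for cudf-polars tests.",
    )

def pytest_configure(config):
    # pytest hook: overwrite the module-level variable once at pytest
    # entry, so assert_gpu_result_equal never needs an explicit
    # executor argument.
    asserts.Executor = config.getoption("--executor")
```

The trade-off noted above is visible here: the helpers read mutable module-level state instead of taking an explicit parameter, which keeps call sites clean but makes the configuration implicit.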