Add MemoryResourceConfig to cudf-polars config #20042
base: branch-25.12
Conversation
This adds a new configuration option to cudf-polars' config, allowing users to specify the memory resource to create by default. Currently, users either get the default behavior (typically a managed memory resource) or can pass in a concrete memory resource. In some cases, you might want to pass in a description of the memory resource to use instead:

1. In our unit tests, we might want to specify a `CudaAsyncMemoryResource` with a relatively small initial pool size.
2. In a distributed environment, you can't pass around concrete memory resource objects (which can't be serialized).
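To make those use cases concrete, here is a rough sketch of what describing (rather than constructing) the memory resource could look like from user code. The `memory_resource_config` keyword and the dict shape are illustrative assumptions drawn from the review discussion below, not the merged API:

```python
import polars as pl

# Hypothetical sketch: describe the memory resource instead of constructing it.
# The "memory_resource_config" keyword name is an assumption for illustration.
engine = pl.GPUEngine(
    memory_resource_config={
        "qualname": "rmm.mr.CudaAsyncMemoryResource",
        "options": {"initial_pool_size": 256 * 1024**2},  # 256 MiB, illustrative
    },
)

q = pl.LazyFrame({"a": [1, 2, 3]}).select(pl.col("a") * 2)
result = q.collect(engine=engine)
```

Because the description is plain data, it can be serialized and shipped to workers, unlike a concrete memory resource object.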
```
Examples
--------
>>> MemoryResourceConfig(
...     qualname="rmm.mr.CudaAsyncMemoryResource",
```
Are people OK with "qualname" here? I want to avoid locking us to MRs that happen to be defined in `rmm.mr`.
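For readers unfamiliar with the idea, a dotted qualname can be resolved with a couple of lines of `importlib`, which is what makes it possible to name memory resources defined outside `rmm.mr`. This is a generic sketch, not the resolution code in this PR:

```python
import importlib

def resolve_qualname(qualname: str):
    """Import and return the object named by a dotted path,
    e.g. "rmm.mr.CudaAsyncMemoryResource" or "my_pkg.CustomResource".
    """
    module_name, _, attr = qualname.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```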
The only other feature I might want to add here is expanding the `options` values so they can describe nested memory resources. For example, to express our default memory resource:

```json
{
    "qualname": "rmm.mr.PrefetchResourceAdaptor",
    "options": {
        "upstream_mr": {
            "qualname": "rmm.mr.PoolMemoryResource",
            "options": {
                "upstream_mr": {
                    "qualname": "rmm.mr.ManagedMemoryResource",
                    "options": {
                        "initial_pool_size": 256
                    }
                }
            }
        }
    }
}
```

That relies on pattern matching a dict with a `"qualname"` key.
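As an illustration of that pattern matching, a recursive builder could treat any option value that is itself a dict carrying a `qualname` key as a nested resource description and construct it first. A sketch of the idea, not the implementation proposed in this PR:

```python
import importlib

def _resolve(qualname: str):
    # Resolve a dotted path like "rmm.mr.PoolMemoryResource" to the class.
    module_name, _, attr = qualname.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)

def build_memory_resource(config: dict):
    """Recursively construct a memory resource from a nested description."""
    cls = _resolve(config["qualname"])
    options = {}
    for name, value in config.get("options", {}).items():
        if isinstance(value, dict) and "qualname" in value:
            # Nested description: build the upstream resource first.
            value = build_memory_resource(value)
        options[name] = value
    return cls(**options)
```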
Commits
(cherry picked from commit 29b3b41)
Now that cudf-polars uses managed memory by default, the prior comment here should no longer be applicable and we should be able to run these tests with more than 1 process for a hopeful improvement in runtime. Probably depends on #20042 so each xdist process doesn't set the `initial_pool_size` of the memory resource to 80% of the available device memory. Authors: - Matthew Roeschke (https://github.com/mroeschke) - Tom Augspurger (https://github.com/TomAugspurger) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Bradley Dice (https://github.com/bdice) - Tom Augspurger (https://github.com/TomAugspurger) URL: #19980
We really only need to see why tests fail when they fail. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) URL: rapidsai#20107
…i#20102) Contributes to rapidsai#15170 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#20102
…idsai#20101) Contributes to rapidsai#15170 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#20101
…dsai#20099) Contributes to rapidsai#15170 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#20099
This pull request adds a JIT filter for `read_parquet` filtering. The JIT filter can be turned on using the `use_jit_filter` reader option or the `LIBCUDF_USE_JIT` environment variable. It also adds a benchmark for `read_parquet` to compare the AST and JIT filters. Benchmark results will be posted below. Follows up rapidsai#18023. Authors: - Basit Ayantunde (https://github.com/lamarrr) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Bradley Dice (https://github.com/bdice) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: rapidsai#19831
This PR cleans up the custom device atomic logic by using `atomic_ref`. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Bradley Dice (https://github.com/bdice) - David Wendt (https://github.com/davidwendt) URL: rapidsai#19924
This PR uses 8 processes instead of 6, hoping to cut the runtime of the pandas test suite. I have data below; this speeds things up by about 21.6%. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#20109
This PR is needed to support the nvbench upgrade in rapidsai/rapids-cmake#895. This should be merged immediately after. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#19619
Contributes to rapidsai#15170 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#20119
…i#20038) Fixes several groupby benchmarks that only measured the aggregate/scan/shift call where a sort-based groupby is used. This means only the 1st call performed a keys sort while the remaining invocations reused the already sorted keys. The change involves simply instantiating the groupby object within the `state.exec{}` functor along with the `aggregate` call. This change also adds decimal64 to the sum benchmark to show improvement on a follow on PR. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Shruti Shivakumar (https://github.com/shrshi) - Nghia Truong (https://github.com/ttnghia) URL: rapidsai#20038
Minor cleanup of aggregation code including fixing a misspelling. No functional changes. Found while working on rapidsai#20040 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Yunsong Wang (https://github.com/PointKernel) URL: rapidsai#20053
Description
This adds a new configuration option to cudf-polars' config, allowing users to specify the memory resource to create by default.
Currently, users either get the default behavior (typically a managed memory resource) or can pass in a concrete memory resource (see the new docs added in this PR).
In some cases, you might want to pass in a description of the memory resource to use:

1. In our unit tests, we might want to specify a `CudaAsyncMemoryResource` with a relatively small initial pool size.
2. In a distributed environment, you can't pass around concrete memory resource objects (which can't be serialized).
This is strictly more flexible than the current option of setting `POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY`. By setting `POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY=0` you'll get a `CudaAsyncMemoryResource` with its default `initial_pool_size` and `release_threshold`. With this system, you can set the memory resource config to a `CudaAsyncMemoryResource` with no options to get the same thing, or include options such as `initial_pool_size` and `release_threshold` to configure the pool.
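For illustration, the two configurations described above might look like this, reusing the `qualname`/`options` shape from the review discussion (the sizes are placeholders, not recommended defaults):

```python
# Roughly equivalent to POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY=0:
# an async resource with rmm's default pool settings.
async_default = {"qualname": "rmm.mr.CudaAsyncMemoryResource"}

# The same resource, with the pool explicitly configured.
async_configured = {
    "qualname": "rmm.mr.CudaAsyncMemoryResource",
    "options": {
        "initial_pool_size": 256 * 1024**2,   # 256 MiB (illustrative)
        "release_threshold": 1024 * 1024**2,  # 1 GiB (illustrative)
    },
}
```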
I'd recommend deprecating `POLARS_GPU_ENABLE_CUDA_MANAGED_MEMORY` to reduce the number of ways this can be configured, though we should take our time with that.