[REVIEW] Optimizations for cudf.concat when axis=1 #9333
Codecov Report
Coverage diff (branch-21.12 vs. #9333):

| | branch-21.12 | #9333 | +/- |
|---|---:|---:|---:|
| Coverage | 10.79% | 10.57% | -0.22% |
| Files | 116 | 116 | |
| Lines | 18869 | 19388 | +519 |
| Hits | 2036 | 2051 | +15 |
| Misses | 16833 | 17337 | +504 |
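The Codecov figures are internally consistent: Hits + Misses = Lines, and coverage is Hits / Lines. The drop from 10.79% to 10.57% reflects 519 added lines of which only 15 are hit. The Python check below reproduces the reported percentages, assuming Codecov rounds down to two decimal places (an assumption that matches the numbers shown; `coverage_hundredths` is a hypothetical helper, not a Codecov API).

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, rounded down.

    Integer arithmetic avoids floating-point surprises; rounding down
    is an assumption about Codecov's reporting.
    """
    return hits * 10000 // lines


# Totals must add up for both sides of the diff.
assert 2036 + 16833 == 18869
assert 2051 + 17337 == 19388

base = coverage_hundredths(2036, 18869)  # branch-21.12
head = coverage_hundredths(2051, 19388)  # #9333
print(f"{base / 100:.2f}% -> {head / 100:.2f}% ({(head - base) / 100:+.2f}%)")
```

Running it prints `10.79% -> 10.57% (-0.22%)`, matching the report.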
Getting closer. I have a few suggestions and there's one discussion that we may need to continue a bit offline.
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
A few minor last changes, then I think this is ready to go (pending SWIPAT).
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
LGTM
With @vyasr's review, I think we should be good. Merging now.
@gpucibot merge
Thanks, @vyasr, for patiently reviewing a lengthy PR like this one.
Fixes: #9223, #9200, #9411
This PR includes:

- Handling of `RangeIndex` when `axis=1`.
- Fixes for `axis=1` cases in `cudf.concat`, which enable stricter index type checks in the associated pytests.
- Caching of the `distinct_count` value of a `Column` in `_distinct_count` to improve performance.
- `Column._clear_cache`, a single method that clears all the cached values related to a `Column`.
- `Index.union`, `Index.intersection` & `Index.has_duplicates`.
- `is_numeric`, `is_boolean`, `is_integer`, `is_floating`, `is_object`, `is_categorical` & `is_interval` APIs in `Index`.
- Optimizations to `cudf.concat` for `axis=1` that build on the changes above; here are benchmarks:

Associated benchmarks are being added here: vyasr/cudf_benchmarks#1