-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check column type equality, handling nested types correctly. #14531
Check column type equality, handling nested types correctly. #14531
Conversation
@wence- Is this the kind of change you had in mind? I looked over the list of files you put in #14527 and made changes for most of them. I'd like your thoughts on what level of testing we need to merge this PR. Should we test every algorithm with mismatching inputs that have the same top-level type, like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good step, although please check the logic in your implementation of all_columns_equal
.
I think if we don't want to change the structure of the
I think we want some unit tests for the equality comparisons first. Since the data-type description is an inductive type, the tests can be "exhaustive" by checking the base cases and the inductive cases, and then convincing ourselves we've implemented the inductive definition correctly. We should also then test that the table-equality comparators work correctly. In terms of then testing the advertised API surface, can we collect which functions advertise that they throw on mismatched types and then consider the testing strategy. One thing I wonder is whether it should be the case that any such type-equality checking happens in the detail API, or if those functions have as (implicit) preconditions that they are passed valid data. We can't encode any such requirement in the type system because columns are type-erased: the best I can think of is that detail APIs that require matching types gain an argument that carries a token indicating whether equality constraints have been satisfied by the caller. But that might not be very ergonomic. The thing I would like to avoid is redoing equality checks when the caller of some function has already performed them. |
… table_view.cpp (#14535) Moves some `inline` functions from the `table_view.hpp` to the `table_view.cpp` to help reduce dependency bloat. Also reference #14531: I'd like to minimize changes to the highly inclusive `table_view.hpp`. Closes #14431 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Karthikeyan (https://github.com/karthikeyann) - Mike Wilson (https://github.com/hyperbolic2346) URL: #14535
One extra caveat I'd like to consider here. We are planning for STRING type columns to contain either INT32 or INT64 offsets children. In this case, I think we'd want two STRING type columns to be equal (in type only) regardless of their children types. |
d92c5ea
to
39d14c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I have some very minor (all non-blocking) comments that are mostly stylistic choices (feel free to ignore).
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
…o nested-column-type-checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wence- Thanks for the great review comments. I am slowly working through this as I have time. I had a couple replies to your comments, and I'll ping you when I'm ready for another review.
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
…o nested-column-type-checks
@wence- I applied most of your suggested changes. Feel free to have a final look, if you'd like. I'm aiming to merge in the next 2 days to avoid conflicts and get this in before 24.06 burndown begins. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bdice! I think you will need to solicit another C++ review, but this looks great from my end.
/merge |
Thanks @davidwendt @wence- @PointKernel for the reviews! |
Description
Addresses most of #14527. See also #14494.
This PR expands the use of
cudf::column_types_equal(lhs, rhs)
and adds new methodscudf::column_scalar_types_equal
,cudf::scalar_types_equal
, andcudf::all_column_types_equal
.These type check functions are now employed throughout the code base instead of raw checks like
a.type() == b.type()
because those do not correctly handle nested types.Checklist