-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorized grouping by multiple column #7754
Conversation
Paste the values together and use the umash hash of the resulting value for grouping, like we do for the text keys.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7754 +/- ##
==========================================
+ Coverage 80.06% 81.94% +1.88%
==========================================
Files 190 249 +59
Lines 37181 45975 +8794
Branches 9450 11516 +2066
==========================================
+ Hits 29770 37676 +7906
- Misses 2997 3770 +773
- Partials 4414 4529 +115 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@@ -225,7 +227,7 @@ add_one_range(GroupingPolicyHash *policy, TupleTableSlot *vector_slot, const int | |||
* Remember which aggregation states have already existed, and which we | |||
* have to initialize. State index zero is invalid. | |||
*/ | |||
const uint32 last_initialized_key_index = policy->last_used_key_index; | |||
const uint32 last_initialized_key_index = policy->hashing.last_used_key_index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some mechanical refactoring to move more hashing-related data into the hashing strategy struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test case specifically for NULLs. A table with two integer columns would do, and two tuples with (1, NULL) and (NULL, 1) ?
|
||
if (column_values->decompression_type == DT_Scalar) | ||
{ | ||
if (!*column_values->output_isnull) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. Are we hashing the two tuples of (NULL, "A") and ("A", NULL) to the same hash code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there's a null bitmap at the beginning, which will be different. I think if we can deserialize the key unambiguously, it means we don't confuse such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deserialization code is in serialized_emit_key
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had some tests with nulls but I also added one like you asked, for clarity.
{ | ||
const bool is_valid = !*column_values->output_isnull; | ||
byte_bitmap_set_row_validity(serialized_key_validity_bitmap, column_index, is_valid); | ||
if (is_valid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about hashing NULLs here. I'm not sure if it is valid to just ignore NULLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I downloaded the code and I tested, NULLs are not an issue
The continuous aggregate incremental refresh test accidentally used the now() function which makes it fail. Replace it with fixed dates.
Automated backport to 2.19.x not done: backport failed. The PR touches a workflow file '.github/workflows/windows-build-and-test.yaml' and cannot be backported automatically |
Paste the values together and use the umash hash of the resulting value for grouping, like we do for the text keys.
Paste the values together and use the umash hash of the resulting value for grouping, like we do for the text keys.
Paste the values together and use the umash hash of the resulting value for grouping, like we do for the text keys.
Up to 150% improvement on some tsbench queries.