Specialize single column primitive group values#7043
Conversation
| } | ||
|
|
||
| /// A [`GroupValues`] making use of [`Rows`] | ||
| struct GroupValuesRows { |
There was a problem hiding this comment.
This is moved into a new module
| let hash = key.hash(state); | ||
| let insert = self.map.find_or_find_insert_slot( | ||
| hash, | ||
| |g| unsafe { self.values.get_unchecked(*g).is_eq(key) }, |
| Ok(v) => *v.as_ref(), | ||
| Err(slot) => { | ||
| let g = self.values.len(); | ||
| self.map.insert_in_slot(hash, slot, g); |
There was a problem hiding this comment.
I think this should still track the allocated memory (like insert_accounted did?)
There was a problem hiding this comment.
It is accounted in the size method
|
I just pushed a fix for the CI failure. I am now running benchmarks on this branch to confirm. I expect this change may make a substantial difference on some of the ClickBench results as well, but I don't have them automated quite yet |
|
My benchmark results are similar. I plan to review this PR carefully tomorrow morning FYI @yahoNanJing |
| }; | ||
| } | ||
|
|
||
| // TODO: More primitives |
alamb
left a comment
There was a problem hiding this comment.
Thank you @tustvold -- this PR is awesome ❤️ I will rerun the clickbench numbers with this PR once it is merged.
I left a bunch of "improve the comment" type suggestions which I think would add value but are optional and can be done as a follow on PR (I will do them if you choose not to).
I do think we should add hash collision testing prior to merge, which I left a comment about
| /// Stores the group index based on the hash of its value | ||
| map: RawTable<usize>, | ||
| /// The group index of the null value if any | ||
| null_group: Option<usize>, |
| } | ||
| EmitTo::First(n) => { | ||
| // SAFETY: self.map outlives iterator and is not modified concurrently | ||
| unsafe { |
There was a problem hiding this comment.
I think this code is largely replicated from the row version. I wonder if it could be refactored into a (templated) common function (with appropriate documentation)?
There was a problem hiding this comment.
There isn't an easy way to make this generic, as one stores tuples and one isn't... I at least can't see a way that doesn't just obfuscate the code
| use hashbrown::raw::RawTable; | ||
| use std::sync::Arc; | ||
|
|
||
| /// A trait to allow hashing of floating point numbers |
There was a problem hiding this comment.
Can you explain why this doesn't use create_hashes? And perhaps add comments in the code about the rationale?
If it is important not to use create_hashes I recommend
- Move this code to
hash_utils.rsso it is easier to find - Implement the
force_hash_collisionsversion (that always hashes things to 0) to ensure appropriate coverage
Here is an example of force_hash_collisions
https://github.com/apache/arrow-datafusion/blob/368f6e606a3cfca8e04638b8d5ff0ff116a20b57/datafusion/physical-expr/src/hash_utils.rs#L214-L224
There was a problem hiding this comment.
We can't use create_hashes as we are generating the hashes from the native values, not an array
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| use crate::physical_plan::aggregates::group_values::GroupValues; |
There was a problem hiding this comment.
this code is just moved, right?
|
|
||
| hash_float!(f16, f32, f64); | ||
|
|
||
| /// A [`GroupValues`] storing raw primitive values |
There was a problem hiding this comment.
| /// A [`GroupValues`] storing raw primitive values | |
| /// A [`GroupValues`] storing a single column of raw primitive values | |
| /// | |
| /// This specialization is significantly faster than using the more general | |
| /// purpose `Row`s format |
|
FYI @yahoNanJing -- I think this PR will make it even easier to evaluate the improvements that a fixed width arrow row format would provide (We can make a specialized GroupsValue under the correct circumstances) |
|
I saw some strange results when running with this branch on my clickbench testing. Please standby |
|
TLDR is I think this PR is good to go from a performance perspective. I don't see massive gains but I do see small improvements |
Which issue does this PR close?
Part of #6969
Rationale for this change
The slower tests appear to just be noise, at least they don't seem to reproduce consistently
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?