Closed as duplicate
Description
Updating the hash aggregate implementation to use vectorized hashing should give a decent speedup to queries that depend on a fast hash aggregate implementation.
Currently, keys are generated as type Vec<u8> and hashed row by row, which causes:
- more memory usage
- slow re-hashing of the backing hashmap
- type-unaware hashing for simple primitive values
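To make the column-at-a-time idea concrete, here is a minimal sketch using only the Rust standard library. It is not the DataFusion implementation: the function names `hash_column_i64`, `combine` and `hash_columns` are hypothetical, the key columns are assumed to be plain `i64` slices rather than Arrow arrays, and `DefaultHasher` stands in for whatever hash function a real implementation would pick.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Mix a new column hash into the running per-row hash.
/// (Hypothetical combiner, chosen only for illustration.)
fn combine(seed: u64, h: u64) -> u64 {
    seed.wrapping_mul(31).wrapping_add(h)
}

/// Hash one primitive column in a single pass, updating the per-row
/// hash buffer in place. No per-row Vec<u8> key is allocated.
fn hash_column_i64(values: &[i64], hashes: &mut [u64]) {
    assert_eq!(values.len(), hashes.len());
    for (h, v) in hashes.iter_mut().zip(values) {
        let mut hasher = DefaultHasher::new();
        v.hash(&mut hasher); // type-aware: hashes the i64 directly
        *h = combine(*h, hasher.finish());
    }
}

/// Hash all key columns, one column at a time, into one u64 per row.
fn hash_columns(columns: &[&[i64]], num_rows: usize) -> Vec<u64> {
    let mut hashes = vec![0u64; num_rows];
    for col in columns {
        hash_column_i64(col, &mut hashes);
    }
    hashes
}
```

Calling `hash_columns(&[&a[..], &b[..]], a.len())` on two equal-length key columns yields one `u64` per row, and rows with identical key values always receive identical hashes. Different rows may still share a hash, so the hash alone cannot serve as the group key.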
The implementation should also handle hash collisions, so the original key values must remain available for comparison whenever two hashes match.
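One way to handle collisions, sketched under simplifying assumptions: keep the map keyed by the precomputed `u64` hash, store the actual key of each group separately, and compare real key values whenever a hash is already present. The function name `group_rows` is hypothetical, and a single `i64` key column is assumed for brevity.

```rust
use std::collections::HashMap;

/// Assign a group index to every row from precomputed hashes.
/// Returns (group index per row, key value per group).
fn group_rows(hashes: &[u64], keys: &[i64]) -> (Vec<usize>, Vec<i64>) {
    // hash -> indices of all groups whose key produced that hash
    let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
    let mut group_keys: Vec<i64> = Vec::new();
    let mut row_groups = Vec::with_capacity(keys.len());

    for (&h, &k) in hashes.iter().zip(keys) {
        let candidates = buckets.entry(h).or_default();
        // Same hash does not imply same key: compare the stored key values.
        let found = candidates.iter().copied().find(|&g| group_keys[g] == k);
        let group = match found {
            Some(g) => g,
            None => {
                // New key (possibly colliding with an existing hash).
                group_keys.push(k);
                let g = group_keys.len() - 1;
                candidates.push(g);
                g
            }
        };
        row_groups.push(group);
    }
    (row_groups, group_keys)
}
```

Even if every row is given the same hash, rows are still separated correctly by key value. The cost is an extra comparison per candidate group in the bucket.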
There is some WIP code in apache/arrow#9213 that can be used as a starting point or continued from.