Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The improved grouping algorithm in #790 speeds up grouping in DataFusion across the board and works for all types of keys.
However, @sundy-li noted on #790 (comment) that additional performance is likely possible by special-casing "small" and fixed-size keys.
Describe the solution you'd like
From @sundy-li's comment:
Introducing variant hash methods would help in this case.
E.g.:
For a query that groups by 3 columns of types [u8, u8, u16], a fixed-size u32 hash key is enough (8 + 8 + 16 = 32 bits).
- We can allocate one large fixed block of memory rather than making multiple Vec allocations.
- The fixed-size key reduces the memory used by the hash map.
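To make the idea concrete, here is a minimal sketch of packing a [u8, u8, u16] group key into a single u32 and counting rows per group. This is only an illustration under stated assumptions, not DataFusion's implementation: the function names (`pack_key`, `unpack_key`, `count_by_packed_key`) are hypothetical, the columns are plain slices rather than Arrow arrays, and nulls are ignored.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pack three fixed-width group-by values into one u32.
/// Layout (high to low bits): a (8 bits) | b (8 bits) | c (16 bits).
fn pack_key(a: u8, b: u8, c: u16) -> u32 {
    (u32::from(a) << 24) | (u32::from(b) << 16) | u32::from(c)
}

/// Hypothetical helper: recover the original column values from a packed key,
/// e.g. to build the output group columns after aggregation.
fn unpack_key(key: u32) -> (u8, u8, u16) {
    ((key >> 24) as u8, (key >> 16) as u8, key as u16)
}

/// COUNT(*) grouped by three columns, using the packed u32 as the map key
/// instead of a variable-length byte vector per group.
/// Assumption: columns are plain slices of equal length with no nulls.
fn count_by_packed_key(a: &[u8], b: &[u8], c: &[u16]) -> HashMap<u32, u64> {
    assert!(a.len() == b.len() && b.len() == c.len());
    let mut counts: HashMap<u32, u64> = HashMap::new();
    for i in 0..a.len() {
        *counts.entry(pack_key(a[i], b[i], c[i])).or_insert(0) += 1;
    }
    counts
}
```

Because the key is a `Copy` integer, each map entry stores it inline with no per-group heap allocation, which is the memory saving described in the bullets above. Supporting nullable columns would need extra bits in the key (or a separate path), which this sketch does not attempt.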
Alternate Ideas
@Dandandan also suggests that for small ranges / data types we can even avoid using a hash table and move to direct indexing instead. That might be interesting for u8 values or small dictionaries.
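A rough sketch of the direct-indexing idea for a single u8 group column follows. Again this is an illustration only: `count_by_u8` and `non_empty_groups` are hypothetical names, the input is a plain slice with no nulls, and the aggregate is a simple count.

```rust
/// COUNT(*) grouped by one u8 column without a hash table.
/// The key value itself is the index into a fixed 256-slot array.
/// Assumption: input is a plain slice with no nulls.
fn count_by_u8(keys: &[u8]) -> [u64; 256] {
    let mut counts = [0u64; 256];
    for &k in keys {
        counts[usize::from(k)] += 1;
    }
    counts
}

/// Collect only the groups that actually occurred, in ascending key order.
fn non_empty_groups(counts: &[u64; 256]) -> Vec<(u8, u64)> {
    let mut groups = Vec::new();
    for (key, &count) in counts.iter().enumerate() {
        if count > 0 {
            // `key` is always < 256 here, so the cast cannot truncate.
            groups.push((key as u8, count));
        }
    }
    groups
}
```

There is no hashing, probing, or collision handling, so the inner loop is a single array increment. The same approach would apply to dictionary-encoded columns by indexing on the dictionary key, provided the dictionary is small enough for the array to stay cache-friendly.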