Status: Open
Labels: PROPOSAL EPIC (A proposal being discussed that is not yet fully underway), enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Aggregation is a key operation of analytic engines, and DataFusion has made great progress on it recently (e.g. #4973 and #6889).
This epic collects other potential ways to improve the performance of aggregation.
Core Hash Grouping Algorithm:
- Improve aggregate performance by special casing single group keys #6969
- Improve aggregate performance with specialized groups accumulator for single string group by #7064
- Improve performance for grouping by variable length columns (strings) #9403
- Improved performance for streaming group by #7023
- Evaluate vectorized hash table for group aggregation #7095
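As a rough illustration of the core idea behind the grouping items above, here is a minimal Rust sketch, assuming a single non-null `i64` group key. `assign_group_ids` and `sum_by_group` are hypothetical names, not DataFusion APIs. Each row's key is mapped to a dense group index, and accumulator state lives in a `Vec` indexed by that group id. With only one primitive key, the key value itself can serve as the hash-table key, which is roughly the kind of shortcut one might expect from special-casing single group keys.

```rust
use std::collections::HashMap;

/// Hypothetical helper: assigns a dense group index to every row of a
/// single `i64` group-by column. `map` persists across batches, so a key
/// seen in an earlier batch keeps its original group index.
pub fn assign_group_ids(keys: &[i64], map: &mut HashMap<i64, usize>, group_ids: &mut Vec<usize>) {
    group_ids.clear();
    for &k in keys {
        // A key seen for the first time gets the next unused group index.
        let next = map.len();
        let id = *map.entry(k).or_insert(next);
        group_ids.push(id);
    }
}

/// Hypothetical SUM accumulator: one state slot per group, updated by
/// group index in a single pass over the batch.
pub fn sum_by_group(values: &[i64], group_ids: &[usize], num_groups: usize, sums: &mut Vec<i64>) {
    // Add zeroed slots for groups first seen in this batch; existing sums
    // are kept (assumes `num_groups` is the total number of groups so far).
    sums.resize(num_groups, 0);
    for (&v, &g) in values.iter().zip(group_ids) {
        sums[g] += v;
    }
}
```

Because both functions keep their state in caller-owned structures, they can be called once per input batch: groups first seen in a later batch get the next index, and earlier sums are preserved.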
Specialized Aggregators:
- Implement fast min/max accumulator for binary / strings (now it uses the slower path) #6906
- Improve the performance of COUNT DISTINCT queries for high cardinality groups #5547
- Speed up `DistinctCountAccumulator` #5472
- Improve aggregate performance with adaptive sizing in accumulators / avoiding reallocations in accumulators #7065
- Improve grouping performance via better vectorization in accumulate functions #7066
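A minimal sketch of what a specialized, typed MIN accumulator for strings might look like, assuming dense group ids such as those produced by a hash grouping step. `update_min_str` is a hypothetical name and is not part of DataFusion. It keeps one `Option<String>` per group, processes a whole batch in one loop, and reuses the existing `String` buffer when a smaller value arrives instead of allocating for every row.

```rust
/// Hypothetical grouped MIN over nullable string values.
/// `mins[g]` holds the current minimum for group `g` (`None` until a
/// non-null value has been seen for that group).
pub fn update_min_str(
    values: &[Option<&str>],
    group_ids: &[usize],
    num_groups: usize,
    mins: &mut Vec<Option<String>>,
) {
    // Add empty slots for groups first seen in this batch; existing minimums
    // are kept (assumes `num_groups` is the total number of groups so far).
    mins.resize(num_groups, None);
    for (v, &g) in values.iter().zip(group_ids) {
        // Nulls do not participate in MIN.
        let Some(v) = v else { continue };
        match &mut mins[g] {
            Some(cur) => {
                if *v < cur.as_str() {
                    // Reuse the existing allocation instead of creating a new String.
                    cur.clear();
                    cur.push_str(v);
                }
            }
            // First non-null value for this group.
            None => mins[g] = Some(v.to_string()),
        }
    }
}
```

The same shape (state vector indexed by group id, one pass per batch) applies to MAX and to binary values; only the comparison and the element type change.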
New features:
- Improve Memory usage + performance with large numbers of groups / High Cardinality Aggregates #6937
- Generate GroupByHash output in multiple `RecordBatch`es rather than one large one #9562
- Better Grouping / aggregation pushdown #8699
- Change `Accumulator::evaluate` and `Accumulator::state` to take `&mut self` #8934
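One likely benefit of having `evaluate`/`state` take `&mut self` can be illustrated with a simplified, hypothetical trait. `SketchAccumulator` and `DistinctValues` below are illustrative names, not DataFusion's `Accumulator` trait or any of its implementations. With `&mut self`, an accumulator can move its buffered state out with `std::mem::take`; with `&self`, it would have to clone that state.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified accumulator trait for illustration only.
pub trait SketchAccumulator {
    fn update_batch(&mut self, values: &[i64]);
    /// Final result. Taking `&mut self` lets an implementation move its
    /// buffered state out instead of cloning it.
    fn evaluate(&mut self) -> Vec<i64>;
}

/// Collects the distinct input values seen so far.
#[derive(Default)]
pub struct DistinctValues {
    seen: BTreeSet<i64>,
}

impl SketchAccumulator for DistinctValues {
    fn update_batch(&mut self, values: &[i64]) {
        self.seen.extend(values.iter().copied());
    }

    fn evaluate(&mut self) -> Vec<i64> {
        // `std::mem::take` leaves an empty set behind, so the values are
        // moved out without a copy. A `&self` signature would instead need
        // `self.seen.iter().copied().collect()`, copying every element.
        std::mem::take(&mut self.seen).into_iter().collect()
    }
}
```

The trade-off is that the accumulator is left empty after `evaluate`, so this sketch assumes the final value is requested only once.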
Improved partitioning:
- Lock free MPSC channel for RepartitionExec #6928
- Improve RepartitionExec for better query performance #7001
- Speed up hash partitioning #6822
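For the partitioning items, here is a minimal sketch of hash partitioning, assuming a single `i64` key column. `hash_partition` is a hypothetical name, and the standard library's `DefaultHasher` is used purely for illustration; a real engine would typically use a faster hash. For each output partition it collects the indices of the input rows routed there, so equal keys always land in the same partition. A `take`-style gather could then build one batch per partition from those indices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch: returns, for each of `num_partitions` outputs, the
/// row indices of `keys` assigned to it by `hash(key) % num_partitions`.
pub fn hash_partition(keys: &[i64], num_partitions: usize) -> Vec<Vec<u32>> {
    assert!(num_partitions > 0, "need at least one partition");
    let mut indices: Vec<Vec<u32>> = vec![Vec::new(); num_partitions];
    for (row, key) in keys.iter().enumerate() {
        // Hash each key independently; equal keys produce equal hashes.
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let partition = (hasher.finish() % num_partitions as u64) as usize;
        indices[partition].push(row as u32);
    }
    indices
}
```

Producing per-partition index lists (rather than copying rows one at a time) keeps the hashing loop simple and defers the data movement to one gather per output partition.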
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response