Closed
Description
Is your feature request related to a problem or challenge?
The inner loop of many accumulators, such as Avg, is implemented in NullState::accumulate (source link), which has specializations for nulls and filters.
I think it would be possible to help the compiler generate more efficient code for the inner loops of the accumulators, ideally by letting it vectorize them.
Describe the solution you'd like
Study and optimize the implementation of NullState::accumulate and related functions to make them faster.
Specifically, I think that by being clever about null mask iteration, or by updating the null state more efficiently, we could perhaps make the aggregates faster.
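To make the null mask idea concrete, here is a minimal sketch of one possible approach: walk the validity bitmap 64 rows at a time, so that fully valid chunks run a loop with no per-row null check and mixed chunks visit only the set bits. This is not the existing DataFusion code. `accumulate_chunked` and `sum_by_group` are hypothetical names, and the validity mask is modeled as a plain `&[u64]` (least-significant bit first, 1 = non-null) rather than an Arrow null buffer, so the example stays dependency-free. Whether it actually helps would need to be checked with the benchmarks.

```rust
/// Hypothetical helper (not DataFusion's API): calls `value_fn(group_index, value)`
/// for every non-null row, reading the validity bitmap one 64-bit word at a time.
/// Assumption: `validity` holds one bit per row, least-significant bit first,
/// where a set bit means the row is non-null.
fn accumulate_chunked<F>(
    group_indices: &[usize],
    values: &[i64],
    validity: &[u64],
    mut value_fn: F,
) where
    F: FnMut(usize, i64),
{
    assert_eq!(group_indices.len(), values.len());
    assert!(validity.len() * 64 >= values.len());

    let chunks = group_indices.chunks(64).zip(values.chunks(64));
    for (chunk_idx, (groups, vals)) in chunks.enumerate() {
        // Bits that correspond to real rows in this chunk (the last chunk may be short).
        let full = if vals.len() == 64 {
            u64::MAX
        } else {
            (1u64 << vals.len()) - 1
        };
        let mut mask = validity[chunk_idx] & full;

        if mask == full {
            // Every row is valid: a tight loop without any per-row null check.
            for (&g, &v) in groups.iter().zip(vals) {
                value_fn(g, v);
            }
        } else {
            // Mixed or all-null chunk: visit only the set bits.
            while mask != 0 {
                let i = mask.trailing_zeros() as usize;
                value_fn(groups[i], vals[i]);
                mask &= mask - 1; // clear the lowest set bit
            }
        }
    }
}

/// Hypothetical example accumulator: per-group sum that ignores null rows.
fn sum_by_group(
    group_indices: &[usize],
    values: &[i64],
    validity: &[u64],
    num_groups: usize,
) -> Vec<i64> {
    let mut sums = vec![0i64; num_groups];
    accumulate_chunked(group_indices, values, validity, |g, v| sums[g] += v);
    sums
}
```

For example, with groups `[0, 1, 0, 1, 0]`, values `[10, 20, 30, 40, 50]`, and a validity word of `0b10101`, only rows 0, 2, and 4 are accumulated. The all-valid fast path is the part the compiler has the best chance of optimizing, since it has no data-dependent branch on the null mask.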
Benchmarks to drive some of this work:
benchmarks/bench.sh run tpch_mem
benchmarks/bench.sh run clickbench_1
Describe alternatives you've considered
Here is one approach (which did not seem to improve things enough): #6954
Additional context
No response