Description
Is your feature request related to a problem or challenge?
As part of #8849, I wanted to build up the output state (the distinct string values) during accumulation and then generate output directly (without a copy)
However, I couldn't implement a zero copy algorithm because evaluate
and merge
take a &self
not a &mut self
(well I worked around it with a Mutex
🤮 ).
The need to copy intermediate state likely doesn't matter for most Accumulators
as they only emit a single scalar value where the cost of copying is pretty low. However, for ones that emit significant internal state (like count DISTINCT
) using the same internal state can save a lot of copying (again, see #8849 for an example)
Also, the actual Accumulator
instances are never used after a call to evaluate/state, so the state of the accumulator after this call is never used again.
Describe the solution you'd like
I would like to change Accumulator::evaluate
and Accumulator::state
to take &mut self
This is also consistent with GroupsAccumulator
which takes mut for its state and evaluate functions:
pub trait GroupsAccumulator: Send {
...
fn evaluate(
&mut self,
emit_to: EmitTo
) -> Result<Arc<dyn Array>, DataFusionError>;
fn state(
&mut self,
emit_to: EmitTo
) -> Result<Vec<Arc<dyn Array>>, DataFusionError>;
...
Describe alternatives you've considered
No response
Additional context
No response