Skip to content

Change Accumulator::evaluate and Accumulator::state to take &mut self #8934

Closed
@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

As part of #8849, I wanted to build up the output state (the distinct string values) during accumulation and then generate output directly (without a copy)

However, I couldn't implement a zero copy algorithm because evaluate and merge take a &self not a &mut self (well I worked around it with a Mutex 🤮 ).

The need to copy intermediate state likely doesn't matter for most Accumulators as they only emit a single scalar value where the cost of copying is pretty low. However, for ones that emit significant internal state (like count DISTINCT) using the same internal state can save a lot of copying (again, see #8849 for an example)

Also, the actual Accumulator instances are never used after a call to evaluate/state, so the state of the accumulator after this call is never used again.

Describe the solution you'd like

I would like to change Accumulator::evaluate and Accumulator::state to take &mut self

This is also consistent with GroupsAccumulator which takes mut for its state and evaluate functions:

pub trait GroupsAccumulator: Send {
...
    fn evaluate(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Arc<dyn Array>, DataFusionError>;
    fn state(
        &mut self,
        emit_to: EmitTo
    ) -> Result<Vec<Arc<dyn Array>>, DataFusionError>;
...

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions