
[EPIC] Improved Externalized / Spilling / Larger than Memory Hash Aggregation #13123

Open
@alamb

Description

This is a collection of items to improve external (spilling) aggregation.

Background

Abstract: Analytical database systems offer high-performance in-memory aggregation. If there are many unique groups, temporary query intermediates may not fit RAM, requiring the use of external storage. However, switching from an in-memory to an external algorithm can degrade performance sharply.

DataFusion has supported memory-limited (spilling) hash aggregation since @kazuyukitanimura added it last year in #7400.
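For readers new to the technique, here is a toy sketch of what spilling aggregation means in general. It is not DataFusion's implementation: the names `spilling_sum` and `max_groups_in_memory` are hypothetical, the budget is counted in groups rather than bytes, and keys are assumed to be strings. Partial sums are written out as sorted runs when the in-memory table is full, and the runs are merged at the end.

```python
import heapq
import itertools
import json
import os
import tempfile


def _write_run(partials, spill_dir):
    """Write one sorted run of (key, partial_sum) pairs to a temp file."""
    fd, path = tempfile.mkstemp(dir=spill_dir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for key in sorted(partials):
            f.write(json.dumps([key, partials[key]]) + "\n")
    return path


def _read_run(path):
    """Stream (key, partial_sum) pairs back from a spilled run."""
    with open(path) as f:
        for line in f:
            key, value = json.loads(line)
            yield key, value


def spilling_sum(rows, max_groups_in_memory):
    """Sum values per string key, holding a bounded number of groups in RAM."""
    with tempfile.TemporaryDirectory() as spill_dir:
        partials, runs = {}, []
        for key, value in rows:
            # A new group would exceed the budget: spill the table, start over.
            if key not in partials and len(partials) >= max_groups_in_memory:
                runs.append(_write_run(partials, spill_dir))
                partials = {}
            partials[key] = partials.get(key, 0) + value

        # Treat whatever is still in memory as one more sorted run.
        streams = [_read_run(path) for path in runs]
        streams.append(iter(sorted(partials.items())))

        # Merging sorted runs makes equal keys adjacent, so they can be
        # combined in a single streaming pass.
        merged = heapq.merge(*streams, key=lambda kv: kv[0])
        result = []
        for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
            result.append((key, sum(v for _, v in group)))
        return result
```

The sketch shows where the extra cost comes from: every spilled group is serialized, written, read back, and merged. That is one reason the switch from in-memory to external aggregation can hurt performance.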

We can likely improve this feature, and @2010YOUY01 is considering working on it.

Tasks
