Replace record_batch.get_array_memory_size() in spilling operators #13430

@2010YOUY01

Description

Is your feature request related to a problem or challenge?

This issue tracks the follow-on tasks for #13377.

  1. Record batches' memory size is currently overestimated; this will be (partly) fixed by "Fix record batch memory size double counting" #13377. Once that PR is merged, the remaining uses of record_batch.get_array_memory_size() should be replaced (e.g. the memory counting in TopK and Sort-Merge-Join). After that, more end-to-end tests can be added for each affected operator.
  2. Check whether known bugs in memory-limited queries are fixed by item 1. Issues that may be related:
    External aggregation reserves more memory than actual usage #13089
    Further refine the Top K sort operator #9417
    Excessive memory consumption on sorting #10511
    External sorting not working for (maybe only for string columns??) #12136
    Some memory reservations of GroupedHashAggregateStream seem to be mis-tagged as spillable while they do not allow spilling #11390
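
To make the overestimation in item 1 concrete, here is a minimal std-only sketch of how double counting can arise when several columns of a batch share one backing buffer. It does not use the Arrow or DataFusion APIs: `Column`, `naive_batch_size`, and `deduplicated_batch_size` are hypothetical names made up for illustration, and the assumption is that a per-array estimate charges the whole shared buffer to every array that references it.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an array: a view into a reference-counted buffer.
struct Column {
    buffer: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl Column {
    /// The bytes this column actually exposes.
    fn bytes(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Assumed behaviour of a per-array estimate: charge the whole backing buffer.
    fn naive_memory_size(&self) -> usize {
        self.buffer.len()
    }
}

/// Sums the per-column estimates, so a shared buffer is counted once per column.
fn naive_batch_size(columns: &[Column]) -> usize {
    columns.iter().map(Column::naive_memory_size).sum()
}

/// Counts each distinct backing buffer once, identified by its allocation pointer.
fn deduplicated_batch_size(columns: &[Column]) -> usize {
    let mut seen: HashSet<*const Vec<u8>> = HashSet::new();
    let mut total = 0;
    for col in columns {
        // `insert` returns false if this buffer was already counted.
        if seen.insert(Arc::as_ptr(&col.buffer)) {
            total += col.buffer.len();
        }
    }
    total
}

fn main() {
    // Two columns that are slices of the same 1024-byte buffer.
    let shared = Arc::new(vec![0u8; 1024]);
    let columns = vec![
        Column { buffer: Arc::clone(&shared), offset: 0, len: 512 },
        Column { buffer: Arc::clone(&shared), offset: 512, len: 512 },
    ];
    assert_eq!(columns[0].bytes().len(), 512);

    println!("naive estimate:        {}", naive_batch_size(&columns));
    println!("deduplicated estimate: {}", deduplicated_batch_size(&columns));
}
```

In this model the naive estimate is twice the real allocation. An operator that reserves memory based on such a number may spill or fail earlier than necessary, which is why the issues listed above might be affected.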

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement
