Skip to content

Release hash aggregation memory on output #25879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pettyjamesm
Copy link
Member

@pettyjamesm pettyjamesm commented May 28, 2025

Description

Incrementally releases memory from FlatGroupByHash when HashAggregationOperator starts producing output. Previously, the entire hash table contents were kept in memory until output data was produced from hash aggregations.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## General
* Reduce memory usage of aggregations by incrementally releasing memory as output rows are are produced ({issue}`25879`)

@pettyjamesm pettyjamesm requested a review from dain May 28, 2025 15:15
@cla-bot cla-bot bot added the cla-signed label May 28, 2025
@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch 6 times, most recently from 9c34efb to 8676e2f Compare May 28, 2025 20:54
@pettyjamesm pettyjamesm marked this pull request as ready for review May 28, 2025 23:35
@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch 2 times, most recently from a9c503f to 547ef13 Compare May 29, 2025 14:17
@pettyjamesm pettyjamesm requested a review from sopel39 June 4, 2025 18:59
@sopel39 sopel39 requested a review from Copilot June 5, 2025 12:35
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces incremental memory release for hash aggregations by freeing portions of the hash table as output is produced. Key changes include:

  • New test (testReleaseMemoryOnOutput) verifying memory reduction and restricted operations after output release.
  • Implementation of startReleasingOutput in FlatHash and FlatGroupByHash, which updates internal state to release memory.
  • Adjustments in InMemoryHashAggregationBuilder and other GroupByHash implementations to support and integrate the new memory release behavior.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
core/trino-main/src/test/java/io/trino/operator/TestGroupByHash.java Added tests for incremental memory release during output production.
core/trino-main/src/test/java/io/trino/operator/CyclingGroupByHash.java Introduced a stub for startReleasingOutput throwing an UnsupportedOperationException.
core/trino-main/src/main/java/io/trino/operator/aggregation/builder/InMemoryHashAggregationBuilder.java Updated aggregation builder to utilize incremental memory release.
core/trino-main/src/main/java/io/trino/operator/FlatHash.java Modified memory handling by nullifying arrays to free memory and added extra checks for the releasing state.
core/trino-main/src/main/java/io/trino/operator/FlatGroupByHash.java Updated to delegate startReleasingOutput to FlatHash.
core/trino-main/src/main/java/io/trino/operator/BigintGroupByHash.java Implemented a no-operation for startReleasingOutput.
core/trino-main/src/main/java/io/trino/operator/AppendOnlyVariableWidthData.java Enhanced chunk-freeing logic with safe null checks.
core/trino-main/src/main/java/io/trino/operator/GroupByHash.java Updated interface to include startReleasingOutput.
Comments suppressed due to low confidence (2)

core/trino-main/src/test/java/io/trino/operator/CyclingGroupByHash.java:58

  • Consider adding a comment explaining that startReleasingOutput is not supported in CyclingGroupByHash and will be implemented later, so that readers understand the rationale behind the UnsupportedOperationException.
public void startReleasingOutput()

core/trino-main/src/main/java/io/trino/operator/GroupByHash.java:112

  • Consider adding Javadoc for the startReleasingOutput method to clarify the expected behavior and usage for implementations, helping future maintainers understand the contract behind memory release.
void startReleasingOutput();

@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch from 547ef13 to 50c4a9a Compare June 5, 2025 14:03
@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch 3 times, most recently from cb7aa29 to 18cb110 Compare June 6, 2025 15:25
@raunaqmorarka raunaqmorarka requested a review from lukasz-stec June 9, 2025 13:51
@@ -248,12 +248,18 @@ public WorkProcessor<Page> buildResult()
for (GroupedAggregator groupedAggregator : groupedAggregators) {
groupedAggregator.prepareFinal();
}
return buildResult(consecutiveGroupIds(), new PageBuilder(buildTypes()), false);
// Only incrementally release memory for final aggregations, since partial aggregations have a fixed
// memory limit and can be expected to fully flush and release their output quickly
Copy link
Member

@sopel39 sopel39 Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter in practice? Without this condition the code could be simplified and startReleasingOutput could be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think you need startReleasingOutput() no matter what since there are structures in the GroupByHash that can be released immediately before any groupId is emitted.
  2. Avoiding releasing memory in partial aggregations can help avoid lock contention in the memory tracking system for high parallelism operations (partial aggregations are likely to appear on top of table scans). It may not matter much in practice but it seemed like incrementally releasing there was unlikely to be helpful since we're flushing anyway and the memory released is expected to be negligible

@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch from 18cb110 to 80a9797 Compare June 16, 2025 17:58
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Would be good if @dain review too

@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch from 80a9797 to 78e8071 Compare June 17, 2025 15:38
@pettyjamesm pettyjamesm merged commit 5d41f5d into trinodb:master Jun 18, 2025
96 checks passed
@pettyjamesm pettyjamesm deleted the release-hash-aggregation-memory branch June 18, 2025 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants