-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Release hash aggregation memory on output #25879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release hash aggregation memory on output #25879
Conversation
9c34efb
to
8676e2f
Compare
a9c503f
to
547ef13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces incremental memory release for hash aggregations by freeing portions of the hash table as output is produced. Key changes include:
- New test (testReleaseMemoryOnOutput) verifying memory reduction and restricted operations after output release.
- Implementation of startReleasingOutput in FlatHash and FlatGroupByHash, which updates internal state to release memory.
- Adjustments in InMemoryHashAggregationBuilder and other GroupByHash implementations to support and integrate the new memory release behavior.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
core/trino-main/src/test/java/io/trino/operator/TestGroupByHash.java | Added tests for incremental memory release during output production. |
core/trino-main/src/test/java/io/trino/operator/CyclingGroupByHash.java | Introduced a stub for startReleasingOutput throwing an UnsupportedOperationException. |
core/trino-main/src/main/java/io/trino/operator/aggregation/builder/InMemoryHashAggregationBuilder.java | Updated aggregation builder to utilize incremental memory release. |
core/trino-main/src/main/java/io/trino/operator/FlatHash.java | Modified memory handling by nullifying arrays to free memory and added extra checks for the releasing state. |
core/trino-main/src/main/java/io/trino/operator/FlatGroupByHash.java | Updated to delegate startReleasingOutput to FlatHash. |
core/trino-main/src/main/java/io/trino/operator/BigintGroupByHash.java | Implemented a no-operation for startReleasingOutput. |
core/trino-main/src/main/java/io/trino/operator/AppendOnlyVariableWidthData.java | Enhanced chunk-freeing logic with safe null checks. |
core/trino-main/src/main/java/io/trino/operator/GroupByHash.java | Updated interface to include startReleasingOutput. |
Comments suppressed due to low confidence (2)
core/trino-main/src/test/java/io/trino/operator/CyclingGroupByHash.java:58
- Consider adding a comment explaining that startReleasingOutput is not supported in CyclingGroupByHash and will be implemented later, so that readers understand the rationale behind the UnsupportedOperationException.
public void startReleasingOutput()
core/trino-main/src/main/java/io/trino/operator/GroupByHash.java:112
- Consider adding Javadoc for the startReleasingOutput method to clarify the expected behavior and usage for implementations, helping future maintainers understand the contract behind memory release.
void startReleasingOutput();
547ef13
to
50c4a9a
Compare
cb7aa29
to
18cb110
Compare
@@ -248,12 +248,18 @@ public WorkProcessor<Page> buildResult() | |||
for (GroupedAggregator groupedAggregator : groupedAggregators) { | |||
groupedAggregator.prepareFinal(); | |||
} | |||
return buildResult(consecutiveGroupIds(), new PageBuilder(buildTypes()), false); | |||
// Only incrementally release memory for final aggregations, since partial aggregations have a fixed | |||
// memory limit and can be expected to fully flush and release their output quickly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter in practice? Without this condition the code could be simplified and startReleasingOutput
could be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think you need
startReleasingOutput()
no matter what since there are structures in theGroupByHash
that can be released immediately before anygroupId
is emitted. - Avoiding releasing memory in partial aggregations can help avoid lock contention in the memory tracking system for high parallelism operations (partial aggregations are likely to appear on top of table scans). It may not matter much in practice but it seemed like incrementally releasing there was unlikely to be helpful since we're flushing anyway and the memory released is expected to be negligible
core/trino-main/src/main/java/io/trino/operator/AppendOnlyVariableWidthData.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/AppendOnlyVariableWidthData.java
Show resolved
Hide resolved
18cb110
to
80a9797
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Would be good if @dain review too
core/trino-main/src/main/java/io/trino/operator/AppendOnlyVariableWidthData.java
Show resolved
Hide resolved
Incrementally releases memory from FlatGroupByHash when HashAggregationOperator starts producing output.
80a9797
to
78e8071
Compare
Description
Incrementally releases memory from
FlatGroupByHash
whenHashAggregationOperator
starts producing output. Previously, the entire hash table contents were kept in memory until output data was produced from hash aggregations.Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: