Add TopN Query Aggregator Memory Guardrails #17439

jtuglu-netflix · 2024-10-31T00:51:46Z

Adds a way to monitor the growth of aggregator map entries for TopN queries. This responds to a problem we've seen where a TopN query runs an expensive aggregator on a high-cardinality dimension. I've reproduced the problem locally and confirmed that this fixes it in most cases. The remaining cases are outlined in Drawbacks.

Description

Approach

Use a configurable (per-query, per-segment) fixed size byte amount that aggregators can take up. The byte count is maintained at the segment level during each query runner's pass through a segment (while getting the result sequence). I opted for this over a %-of-heap approach because, in high-traffic scenarios, multiple queries could race to allocate memory for aggregators, and each of them could see, say, 5% of total heap as available (assuming that is the permissible % to allocate). If we're already at 80-90% heap usage, this could end badly. A fixed amount is a bit more cumbersome to configure, but it at least guarantees a realistic upper bound on how much memory N concurrent queries could theoretically use. Switching to either approach (or another) is easy enough. I found the fixed-amount approach performed more consistently in local testing with artificially low heap sizes and with parallel queries.

Another alternative I considered was a shared buffer that queries can "borrow" from, with the pool shared amongst all running queries. This is a bit like what GroupBy does.
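To make the fixed-budget idea concrete, here is a minimal sketch of a byte budget that one query runner could hold while it passes over one segment. The class name `AggregatorByteBudget`, its methods, and the choice of `IllegalStateException` are hypothetical and are not taken from this PR; the sketch also assumes a single thread drives each pass, so it does no synchronization.

```java
/**
 * Hypothetical per-(query, segment) byte budget for aggregator state.
 * Illustrative only: the names and the exception type are assumptions,
 * not the classes added by this PR.
 */
final class AggregatorByteBudget
{
  private final long maxBytes;
  private long usedBytes;

  AggregatorByteBudget(long maxBytes)
  {
    if (maxBytes <= 0) {
      throw new IllegalArgumentException("maxBytes must be positive");
    }
    this.maxBytes = maxBytes;
  }

  /** Accounts for a new aggregator map entry; fails fast if it would exceed the budget. */
  void reserve(long bytes)
  {
    if (bytes < 0) {
      throw new IllegalArgumentException("bytes must be non-negative");
    }
    // Compare against the remaining space to avoid overflow in usedBytes + bytes.
    if (bytes > maxBytes - usedBytes) {
      throw new IllegalStateException(
          "Aggregator budget exceeded: used=" + usedBytes + ", requested=" + bytes + ", max=" + maxBytes
      );
    }
    usedBytes += bytes;
  }

  long usedBytes()
  {
    return usedBytes;
  }

  long remainingBytes()
  {
    return maxBytes - usedBytes;
  }
}
```

Because each budget is a fixed number rather than a live reading of the heap, the worst case across concurrent runners is simply the number of runners times the configured amount, regardless of how the allocations interleave.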

Drawbacks

No spilling to disk. This will have to come in a separate PR, ideally sharing some of the GroupBy code.
No knowledge of the number of parallel queries. GroupBy is cognizant of this through processing.numMergeBuffers. I could use this as an "automated" way of determining the available buffer size instead of hardcoding a default constant, for example: (some % of max JVM heap) / (max # of parallel queries).
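The "automated" default mentioned in the second drawback could be computed roughly as follows. This is only a sketch of the arithmetic: the class and method names are hypothetical, and the heap fraction and parallelism values in `main` are placeholders rather than values proposed in this PR.

```java
/**
 * Hypothetical helper that derives a per-query aggregator budget from the
 * max JVM heap and the maximum number of parallel queries.
 */
final class DefaultAggregatorBudget
{
  private DefaultAggregatorBudget() {}

  static long perQueryBytes(long maxHeapBytes, double heapFraction, int maxParallelQueries)
  {
    if (maxHeapBytes <= 0 || maxParallelQueries <= 0) {
      throw new IllegalArgumentException("heap size and parallelism must be positive");
    }
    if (heapFraction <= 0.0 || heapFraction > 1.0) {
      throw new IllegalArgumentException("heapFraction must be in (0, 1]");
    }
    // Share of the heap reserved for aggregators, split evenly across queries.
    return (long) (maxHeapBytes * heapFraction) / maxParallelQueries;
  }

  public static void main(String[] args)
  {
    // Placeholder inputs: 5% of the heap shared by 8 parallel queries.
    long maxHeap = Runtime.getRuntime().maxMemory();
    System.out.println(perQueryBytes(maxHeap, 0.05, 8));
  }
}
```

In practice the parallelism input would presumably come from an existing setting such as the merge buffer count, which is the open question raised above.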

Release note

Key changed/added classes in this PR

extensions-contrib/distinctcount/src/test/java/org/apache/druid/query/aggregation/distinctcount/DistinctCountTopNQueryTest.java
processing/src/main/java/org/apache/druid/query/topn/BaseTopNAlgorithm.java
processing/src/main/java/org/apache/druid/query/topn/PooledTopNAlgorithm.java
processing/src/main/java/org/apache/druid/query/topn/TopNAggregatorResourceHelper.java
processing/src/main/java/org/apache/druid/query/topn/TopNQuery.java
processing/src/main/java/org/apache/druid/query/topn/TopNQueryEngine.java
processing/src/main/java/org/apache/druid/query/topn/TopNQueryQueryToolChest.java
processing/src/main/java/org/apache/druid/query/topn/TopNQueryRunnerFactory.java
processing/src/test/java/org/apache/druid/query/topn/TopNQueryRunnerFailureTest.java
processing/src/test/java/org/apache/druid/segment/CursorHolderPreaggTest.java
processing/src/test/java/org/apache/druid/segment/incremental/IncrementalIndexCursorFactoryTest.java

processing/src/main/java/org/apache/druid/query/topn/TopNQuery.java

processing/src/test/java/org/apache/druid/query/topn/TopNQueryRunnerFailureTest.java

+
+
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();


processing/src/test/java/org/apache/druid/query/topn/TopNQueryRunnerFailureTest.java

+      boolean specializeHistorical1SimpleDoubleAggPooledTopN,
+      boolean specializeHistoricalSingleValueDimSelector1SimpleDoubleAggPooledTopN,
+      List<AggregatorFactory> commonAggregators,
+      String testName


samarthjain · 2024-11-13T21:52:10Z

Use a configurable (per-query, per-segment) fixed size byte amount that aggregators can take up.

This part of PR description confused me a little bit because the configuration is really at a query level and not at a segment level (which would be a bit hard to set the right limit for).

processing/src/main/java/org/apache/druid/query/topn/BaseTopNAlgorithm.java

processing/src/main/java/org/apache/druid/query/topn/TopNQuery.java

jtuglu-netflix · 2024-11-13T21:59:20Z

Use a configurable (per-query, per-segment) fixed size byte amount that aggregators can take up.

This part of PR description confused me a little bit because the configuration is really at a query level and not at a segment level (which would be a bit hard to set the right limit for).

Since this is initialized per-runner (which can be across different segments on different historicals, etc.), this is technically unique per (query-id, segment-id), or more generally per SpecificSegmentQueryRunner.

jtuglu-netflix changed the title ~~Add TopN query guardrails~~ Add TopN Query Aggregator Memory Guardrails Oct 31, 2024

jtuglu-netflix force-pushed the add-topn-memory-guardrails-master-head branch 2 times, most recently from 8e397ee to b94ba51 Compare October 31, 2024 22:38

jtuglu-netflix marked this pull request as ready for review October 31, 2024 22:39

maytasm requested a review from abhishekagarwal87 November 1, 2024 03:21

maytasm added Bug Area - Querying labels Nov 1, 2024

maytasm requested a review from samarthjain November 1, 2024 03:22

jtuglu-netflix force-pushed the add-topn-memory-guardrails-master-head branch from b94ba51 to dc7d238 Compare November 1, 2024 04:41

github-actions bot added the Area - Documentation label Nov 1, 2024

github-advanced-security bot found potential problems Nov 1, 2024

add topn query guardrails

dd2cab0

jtuglu-netflix force-pushed the add-topn-memory-guardrails-master-head branch from dc7d238 to dd2cab0 Compare November 6, 2024 23:43

fix formatting

350af17

clintropolis added the Design Review label Nov 9, 2024

samarthjain reviewed Nov 13, 2024

address pr comments

aaaf679
