@kaushalmahi12 kaushalmahi12 commented Aug 13, 2025

Description

This change addresses an OOM that is triggered while the coordinator node buffers up to batched_reduce_size shard-level results. The current logic performs no circuit-breaker check during this buffering, so memory-intensive queries can easily cause OOMs.
At a high level, this change does the following:

  • Checks the circuit breaker on each shard-level result arrival.
  • Discards new shard results after cancellation. Currently the request is only cancellable on a partial or final reduce, so even if a resiliency mechanism cancels the task, the coordinator node continues to process shard-level results until it has received batched_reduce_size results.
  • Cancels the coordinator task and its child tasks as soon as the circuit breaker trips.
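
The three points above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `SimpleBreaker`, `BufferingConsumer`, and the `cancelTask` callback are hypothetical stand-ins for the real circuit breaker, the query phase result consumer, and task cancellation, and result sizes are passed in as plain byte counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a memory circuit breaker (not the OpenSearch API). */
final class SimpleBreaker {
    private final long limitBytes;
    private long usedBytes;

    SimpleBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes, or throws if the reservation would exceed the limit. */
    synchronized void addEstimateAndMaybeBreak(long bytes) {
        if (usedBytes + bytes > limitBytes) {
            throw new IllegalStateException("breaker tripped at " + (usedBytes + bytes) + " bytes");
        }
        usedBytes += bytes;
    }

    synchronized void release(long bytes) {
        usedBytes -= bytes;
    }

    synchronized long used() {
        return usedBytes;
    }
}

/** Hypothetical coordinator-side buffer for shard-level results. */
final class BufferingConsumer {
    private final SimpleBreaker breaker;
    private final int batchedReduceSize;
    private final Runnable cancelTask; // assumed hook that cancels the parent and child tasks
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final List<Long> buffer = new ArrayList<>();
    private long bufferedBytes;

    BufferingConsumer(SimpleBreaker breaker, int batchedReduceSize, Runnable cancelTask) {
        this.breaker = breaker;
        this.batchedReduceSize = batchedReduceSize;
        this.cancelTask = cancelTask;
    }

    /** Returns true if the shard result was buffered, false if it was discarded. */
    synchronized boolean consume(long resultSizeBytes) {
        if (cancelled.get()) {
            return false; // discard results that arrive after cancellation
        }
        try {
            breaker.addEstimateAndMaybeBreak(resultSizeBytes); // check on every arrival
        } catch (IllegalStateException tripped) {
            cancel(); // cancel as soon as the breaker trips
            return false;
        }
        buffer.add(resultSizeBytes);
        bufferedBytes += resultSizeBytes;
        if (buffer.size() >= batchedReduceSize) {
            releaseBuffer(); // stands in for a partial reduce that frees the buffered results
        }
        return true;
    }

    synchronized void cancel() {
        if (cancelled.compareAndSet(false, true)) {
            releaseBuffer();
            cancelTask.run();
        }
    }

    boolean isCancelled() {
        return cancelled.get();
    }

    private void releaseBuffer() {
        breaker.release(bufferedBytes);
        buffer.clear();
        bufferedBytes = 0;
    }

    public static void main(String[] args) {
        AtomicInteger cancellations = new AtomicInteger();
        BufferingConsumer consumer =
            new BufferingConsumer(new SimpleBreaker(100), 5, cancellations::incrementAndGet);
        System.out.println(consumer.consume(40)); // buffered
        System.out.println(consumer.consume(40)); // buffered
        System.out.println(consumer.consume(40)); // would exceed the limit: trips and cancels
        System.out.println(consumer.consume(1));  // discarded after cancellation
        System.out.println("cancellations=" + cancellations.get());
    }
}
```

In this sketch the breaker is consulted before a result is added to the buffer, so the third 40-byte result is rejected before batched_reduce_size (5 here) is ever reached, and any later arrival is dropped without touching the breaker.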

Current search request flow concerning this PR

[Image: RequestCircuitBreaker]

Related Issues

Resolves #18999

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kaushalmahi12 kaushalmahi12 requested a review from a team as a code owner August 13, 2025 21:20
@github-actions github-actions bot added the bug (Something isn't working) and Search (Search query, autocomplete ...etc) labels Aug 13, 2025
…rdinator node

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions

✅ Gradle check result for b0ed156: SUCCESS

@codecov

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 93.54839% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.85%. Comparing base (dc70bf6) to head (dcfe98c).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...search/action/search/QueryPhaseResultConsumer.java | 96.15% | 1 Missing ⚠️ |
| ...pensearch/action/search/TransportSearchAction.java | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19066      +/-   ##
============================================
- Coverage     72.87%   72.85%   -0.02%     
+ Complexity    69380    69359      -21     
============================================
  Files          5647     5647              
  Lines        319084   319111      +27     
  Branches      46157    46159       +2     
============================================
- Hits         232528   232491      -37     
- Misses        67729    67753      +24     
- Partials      18827    18867      +40     


Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions

❌ Gradle check result for 75e731d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions

❌ Gradle check result for a73bbd3: FAILURE



Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions

❌ Gradle check result for 83eba80: FAILURE


Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions

❌ Gradle check result for 927b6c8: FAILURE


Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions

❌ Gradle check result for c96e172: FAILURE


@jainankitk

Test is still failing:

> Task :server:test

Tests with failures:
 - org.opensearch.action.search.QueryPhaseResultConsumerTests.testCircuitBreakerTriggersBeforeBatchedReduce

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions

✅ Gradle check result for dcfe98c: SUCCESS

@jainankitk jainankitk merged commit 10ff9d3 into opensearch-project:main Aug 15, 2025
31 checks passed
RajatGupta02 pushed a commit to RajatGupta02/OpenSearch that referenced this pull request Aug 18, 2025
…t#19066)

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
atris pushed a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
…t#19066)

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
…t#19066)

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…t#19066)

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
kaushalmahi12 added a commit to kaushalmahi12/OpenSearch that referenced this pull request Oct 27, 2025
…t#19066)

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

Labels

bug (Something isn't working), Search (Search query, autocomplete ...etc)


Development

Successfully merging this pull request may close these issues.

[BUG] Search against index_pattern/alias with large number of shards results in OOM at co-ordinator node.

2 participants