@peteralfonsi peteralfonsi commented Mar 21, 2025

Description

This PR automatically rewrites boolean queries that have a must_not clause containing a RangeQuery so that they instead use should clauses covering the complement of that range. Depending on the query, this can be 2-30x faster. See #17586 for a more detailed description.

Example original query (on nyc_taxis):

"bool" : { 
  "must_not": [ 
    {"range" : {"dropoff_datetime" : {"gte": "01/07/2015", "lte": "01/09/2015", "format": "dd/MM/yyyy"}}}
  ]
}

Rewritten query:

"bool": { 
  "must":{
    "bool":{
      "should": [
        {"range" : {"dropoff_datetime" : {"lt": "01/07/2015", "format": "dd/MM/yyyy"}}},
        {"range" : {"dropoff_datetime" : {"gt": "01/09/2015", "format": "dd/MM/yyyy"}}}
      ]
    }
  }
}
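
The rewrite above can be sketched outside OpenSearch as a plain dictionary transformation. This is a minimal illustration, not the PR's Java implementation: the helper names `complement_range_clauses` and `rewrite_must_not_range` are hypothetical, and the sketch assumes every document has exactly one value for the field (a later commit on this page describes the optimization as applying to single-valued fields).

```python
from typing import Any, Dict, List

BOUND_KEYS = ("gt", "gte", "lt", "lte")


def complement_range_clauses(field: str, bounds: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return range clauses that together match everything outside `bounds`."""
    # Options such as "format" are copied unchanged into each new clause.
    options = {k: v for k, v in bounds.items() if k not in BOUND_KEYS}
    clauses: List[Dict[str, Any]] = []
    # An inclusive lower bound (gte) is complemented by lt; an exclusive one (gt) by lte.
    if "gte" in bounds:
        clauses.append({"range": {field: {"lt": bounds["gte"], **options}}})
    elif "gt" in bounds:
        clauses.append({"range": {field: {"lte": bounds["gt"], **options}}})
    # An inclusive upper bound (lte) is complemented by gt; an exclusive one (lt) by gte.
    if "lte" in bounds:
        clauses.append({"range": {field: {"gt": bounds["lte"], **options}}})
    elif "lt" in bounds:
        clauses.append({"range": {field: {"gte": bounds["lt"], **options}}})
    return clauses


def rewrite_must_not_range(query: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a bool query holding a single must_not range; return others unchanged."""
    body = query.get("bool", {})
    must_not = body.get("must_not", [])
    if isinstance(must_not, dict):
        must_not = [must_not]
    # Assumption: only the simplest shape is handled (nothing but one must_not clause).
    if set(body) != {"must_not"} or len(must_not) != 1:
        return query
    range_body = must_not[0].get("range")
    if not isinstance(range_body, dict) or len(range_body) != 1:
        return query
    field, bounds = next(iter(range_body.items()))
    should = complement_range_clauses(field, bounds)
    if not should:
        return query
    return {"bool": {"must": {"bool": {"should": should}}}}


original = {"bool": {"must_not": [{"range": {"dropoff_datetime": {
    "gte": "01/07/2015", "lte": "01/09/2015", "format": "dd/MM/yyyy"}}}]}}
print(rewrite_must_not_range(original))
```

Applied to the example query, the sketch produces the same must/should structure as the rewritten query shown above. Queries with any other shape are returned untouched.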

Some benchmark numbers from http_logs and nyc_taxis (the excluded ranges are on the @timestamp and dropoff_datetime fields, respectively). "Originally written as" indicates whether the query was sent to OpenSearch with a must_not clause or was sent already rewritten with should clauses. Ideally, after the changes are applied, the two p50s for a given excluded range should be the same.

| Excluded range | Originally written as | Dataset | p50 before changes (ms) | p50 after changes (ms) |
| --- | --- | --- | --- | --- |
| 6/10 - 6/13 | must_not | http_logs | 259 | 38.2 |
| 6/10 - 6/13 | should | http_logs | 34.2 | 39.5 |
| 6/9 - 6/10 | must_not | http_logs | 269 | 30.8 |
| 6/9 - 6/10 | should | http_logs | 26.3 | 30.8 |
| 7/1 - 9/1 | must_not | nyc_taxis | 873 | 408 |
| 7/1 - 9/1 | should | nyc_taxis | 427 | 405 |
| 1/1 - 9/1 | must_not | nyc_taxis | 1214 | 111 |
| 1/1 - 9/1 | should | nyc_taxis | 116 | 111 |
| 1/1 12:00 - 1/1 12:01 | must_not | nyc_taxis | 599 | 19.5 |
| 1/1 12:00 - 1/1 12:01 | should | nyc_taxis | 19.3 | 20.2 |
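
To relate these numbers to the "2-30x" figure in the description, the speedup of each must_not query can be computed as the p50 before the changes divided by the p50 after. The short sketch below does only that arithmetic on the must_not rows of the benchmark; the names `MUST_NOT_ROWS` and `speedups` are illustrative.

```python
# (excluded range, dataset, p50 before in ms, p50 after in ms), must_not rows only
MUST_NOT_ROWS = [
    ("6/10 - 6/13", "http_logs", 259, 38.2),
    ("6/9 - 6/10", "http_logs", 269, 30.8),
    ("7/1 - 9/1", "nyc_taxis", 873, 408),
    ("1/1 - 9/1", "nyc_taxis", 1214, 111),
    ("1/1 12:00 - 1/1 12:01", "nyc_taxis", 599, 19.5),
]


def speedups(rows):
    """Return (excluded range, dataset, before/after ratio) for each row."""
    return [(excluded, dataset, before / after)
            for excluded, dataset, before, after in rows]


for excluded, dataset, ratio in speedups(MUST_NOT_ROWS):
    print(f"{dataset:10s} {excluded:22s} {ratio:5.1f}x")
```

The smallest ratio is about 2.1x (7/1 - 9/1 on nyc_taxis) and the largest about 30.7x (the one-minute range on nyc_taxis), which is in line with the range quoted in the description.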

I believe the small differences between runs (for example, the 7/1 - 9/1 should query going from 427 to 405 ms, where we'd expect no change) are just due to variation between different runs/instances. This is consistent with what I've seen in tiered caching benchmarks. I've done a few runs, and the direction and magnitude of the changes vary.

Related Issues

Part of #17586

Check List

  • Functionality includes testing.
  • [N/A] API changes companion pull request created, if applicable.
  • [N/A] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Peter Alfonsi added 2 commits March 20, 2025 09:55
Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Peter Alfonsi added 2 commits March 21, 2025 13:51
Signed-off-by: Peter Alfonsi <petealft@amazon.com>
@github-actions
Contributor

❌ Gradle check result for d9eee10: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
@github-actions
Contributor

✅ Gradle check result for 25367bb: SUCCESS

@peteralfonsi
Contributor Author

Hey @msfroh, any further comments on this one?

@github-actions
Contributor

❌ Gradle check result for c4d29f2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peteralfonsi
Contributor Author

Flaky test: #18302

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
@github-actions
Contributor

✅ Gradle check result for 58d9a45: SUCCESS

@peteralfonsi
Contributor Author

Hey @msfroh, just bumping this.

@msfroh msfroh merged commit a6eb368 into opensearch-project:main Jun 4, 2025
30 checks passed
Gagan6164 pushed a commit to Gagan6164/OpenSearch that referenced this pull request Jun 8, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
rgsriram pushed a commit to rgsriram/OpenSearch that referenced this pull request Jun 9, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jun 9, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
neuenfeldttj added a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: TJ Neuenfeldt <tjneu@amazon.com>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
tandonks pushed a commit to tandonks/OpenSearch that referenced this pull request Aug 5, 2025
…ect#17655)

---------

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Signed-off-by: Peter Alfonsi <peter.alfonsi@gmail.com>
Co-authored-by: Peter Alfonsi <petealft@amazon.com>
atris added a commit to atris/OpenSearch that referenced this pull request Aug 18, 2025
…rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non-scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single-valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

  Changes:
  - Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
  - Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
  - Register both rewriters in QueryRewriterRegistry
  - Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
  - Disable legacy implementations in BoolQueryBuilder
  - Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
  - Better separation of concerns
  - Recursive rewriting for nested boolean queries
  - Proper error handling and logging
  - Consistent priority-based execution order

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
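
The priority-based ordering described in this commit message can be illustrated with a small sketch. It is not the OpenSearch code: `RewriterRegistry` is a hypothetical stand-in for QueryRewriterRegistry, the `must_to_filter` function handles only range clauses, `must_not_to_should` is a placeholder, and the `title` and `fare` fields are made up. Only the priorities 150 and 175, and the fact that the lower number runs first, come from the message.

```python
from typing import Callable, Dict, List, Tuple

Query = Dict[str, object]
Rewriter = Callable[[Query], Query]


class RewriterRegistry:
    """Hypothetical registry: applies rewriters in ascending priority order."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, str, Rewriter]] = []

    def register(self, priority: int, name: str, rewriter: Rewriter) -> None:
        self._entries.append((priority, name, rewriter))
        # Keep entries sorted so a lower priority number always runs first.
        self._entries.sort(key=lambda entry: entry[0])

    def rewrite(self, query: Query) -> Tuple[Query, List[str]]:
        applied: List[str] = []
        for _, name, rewriter in self._entries:
            query = rewriter(query)
            applied.append(name)
        return query, applied


def must_to_filter(query: Query) -> Query:
    """Simplified: move range clauses from must to filter."""
    body = query.get("bool")
    if not isinstance(body, dict):
        return query
    must = body.get("must", [])
    if isinstance(must, dict):
        must = [must]
    moved = [clause for clause in must if "range" in clause]
    if not moved:
        return query
    kept = [clause for clause in must if "range" not in clause]
    new_body = {k: v for k, v in body.items() if k != "must"}
    if kept:
        new_body["must"] = kept
    existing = body.get("filter", [])
    if isinstance(existing, dict):
        existing = [existing]
    new_body["filter"] = list(existing) + moved
    return {"bool": new_body}


def must_not_to_should(query: Query) -> Query:
    """Placeholder: a real rewriter would replace must_not ranges with their complement."""
    return query


registry = RewriterRegistry()
# Registered out of order on purpose; the registry sorts by priority.
registry.register(175, "MustNotToShouldRewriter", must_not_to_should)
registry.register(150, "MustToFilterRewriter", must_to_filter)

query = {"bool": {"must": [{"match": {"title": "taxi"}}, {"range": {"fare": {"gte": 10}}}]}}
rewritten, order = registry.rewrite(query)
print(order)
print(rewritten)
```

Sorting at registration time keeps the execution order independent of the order in which rewriters happen to be registered, which is one plausible reading of "consistent priority based execution order".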
rishabhmaurya pushed a commit that referenced this pull request Aug 27, 2025
* Add query rewriting infrastructure to reduce query complexity

  Implements three query optimizations that work together:
  - Boolean flattening: removes unnecessary nested boolean queries
  - Terms merging: combines multiple term queries on same field in filter/should contexts
  - Match-all removal: eliminates redundant match_all queries

  Key features:
  - 60-70% reduction in query nodes for typical filtered queries
  - Feature flag: search.query_rewriting.enabled (default: true)
  - Preserves exact query semantics and results

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Update per CI

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Update CHANGELOG

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

  This commit migrates two existing query optimizations from BoolQueryBuilder to the new
  query rewriting infrastructure:

  1. **MustToFilterRewriter**: Moves non-scoring queries (range, geo, numeric term/terms/match)
     from must to filter clauses to avoid unnecessary scoring calculations (from PR #18541)

  2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for
     better performance on single-valued numeric fields (from PRs #17655 and #18498)

  Changes:
  - Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
  - Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
  - Register both rewriters in QueryRewriterRegistry
  - Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
  - Disable legacy implementations in BoolQueryBuilder
  - Comment out BoolQueryBuilder tests that relied on the old implementations

  The new rewriters maintain full backward compatibility while providing:
  - Better separation of concerns
  - Recursive rewriting for nested boolean queries
  - Proper error handling and logging
  - Consistent priority-based execution order

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

---------

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
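
Two of the optimizations named in the first entry of this squashed commit, boolean flattening and match-all removal, can be sketched for the filter context, where clauses are ANDed and do not affect scoring. This is an illustration under those assumptions only: the function names are hypothetical, the `a` and `b` fields are made up, and the exact conditions OpenSearch applies are not given in the commit message.

```python
from typing import Any, Dict, List

Clause = Dict[str, Any]


def as_list(value: Any) -> List[Clause]:
    """The query DSL allows a single clause or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def flatten_filters(clauses: List[Clause]) -> List[Clause]:
    """Hoist children of nested bool queries that contain only filter clauses."""
    flat: List[Clause] = []
    for clause in clauses:
        inner = clause.get("bool")
        # A bool with nothing but filter clauses is an AND of its children,
        # so the children can join the parent's filter list directly.
        if isinstance(inner, dict) and set(inner) == {"filter"}:
            flat.extend(flatten_filters(as_list(inner["filter"])))
        else:
            flat.append(clause)
    return flat


def drop_redundant_match_all(clauses: List[Clause]) -> List[Clause]:
    """In a filter list, match_all adds nothing when other clauses are present."""
    kept = [clause for clause in clauses if "match_all" not in clause]
    # If match_all was the only clause, keep one so the list is not emptied.
    return kept if kept else clauses[:1]


def simplify_filter_context(query: Clause) -> Clause:
    body = query.get("bool")
    if not isinstance(body, dict) or "filter" not in body:
        return query
    filters = drop_redundant_match_all(flatten_filters(as_list(body["filter"])))
    return {"bool": {**body, "filter": filters}}


nested = {"bool": {"filter": [
    {"match_all": {}},
    {"bool": {"filter": [
        {"term": {"a": 1}},
        {"bool": {"filter": {"term": {"b": 2}}}},
    ]}},
]}}
print(simplify_filter_context(nested))
```

The sketch deliberately leaves must and should clauses alone, since removing or hoisting clauses there can change scores, which the commit list addresses separately ("ensure scoring MATCH ALL query is preserved").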
atris added a commit to atris/OpenSearch that referenced this pull request Aug 28, 2025
…arch-project#19060)

pranikum pushed a commit to pranikum/OpenSearch that referenced this pull request Sep 4, 2025
…arch-project#19060)

kh3ra pushed a commit to kh3ra/OpenSearch that referenced this pull request Sep 5, 2025
…arch-project#19060)

jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

jainankitk pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 22, 2025
…arch-project#19060)

Signed-off-by: Ankit Jain <jainankitk@apache.org>
asimmahmood1 pushed a commit to jainankitk/OpenSearch that referenced this pull request Sep 23, 2025
…arch-project#19060)

vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Sep 26, 2025
…arch-project#19060)
