[DRAFT] Add streaming search with configurable scoring modes #19160
Introduces streaming search infrastructure that enables progressive emission of search results with three configurable scoring modes. The implementation extends the existing streaming transport layer to support partial result computation at the coordinator level.

Scoring modes:
- NO_SCORING: Immediate result emission without confidence requirements
- CONFIDENCE_BASED: Statistical emission using Hoeffding inequality bounds
- FULL_SCORING: Complete scoring before result emission

The implementation leverages OpenSearch's inter-node streaming capabilities to reduce query latency through early result emission. Partial reductions are triggered based on the selected scoring mode, with results accumulated at the coordinator before final response generation.

Key changes:
- Add HoeffdingBounds for statistical confidence calculation
- Extend QueryPhaseResultConsumer to support streaming reduction
- Add StreamingScoringCollector wrapping TopScoreDocCollector
- Integrate streaming scorer selection in QueryPhase
- Add REST parameter stream_scoring_mode for mode selection
- Include streaming metadata in SearchResponse

The current implementation operates within architectural constraints where streaming is limited to inter-node communication. Client-facing streaming will be addressed in a follow-up contribution.

Addresses opensearch-project#18725

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
❌ Gradle check result for 4da03a9: null
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?
* Fix flaky ExistsQueryBuilderTests.testToQuery

The test was generating field patterns that matched raw.derived_keyword, which doesn't support exists queries. Fixed by replacing the problematic patterns with TEXT_FIELD_NAME.

Fixes opensearch-project#18724

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Revert CHANGELOG changes

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

---------

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
IndexStatsIT.testConcurrentIndexingAndStatsRequests sometimes fails with
the following NPE:
```
IndexStatsIT > testConcurrentIndexingAndStatsRequests {p0={"cluster.indices.replication.strategy":"SEGMENT"}} FAILED
java.lang.AssertionError:
Expected: an empty collection
but: <[[test][3] failed, reason [BroadcastShardOperationFailedException[operation indices:monitor/stats failed]; nested: NullPointerException[Cannot invoke "java.util.Map$Entry.getValue()" because "highestEntry" is null]; ]]>
at __randomizedtesting.SeedInfo.seed([B56C9D0503BD16AE:2FDF6CC817B0190]:0)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.junit.Assert.assertThat(Assert.java:964)
at org.junit.Assert.assertThat(Assert.java:930)
at org.opensearch.indices.stats.IndexStatsIT.testConcurrentIndexingAndStatsRequests(IndexStatsIT.java:1451)
```
The `isEmpty()` check on the concurrent map is not sufficient because
entries can be removed after the check but before retrieving them.
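
The check-then-act race described above can be sketched as follows. This is a minimal illustration, not the actual fix: the class and method names are hypothetical, and it assumes the map is a `ConcurrentNavigableMap` whose `lastEntry()` returns `null` when the map is empty. The variable name `highestEntry` is taken from the NPE message.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;

// Hypothetical sketch; names other than highestEntry are assumptions.
class HighestEntrySketch {

    // Racy pattern: another thread may remove the last remaining entry
    // after isEmpty() returns false but before lastEntry() is called.
    static long racyHighest(ConcurrentNavigableMap<Long, Long> map) {
        if (map.isEmpty()) {
            return -1L;
        }
        Map.Entry<Long, Long> highestEntry = map.lastEntry();
        return highestEntry.getValue(); // NPE if the map was emptied in between
    }

    // Safer pattern: read the entry once and null-check the result,
    // so the emptiness decision and the read are the same operation.
    static long safeHighest(ConcurrentNavigableMap<Long, Long> map) {
        Map.Entry<Long, Long> highestEntry = map.lastEntry();
        return highestEntry == null ? -1L : highestEntry.getValue();
    }
}
```

The safer variant drops the separate `isEmpty()` call, so there is no window between the check and the read.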
Signed-off-by: Andrew Ross <andrross@amazon.com>
…search-project#18966)

Implements TemporalRoutingProcessor for ingest pipelines and TemporalRoutingSearchProcessor for search pipelines based on RFC opensearch-project#18920.

Features:
- Route documents to shards based on timestamp fields
- Support hour, day, week, and month granularities
- Optional hash bucketing for better distribution
- Automatic search routing to relevant time ranges
- ISO week format support

The processors enable efficient time-based data organization for log and metrics workloads by co-locating documents from the same time period on the same shards.

---------

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
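
As a rough illustration of time-based routing with the granularities listed above, the sketch below derives a routing value from a timestamp. It is not the processor's implementation: the class name, method names, bucket label formats, UTC assumption, and the use of `String.hashCode()` for hash bucketing are all assumptions made for the example.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.IsoFields;
import java.util.Locale;

// Hypothetical sketch; label formats and hashing are assumptions.
class TemporalRoutingSketch {

    enum Granularity { HOUR, DAY, WEEK, MONTH }

    // Builds a label shared by every timestamp in the same period (UTC assumed).
    static String bucket(Instant ts, Granularity g) {
        ZonedDateTime t = ts.atZone(ZoneOffset.UTC);
        switch (g) {
            case HOUR:
                return String.format(Locale.ROOT, "%04d-%02d-%02dT%02d",
                    t.getYear(), t.getMonthValue(), t.getDayOfMonth(), t.getHour());
            case DAY:
                return String.format(Locale.ROOT, "%04d-%02d-%02d",
                    t.getYear(), t.getMonthValue(), t.getDayOfMonth());
            case WEEK:
                // ISO week: the week-based year can differ from the calendar year.
                return String.format(Locale.ROOT, "%04d-W%02d",
                    t.get(IsoFields.WEEK_BASED_YEAR), t.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
            case MONTH:
                return String.format(Locale.ROOT, "%04d-%02d", t.getYear(), t.getMonthValue());
            default:
                throw new IllegalArgumentException("unknown granularity: " + g);
        }
    }

    // Optional hash bucketing: spreads one period over several routing values.
    static String routing(Instant ts, Granularity g, String docId, int hashBuckets) {
        String base = bucket(ts, g);
        if (hashBuckets <= 1) {
            return base;
        }
        return base + "-" + Math.floorMod(docId.hashCode(), hashBuckets);
    }
}
```

Documents whose timestamps fall in the same period get the same label, so they can be co-located. With hash bucketing enabled, a search for that period would need to target every bucket suffix.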
* Bump actions/checkout from 4 to 5 Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * Update changelog Signed-off-by: dependabot[bot] <support@github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…-project#19001) Signed-off-by: Craig Perkins <cwperx@amazon.com> Signed-off-by: Andrew Ross <andrross@amazon.com> Co-authored-by: Craig Perkins <cwperx@amazon.com>
…ory-hdfs (opensearch-project#19021) * Bump commons-cli:commons-cli in /plugins/repository-hdfs Bumps [commons-cli:commons-cli](https://github.com/apache/commons-cli) from 1.9.0 to 1.10.0. - [Changelog](https://github.com/apache/commons-cli/blob/master/RELEASE-NOTES.txt) - [Commits](apache/commons-cli@rel/commons-cli-1.9.0...rel/commons-cli-1.10.0) --- updated-dependencies: - dependency-name: commons-cli:commons-cli dependency-version: 1.10.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Updating SHAs Signed-off-by: dependabot[bot] <support@github.com> * Update changelog Signed-off-by: dependabot[bot] <support@github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Craig Perkins <cwperx@amazon.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Craig Perkins <cwperx@amazon.com>
…ribution/packages (opensearch-project#19019) * Bump com.netflix.nebula.ospackage-base in /distribution/packages Bumps com.netflix.nebula.ospackage-base from 12.0.0 to 12.1.0. --- updated-dependencies: - dependency-name: com.netflix.nebula.ospackage-base dependency-version: 12.1.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Update changelog Signed-off-by: dependabot[bot] <support@github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Craig Perkins <cwperx@amazon.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Craig Perkins <cwperx@amazon.com>
…e drop (opensearch-project#18766) * fix skip_unavailable setting changing to default during node drop issue#13798 Signed-off-by: Aniket Modak <animodak@amazon.com> * fix skip_unavailable setting changing to default during node drop issue#13798 Signed-off-by: Aniket Modak <animodak@amazon.com> --------- Signed-off-by: Aniket Modak <animodak@amazon.com> Signed-off-by: Aniket Modak <52703847+animodak7@users.noreply.github.com> Co-authored-by: Aniket Modak <animodak@amazon.com>
…t#19083) Signed-off-by: Andriy Redko <drreta@gmail.com>
* Bump GCS SDK API to version 2.55.0 Signed-off-by: Andrey Pleskach <ples@aiven.io> * Migrate GCS repository plugin to use GCS SDK v 2.x Signed-off-by: Andrey Pleskach <ples@aiven.io> --------- Signed-off-by: Andrey Pleskach <ples@aiven.io> Signed-off-by: Craig Perkins <cwperx@amazon.com> Co-authored-by: Craig Perkins <cwperx@amazon.com>
…project#19119) Signed-off-by: Daniel Widdis <widdis@gmail.com>
…or querycache (opensearch-project#18351) * fix changelog conflicts Signed-off-by: kkewwei <kkewwei@163.com> * Add a dynamic setting to change skip_cache_factor and min_frequency for querycache Signed-off-by: kkewwei <kewei.11@bytedance.com> Signed-off-by: kkewwei <kkewwei@163.com> * change the setting name Signed-off-by: kkewwei <kewei.11@bytedance.com> Signed-off-by: kkewwei <kkewwei@163.com> * add volatile to variable Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com> --------- Signed-off-by: kkewwei <kkewwei@163.com> Signed-off-by: kkewwei <kewei.11@bytedance.com> Signed-off-by: Andrew Ross <andrross@amazon.com> Co-authored-by: Andrew Ross <andrross@amazon.com>
…roject#19126) Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
…opensearch-project#19113) * Fix access specifier for FieldMapper method to allow usage by plugins Signed-off-by: Mohit Godwani <mgodwan@amazon.com> * Apply spotless Signed-off-by: Mohit Godwani <mgodwan@amazon.com> --------- Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
…igurability to chose client via repo settings (opensearch-project#18800) Signed-off-by: Pranit Kumar <pranikum@amazon.com>
….38.0 in /plugins/repository-gcs (opensearch-project#19144) * Bump com.google.auth:google-auth-library-oauth2-http Bumps com.google.auth:google-auth-library-oauth2-http from 1.37.1 to 1.38.0. --- updated-dependencies: - dependency-name: com.google.auth:google-auth-library-oauth2-http dependency-version: 1.38.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Updating SHAs Signed-off-by: dependabot[bot] <support@github.com> * Update changelog Signed-off-by: dependabot[bot] <support@github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…dfs-fixture (opensearch-project#19146) * Bump com.squareup.okio:okio in /test/fixtures/hdfs-fixture Bumps [com.squareup.okio:okio](https://github.com/square/okio) from 3.15.0 to 3.16.0. - [Release notes](https://github.com/square/okio/releases) - [Changelog](https://github.com/square/okio/blob/master/CHANGELOG.md) - [Commits](square/okio@parent-3.15.0...parent-3.16.0) --- updated-dependencies: - dependency-name: com.squareup.okio:okio dependency-version: 3.16.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Update changelog Signed-off-by: dependabot[bot] <support@github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Craig Perkins <cwperx@amazon.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Craig Perkins <cwperx@amazon.com>
…rch-project#19151) Signed-off-by: Craig Perkins <cwperx@amazon.com>
…ns/repository-azure (opensearch-project#19145) * Bump com.azure:azure-storage-common in /plugins/repository-azure Bumps [com.azure:azure-storage-common](https://github.com/Azure/azure-sdk-for-java) from 12.30.1 to 12.30.2. - [Release notes](https://github.com/Azure/azure-sdk-for-java/releases) - [Commits](Azure/azure-sdk-for-java@azure-storage-blob_12.30.1...azure-storage-common_12.30.2) --- updated-dependencies: - dependency-name: com.azure:azure-storage-common dependency-version: 12.30.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * Updating SHAs Signed-off-by: dependabot[bot] <support@github.com> * Update changelog Signed-off-by: dependabot[bot] <support@github.com> * Upgrade com.azure:azure-storage-blob to 12.31.2 Signed-off-by: Craig Perkins <cwperx@amazon.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Craig Perkins <cwperx@amazon.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Andrey Pleskach <ples@aiven.io>
…arch-project#19060)

* Add query rewriting infrastructure to reduce query complexity

Implements three query optimizations that work together:
- Boolean flattening: removes unnecessary nested boolean queries
- Terms merging: combines multiple term queries on same field in filter/should contexts
- Match-all removal: eliminates redundant match_all queries

Key features:
- 60-70% reduction in query nodes for typical filtered queries
- Feature flag: search.query_rewriting.enabled (default: true)
- Preserves exact query semantics and results

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Fix forbidden api issues

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Update writers and get tests to pass

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Update per CI

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Fix term merging threshold and update comments

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Expose setting and update per comments

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Update CHANGELOG

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Fix tests and ensure scoring MATCH ALL query is preserved

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Migrate must to filter and must not to should optimizations to query rewriting infrastructure

This commit migrates two existing query optimizations from BoolQueryBuilder to the new query rewriting infrastructure:

1. **MustToFilterRewriter**: Moves non scoring queries (range, geo, numeric term/terms/match) from must to filter clauses to avoid unnecessary scoring calculations (from PR opensearch-project#18541)
2. **MustNotToShouldRewriter**: Transforms negative queries into positive complements for better performance on single valued numeric fields (from PRs opensearch-project#17655 and opensearch-project#18498)

Changes:
- Add MustToFilterRewriter with priority 150 (runs after boolean flattening)
- Add MustNotToShouldRewriter with priority 175 (runs after must to filter)
- Register both rewriters in QueryRewriterRegistry
- Add comprehensive test suites (15 tests for must to filter, 14 for must not to should)
- Disable legacy implementations in BoolQueryBuilder
- Comment out BoolQueryBuilder tests that relied on the old implementations

The new rewriters maintain full backward compatibility while providing:
- Better separation of concerns
- Recursive rewriting for nested boolean queries
- Proper error handling and logging
- Consistent priority based execution order

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Handle fields with missing fields

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

---------

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
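
To make the terms-merging idea from this commit concrete, here is a small sketch that groups single-value term clauses by field, so each group could be expressed as one multi-value terms clause. It is an illustration only: `TermClause` and `mergeShouldTerms` are hypothetical names, the real rewriter operates on query builders rather than plain strings, and the sketch covers only `should` clauses, where OR semantics match a terms query once scoring is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the rewriter from the PR.
class TermsMergingSketch {

    // Stand-in for a single-value term clause such as {"term": {"status": "active"}}.
    static final class TermClause {
        final String field;
        final String value;

        TermClause(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    // Groups should-clauses by field, preserving first-seen field order.
    // Each map entry corresponds to one merged terms clause on that field.
    static Map<String, List<String>> mergeShouldTerms(List<TermClause> shouldClauses) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (TermClause clause : shouldClauses) {
            merged.computeIfAbsent(clause.field, f -> new ArrayList<>()).add(clause.value);
        }
        return merged;
    }
}
```

Three input clauses on two fields collapse to two merged clauses, which is the kind of node-count reduction the commit message describes.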
* Fix tika CVE Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Update CHANGELOG.md Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * fix html parser Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * fix html parser Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * fix html parser Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Add license Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Add license Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Update checksums Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Update shas Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Add pdf box license Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Fix tests Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Update security fonts permission Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Add dummy fonts Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Upstream fetch Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Fix license check error Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> * Fix license check error Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com> --------- Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
…ecated (opensearch-project#19154) * Replace centos:8 with almalinux:8 since centos docker images are deprecated Signed-off-by: Craig Perkins <cwperx@amazon.com> * Add to CHANGELOG Signed-off-by: Craig Perkins <cwperx@amazon.com> * Update Dockerfile Signed-off-by: Craig Perkins <cwperx@amazon.com> --------- Signed-off-by: Craig Perkins <cwperx@amazon.com>
…oject#19153) * Add query in QueryCollectorContextSpecFactory Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> * Add javadoc Signed-off-by: vibrantvarun <jainvarun4996@gmail.com> --------- Signed-off-by: vibrantvarun <jainvarun4996@gmail.com>
…nux:8 (opensearch-project#19159) Signed-off-by: Simon Marty <martysi@amazon.com>
* Add overload for channelFactory

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* Fix tests

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* Add Changelog entry

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* Fix conflicts

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* When update operations fail during preparation (e.g., version conflicts), (opensearch-project#18917)

TransportShardBulkAction still triggers refresh even though no actual writes occurred. This fix checks if locationToSync is null (indicating no writes) and prevents refresh in such cases.

Fixes opensearch-project#15261

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Remove all entries from changelog to be released in 3.2 (opensearch-project#18989)

Signed-off-by: Andrew Ross <andrross@amazon.com>

* Add temporal routing processors for time-based document routing (opensearch-project#18966)

Implements TemporalRoutingProcessor for ingest pipelines and TemporalRoutingSearchProcessor for search pipelines based on RFC opensearch-project#18920.

Features:
- Route documents to shards based on timestamp fields
- Support hour, day, week, and month granularities
- Optional hash bucketing for better distribution
- Automatic search routing to relevant time ranges
- ISO week format support

The processors enable efficient time-based data organization for log and metrics workloads by co-locating documents from the same time period on the same shards.

---------

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>

* Add CompletionStage variants to methods in the Client Interface and default to ActionListener impl (opensearch-project#18998)

* Add CompletableFuture variables to methods in the Client Interface and default to ActionListener impl

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Add to CHANGELOG

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Fix typo in CHANGELOG

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Switch to CompletionStage

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Update CHANGELOG entry

Signed-off-by: Craig Perkins <cwperx@amazon.com>

---------

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Expand fetch phase profiling to support inner hits and top hits aggregation phases (opensearch-project#18936)

---------

Signed-off-by: Andre van de Ven <andrebvandeven@gmail.com>
Signed-off-by: Andre van de Ven <113951599+andrevandeven@users.noreply.github.com>
Signed-off-by: Andre van de Ven <andrevdv@amazon.com>
Co-authored-by: Andre van de Ven <andrevdv@amazon.com>

* IllegalArgumentException when scroll ID has a node no longer part of the Cluster (opensearch-project#19031)

---------

Signed-off-by: Anurag Rai <anurag.rai@uber.com>
Signed-off-by: Anurag Rai <91844619+anuragrai16@users.noreply.github.com>

* Add Changelog entry

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* Add secondary constructor

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* Modify changelog

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* Update changelog

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

* Add another constructor to fix breaking change check

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>

---------

Signed-off-by: Rajat Gupta <gptrajat@amazon.com>
Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Andre van de Ven <andrebvandeven@gmail.com>
Signed-off-by: Andre van de Ven <113951599+andrevandeven@users.noreply.github.com>
Signed-off-by: Andre van de Ven <andrevdv@amazon.com>
Signed-off-by: Anurag Rai <anurag.rai@uber.com>
Signed-off-by: Anurag Rai <91844619+anuragrai16@users.noreply.github.com>
Co-authored-by: Rajat Gupta <gptrajat@amazon.com>
Co-authored-by: Atri Sharma <atri.jiit@gmail.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
Co-authored-by: Craig Perkins <cwperx@amazon.com>
Co-authored-by: Andre van de Ven <113951599+andrevandeven@users.noreply.github.com>
Co-authored-by: Andre van de Ven <andrevdv@amazon.com>
Co-authored-by: Anurag Rai <91844619+anuragrai16@users.noreply.github.com>
…-project#18389) The deprecated versions of this method take a type parameter, support for which was removed back in 2.0. The parameter is not used. I have kept the deprecated methods so as to not break downstream components that may be using them but changed all the code in server to stop passing in a type parameter. Signed-off-by: Andrew Ross <andrross@amazon.com>
* Add bwc version 2.19.4 Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update libs/core/src/main/java/org/opensearch/Version.java Signed-off-by: Craig Perkins <craig5008@gmail.com> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Craig Perkins <craig5008@gmail.com> Co-authored-by: opensearch-ci-bot <83309141+opensearch-ci-bot@users.noreply.github.com> Co-authored-by: Craig Perkins <cwperx@amazon.com>
…to ActionListener (opensearch-project#19161) * Add CompletionStage variants to IndicesAdminClient as an alternative to ActionListener Signed-off-by: Craig Perkins <cwperx@amazon.com> * Add to CHANGELOG Signed-off-by: Craig Perkins <cwperx@amazon.com> --------- Signed-off-by: Craig Perkins <cwperx@amazon.com>
…ches (opensearch-project#18981) Signed-off-by: Riley Jerger <rjerger@amazon.com>
…arch-project#19171)

The cancellation tests could deadlock when threads are delayed by OS scheduling. If cancellation triggers before all threads start, late threads may hit a code path where batchReduceSize causes the latch callback to be deferred to a MergeTask. Under certain timing conditions, these callbacks never execute, causing latch.await() to hang indefinitely.

Ensure latch.countDown() is always called by wrapping consumeResult in try-catch. This guarantees test completion regardless of cancellation timing or exceptions.

Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
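
The guarantee described in this commit can be sketched generically as below. This is not the test code from the commit: `LatchSketch`, `runAll`, and the `consume` callback are hypothetical stand-ins for the test threads and `consumeResult`. The commit message mentions try-catch, and the sketch adds a `finally` block as one common way to ensure the count-down runs on every path.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "always count down", not the actual test.
class LatchSketch {

    // Runs `workers` tasks that may throw; returns true if every worker
    // signalled the latch before the timeout.
    static boolean runAll(int workers, Runnable consume) {
        CountDownLatch latch = new CountDownLatch(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        consume.run();
                    } catch (RuntimeException e) {
                        // Ignored in this sketch; a real test might record the failure.
                    } finally {
                        // Runs whether consume succeeded, threw, or was cancelled.
                        latch.countDown();
                    }
                });
            }
            // A bounded wait avoids hanging forever if a count-down is still missed.
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling `runAll(4, task)` with a task that always throws still returns `true`, because each worker counts down in its `finally` block.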
Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
Signed-off-by: Atri Sharma <atri.jiit@gmail.com>
❌ Gradle check result for d31ebea: FAILURE
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?
Closing in favour of new PR:
Introduces streaming search infrastructure that enables progressive emission
of search results with three configurable scoring modes. The implementation
extends the existing streaming transport layer to support partial result
computation at the coordinator level.
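Scoring modes:
- NO_SCORING: Immediate result emission without confidence requirements
- CONFIDENCE_BASED: Statistical emission using Hoeffding inequality bounds
- FULL_SCORING: Complete scoring before result emission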
The implementation leverages OpenSearch's inter-node streaming capabilities
to reduce query latency through early result emission. Partial reductions
are triggered based on the selected scoring mode, with results accumulated
at the coordinator before final response generation.
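Key changes:
- Add HoeffdingBounds for statistical confidence calculation
- Extend QueryPhaseResultConsumer to support streaming reduction
- Add StreamingScoringCollector wrapping TopScoreDocCollector
- Integrate streaming scorer selection in QueryPhase
- Add REST parameter stream_scoring_mode for mode selection
- Include streaming metadata in SearchResponse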
The current implementation operates within architectural constraints where
streaming is limited to inter-node communication. Client-facing streaming
will be addressed in a follow-up contribution.
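
For readers unfamiliar with the Hoeffding inequality that the CONFIDENCE_BASED mode refers to, a minimal sketch follows. It shows only the textbook bound: for n independent samples in a range of width R, the sample mean is within ε = R·sqrt(ln(2/δ) / (2n)) of the true mean with probability at least 1 − δ. The class below is not the `HoeffdingBounds` class from this PR; its names, the `canEmit` rule, and the treatment of scores as independent bounded samples are assumptions made for illustration.

```java
// Hypothetical sketch of the textbook bound; not the PR's HoeffdingBounds.
class HoeffdingSketch {

    // Half-width epsilon of the two-sided Hoeffding interval for n samples
    // bounded in a range of width `range`, at confidence 1 - delta.
    static double epsilon(long n, double range, double delta) {
        if (n <= 0 || range < 0 || delta <= 0 || delta >= 1) {
            throw new IllegalArgumentException("need n > 0, range >= 0, 0 < delta < 1");
        }
        return range * Math.sqrt(Math.log(2.0 / delta) / (2.0 * n));
    }

    // Assumed emission rule: emit partial results once the bound is
    // tighter than a caller-chosen tolerance.
    static boolean canEmit(long docsSeen, double scoreRange, double delta, double tolerance) {
        return epsilon(docsSeen, scoreRange, delta) <= tolerance;
    }
}
```

The bound shrinks in proportion to 1/sqrt(n), so under this rule emission becomes possible as more documents are collected, and a stricter confidence (smaller δ) delays it.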
Addresses #18725
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.