
Conversation

@expani
Contributor

@expani expani commented Jul 31, 2025

Description

Enabling Intra Segment Concurrent Search for queries without aggregations.

Introduced Cluster and Index level settings similar to Concurrent Segment Search.

Additionally, there is an extra cluster/index setting named partition_size that controls how large segments are sliced/partitioned; segments with fewer docs than this threshold are not split.
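
A rough illustration of what such settings could look like follows; the keys, defaults, and scopes below are assumptions for the sketch, not the PR's actual definitions, and simply mirror the existing concurrent segment search settings.

import org.opensearch.common.settings.Setting;
import org.opensearch.common.settings.Setting.Property;

// Hypothetical setting definitions for illustration only; names and defaults may differ from the PR.
public final class IntraSegmentSearchSettingsSketch {

    // Cluster-level mode setting, analogous to the concurrent segment search mode setting.
    public static final Setting<String> CLUSTER_INTRA_SEGMENT_SEARCH_MODE = Setting.simpleString(
        "search.concurrent_intra_segment_search.mode",  // assumed key
        "none",                                         // assumed default: disabled
        Property.Dynamic,
        Property.NodeScope
    );

    // Cluster-level partition size: segments with fewer docs than this are not split.
    // The 10_000 default is taken from the discussion further down in this thread.
    public static final Setting<Integer> CLUSTER_INTRA_SEGMENT_PARTITION_SIZE = Setting.intSetting(
        "search.concurrent_intra_segment_search.partition_size",  // assumed key
        10_000,
        1,
        Property.Dynamic,
        Property.NodeScope
    );

    // The index-level analogues would use an "index."-prefixed key with Property.IndexScope.
}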

Related Issues

Resolves #18851
Partially resolves #18849, as we are not introducing an auto mode or an execution hint in this PR.

Another limitation of the current PR is that Intra Segment Search will only work if track_total_hits is disabled. This will be addressed in a follow-up PR; it needs handling of TotalHitCountCollector similar to what Lucene did. I wrote more about this in #18854.

@github-actions github-actions bot added the discuss (Issues intended to help drive brainstorming and decision making), enhancement (Enhancement or improvement to existing feature or request), and Search (Search query, autocomplete ...etc) labels on Jul 31, 2025
@github-actions
Contributor

❌ Gradle check result for 76f81c6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 5721e4f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 2bb652b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

expani added 2 commits July 31, 2025 15:06
Signed-off-by: expani <anijainc@amazon.com>
Signed-off-by: expani <anijainc@amazon.com>
@expani expani force-pushed the intraseg_phase1 branch from 2bb652b to 8d9f29f on July 31, 2025 22:09
Signed-off-by: expani <anijainc@amazon.com>
@github-actions
Contributor

❌ Gradle check result for 1f3de07: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: expani <anijainc@amazon.com>
@github-actions
Contributor

github-actions bot commented Aug 1, 2025

❌ Gradle check result for 6115def: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: expani <anijainc@amazon.com>
@github-actions
Contributor

github-actions bot commented Aug 1, 2025

❌ Gradle check result for 39dd2ee: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

&& Boolean.TRUE.equals(requestShouldUseConcurrentIntraSegmentSearch.get());
}

public void evaluateRequestShouldUseIntraSegmentConcurrentSearch() {
Member

@sandeshkr419 sandeshkr419 Aug 1, 2025

Should this method be private?

Also, can we refactor it better to avoid multiple if-else:

private void evaluateRequestShouldUseIntraSegmentConcurrentSearch() {
    // Check for any condition that forces concurrency to be disabled.
    final boolean isUnsupported = (sort != null && sort.isSortOnTimeSeriesField())
                               || (aggregations() != null)
                               || (terminateAfter != DEFAULT_TERMINATE_AFTER)
                               || (trackTotalHitsUpTo != TRACK_TOTAL_HITS_DISABLED); // TODO: Need to handle TotalHitCountCollectorManager

    // Check whether the mode explicitly enables concurrency.
    final boolean isEnabledByMode = concurrentIntraSegmentSearchMode.equals(CONCURRENT_SEGMENT_SEARCH_MODE_ALL); // TODO: Handle auto mode here

    requestShouldUseConcurrentIntraSegmentSearch.set(isEnabledByMode && !isUnsupported);
}

@atris
Contributor

atris commented Aug 1, 2025

Few comments:

partition_size setting is confusing. Why not derive optimal partitions from segment size/thread pool capacity? Fixed partition sizes don't adapt to varying segment characteristics.

Disabling for track_total_hits=true makes sense initially, but this severely limits adoption. Most production queries need hit counts. It would be good to have a concrete plan for TotalHitCountCollector support.

Consider unified namespace: search.concurrent.segment.* and search.concurrent.intra_segment.*

partition_size as an integer is problematic. Segments vary from 1 KB to 10 GB. Consider one of the following:

  • Adaptive partitioning based on segment size
  • partition_size as a percentage/ratio (see the sketch after this list)
  • Separate settings for different segment size buckets
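
To make the percentage/ratio alternative concrete, here is a minimal sketch with illustrative names only (nothing in this PR); it derives a per-segment threshold from a ratio while keeping an absolute floor:

// Illustrative only: derive a per-segment partition size from a ratio of the segment's doc count,
// keeping an absolute floor so tiny segments are never partitioned.
static int partitionSizeFor(int segmentMaxDoc, double ratio, int minPartitionDocs) {
    int derived = (int) Math.max(1, segmentMaxDoc * ratio);
    return Math.max(derived, minPartitionDocs);
}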

Conditional logic in QueryPhase is getting complex. Extract strategy pattern:

interface SearchExecutionStrategy {
    boolean canExecute(SearchContext context);
    void execute(SearchContext context) throws IOException;
}

Thread safety is unclear for intra-segment collectors. Please document collector state isolation guarantees. Are FieldDoc comparisons thread-safe across partition boundaries?

Benchmarks providing the following data would be good:

  • Performance vs thread count (1,2,4,8,16 threads)
  • Small vs large segments (< 100MB vs > 1GB)
  • Query complexity impact (term vs boolean vs range)
  • Memory overhead per partition

How are results combined from N partitions? If it is a linear merge, that negates the benefits for large result sets; we should do a heap-based merge for top-K queries.
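
For reference, Lucene already provides a heap-based k-way merge of per-slice top hits via TopDocs.merge; a minimal sketch of reducing per-slice results that way (independent of whatever reduce path this PR actually uses):

import org.apache.lucene.search.TopDocs;

// Merge per-partition/per-slice top hits into a single top-K result once all slices finish.
// TopDocs.merge performs a priority-queue based merge of the already-sorted inputs, so the
// cost scales with K and the number of slices rather than with every collected hit.
// For field sorts, the TopDocs.merge(Sort, int, TopFieldDocs[]) overload applies.
final class TopKReducerSketch {
    static TopDocs reduce(int topK, TopDocs[] perSliceTopDocs) {
        return TopDocs.merge(topK, perSliceTopDocs);
    }
}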

What happens when partition_size > segment doc count? We should gracefully degrade to single-threaded execution, not fail or create empty partitions.

Sorting exclusion logic is incomplete. It only checks time series fields, but concurrent search breaks with any custom scoring/sorting that depends on document order across segment boundaries.

terminateAfter interaction seems problematic. If terminateAfter=100 and a segment is split into multiple partitions, each partition applies the limit independently, so the number of documents actually collected can exceed the intended 100. We should have global termination coordination.

Contributor

@atris atris left a comment

Please refer to: #18879 (comment)

@expani
Contributor Author

expani commented Aug 4, 2025

partition_size setting is confusing. Why not derive optimal partitions from segment size/thread pool capacity? Fixed partition sizes don't adapt to varying segment characteristics.

It is indicative of the smallest segment partition size below which splitting would not be performant. Consider a segment with 10 docs: we wouldn't want to split it into 3 partitions of [4, 3, 3] docs. Currently, the default of 10_000 means any segment with 9,999 docs or fewer won't be split any further.
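
A minimal sketch of that thresholding, with illustrative names rather than the PR's actual code:

// A segment is split only when it has at least `partitionSize` documents,
// and partitions are never created smaller than that threshold.
static int numPartitions(int segmentMaxDoc, int partitionSize) {
    if (segmentMaxDoc < partitionSize) {
        return 1;  // e.g. a 9_999-doc segment stays whole with the default of 10_000
    }
    return segmentMaxDoc / partitionSize;  // any remainder folds into the existing partitions
}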

Disabling for track_total_hits=true makes sense initially, but this severely limits adoption. Most production queries need hit counts. It would be good to have a concrete plan for TotalHitCountCollector support.

See #18854 for the same.

Conditional logic in QueryPhase is getting complex. Extract strategy pattern:

This is unclear; there are no changes in QueryPhase.

What happens when partition_size > segment doc count? We should gracefully degrade to single-threaded execution, not fail or create empty partitions.

We would never create empty partitions. It is handled in the code.

Sorting exclusion logic is incomplete. It only checks time series fields, but concurrent search breaks with any custom scoring/sorting that depends on document order across segment boundaries.

terminateAfter interaction seems problematic. If terminateAfter=100 and a segment is split into multiple partitions, each partition applies the limit independently, so the number of documents actually collected can exceed the intended 100. We should have global termination coordination.

These are existing limitations of Concurrent Segment Search which can be handled separately instead of tying it up with this PR.

Thread safety is unclear for intra-segment collectors. Please document collector state isolation guarantees. Are FieldDoc comparisons thread-safe across partition boundaries?

Every segment's partition will have its own collector. Any collector that shares state, like TotalHitCount, is disabled; the others have been kept the same as in Concurrent Segment Search. Let me know if you have any specific one in mind.

partition_size as an integer is problematic. Segments vary from 1 KB to 10 GB. Consider one of the following:
Adaptive partitioning based on segment size
partition_size as percentage/ratio
Separate settings for different segment size buckets

Partition size is there to ensure we don't split a segment below a certain threshold. It doesn't mean that every segment is split into partitions of exactly that size.

Currently, we use the LPT (Longest Processing Time) algorithm with Concurrent Segment Search, which benefits from small partition sizes for achieving an equal distribution. Consider a segment with 1_000_000 docs, a partition size of 1000, and 3 slices: after distribution it would end up as [334_000, 333_000, 333_000].
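
For context, LPT here is just the greedy "assign the largest remaining partition to the least-loaded slice" heuristic; the self-contained sketch below (illustrative, not the actual code from this PR or #18451) reproduces that distribution:

import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Greedy LPT: take work items (partition doc counts) in descending order and always assign
// the next item to the currently least-loaded slice.
final class LptDistributionSketch {
    static long[] distribute(long[] partitionDocCounts, int numSlices) {
        long[] sliceLoads = new long[numSlices];
        PriorityQueue<Integer> leastLoadedFirst = new PriorityQueue<>(Comparator.comparingLong((Integer i) -> sliceLoads[i]));
        for (int slice = 0; slice < numSlices; slice++) {
            leastLoadedFirst.add(slice);
        }
        long[] sorted = partitionDocCounts.clone();
        Arrays.sort(sorted);  // ascending; iterate from the end to process the largest first
        for (int p = sorted.length - 1; p >= 0; p--) {
            int slice = leastLoadedFirst.poll();  // least-loaded slice right now
            sliceLoads[slice] += sorted[p];
            leastLoadedFirst.add(slice);          // re-insert with its updated load
        }
        return sliceLoads;
    }

    public static void main(String[] args) {
        long[] partitions = new long[1_000];      // 1_000_000 docs split into 1_000-doc partitions
        Arrays.fill(partitions, 1_000L);
        // Prints a permutation of [334000, 333000, 333000]
        System.out.println(Arrays.toString(distribute(partitions, 3)));
    }
}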

@atris
Contributor

atris commented Aug 5, 2025

Thread safety is unclear for intra-segment collectors. Please document collector state isolation guarantees. Are FieldDoc comparisons thread-safe across partition boundaries?

Every segment's partition will have its own collector. Any collector that shares state, like TotalHitCount, is disabled; the others have been kept the same as in Concurrent Segment Search. Let me know if you have any specific one in mind.

The design seems sound. I had these follow-ups:

  1. Result merging: Are partition results merged using standard Lucene TopDocs.merge() after all partitions complete? This is thread-safe.
  2. Collector validation: Where in the code do you prevent stateful collectors? I see aggregations are disabled in evaluateRequestShouldUseIntraSegmentConcurrentSearch() but is there broader validation?
  3. Custom collectors: How do you handle user-provided collectors that might have shared state?

@jainankitk
Contributor

partition_size setting is confusing. Why not derive optimal partitions from segment size/thread pool capacity? Fixed partition sizes don't adapt to varying segment characteristics.

I also prefer to derive the partition size as a function of total # of docs / # of slices. There is a possibility of creating very small partitions for some of the segments in this case, which we can better manage using a percentage-based threshold setting. But I do want to ensure that the number of unique segments being partitioned is minimal while achieving a best-effort equal distribution of documents across the different slices.

@expani
Contributor Author

expani commented Aug 5, 2025

I also prefer to derive the partition size as a function of total # of docs / # of slices. There is a possibility of creating very small partitions for some of the segments in this case, which we can better manage using a percentage-based threshold setting.

@jainankitk The primary question we need to answer is what's the smallest size a partition of a segment can have. It only makes sense to partition a segment if Time(create Collector, BulkScorer, DISI) << Time(score/match the documents), which is very dependent on the complexity of the query. Aggregations add another layer of complexity.

A percentage-based threshold can still allow very small partition sizes if a shard only has small segments. So we need to derive a sensible default from our existing known workloads like Big5, HTTP Logs, etc. and still give control to end users via a cluster setting, an index setting, and an execution hint.

But I do want to ensure that the number of unique segments being partitioned is minimal while achieving a best-effort equal distribution of documents across the different slices.

The primary goal is to reduce the overall query execution time and make it better than Concurrent Segment Search, so we need to achieve a better distribution. Since we are currently using the LPT (Longest Processing Time) algorithm from #18451, it performs better with smaller partition sizes.

I am planning to ditch this PR in favor of a different one where I will expand more along these lines and give some pseudocode for the implementation before diving into code. Will post in #18851.

@expani
Contributor Author

expani commented Aug 5, 2025

Result merging: Are partition results merged using standard Lucene TopDocs.merge() after all partitions complete? This is thread-safe.

One collector is created per slice, and the thread in the search thread pool waits for the results from all slices to complete on the index_searcher thread pool.

Collector validation: Where in the code do you prevent stateful collectors? I see aggregations are disabled in evaluateRequestShouldUseIntraSegmentConcurrentSearch() but is there broader validation?

Aggregations and TotalHitCount are disabled, which are the stateful collectors that I am aware of at the moment. If you feel there is a missing case, do let me know.

Custom collectors: How do you handle user-provided collectors that might have shared state?

That's a good point. Do you mean via plugins extending SearchPlugin?

Will add the same in #18851

Contributor

@jainankitk jainankitk left a comment

I am planning to ditch this PR in favor of a different one where I will expand more along these lines and give some pseudocode for the implementation before diving into code. Will post in #18851.

Since we don't want to continue this PR in favor of a slightly different approach, can we move it to a draft state?
