[Intra-SegmentConcurrentSearch] Slicing mechanism #18851

@expani

Description

Overview

OpenSearch currently distributes segments into slices in round-robin fashion after sorting them by their document counts. See MaxTargetSliceSupplier for the implementation. The core idea is:

More docs in a segment = more work done by the query

The current approach can lead to an unequal distribution among the slices. This is being addressed by #18451, which aims to keep the distribution as equal as possible.
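To illustrate why round-robin assignment can end up unbalanced, here is a minimal sketch. It is a simplified model that works on plain doc counts, not the actual MaxTargetSliceSupplier code; the class and method names (`RoundRobinSlices`, `distribute`, `totalDocs`) and the sample doc counts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of round-robin slicing over segment doc counts.
public class RoundRobinSlices {

    // Sorts segments by doc count (largest first) and deals them out to slices in turn.
    static List<List<Integer>> distribute(List<Integer> segmentDocCounts, int sliceCount) {
        List<Integer> sorted = new ArrayList<>(segmentDocCounts);
        sorted.sort(Comparator.reverseOrder());

        List<List<Integer>> slices = new ArrayList<>();
        for (int i = 0; i < sliceCount; i++) {
            slices.add(new ArrayList<>());
        }
        for (int i = 0; i < sorted.size(); i++) {
            slices.get(i % sliceCount).add(sorted.get(i));
        }
        return slices;
    }

    // Sums the doc counts of all segments in one slice.
    static int totalDocs(List<Integer> slice) {
        return slice.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<List<Integer>> slices = distribute(List.of(100, 90, 80, 10, 5, 1), 2);
        for (int i = 0; i < slices.size(); i++) {
            System.out.println("Slice-" + (i + 1) + " " + slices.get(i)
                + " total=" + totalDocs(slices.get(i)));
        }
    }
}
```

With these sample inputs the two slices end up with 185 and 101 docs, even though the segments were sorted first.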

This issue tracks the changes required in the slicing mechanism to keep the distribution equal while also splitting a segment into multiple partitions.

Meta #18852

Describe the solution you'd like

We are planning to introduce search.concurrent_intra_segment.partition_size, discussed in more detail in #18849.
It will control the minimum number of docs that a partition of a segment must contain.

Consider one segment with 31_000 docs and a slice count of 3. If search.concurrent_intra_segment.partition_size is 10_000, the slices will be generated as:

[ Slice-1, Slice-2, Slice-3 ]
[ 10_334, 10_333, 10_333 ]
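The arithmetic behind this example can be sketched as follows. The rule used here is an assumption, not something the issue specifies: the number of partitions is taken as the smaller of the slice count and `docCount / partitionSize` (so that no partition falls below the minimum), and the remainder is spread one doc at a time over the first partitions. `PartitionSizes` and `split` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper: splits one segment's doc count into near-equal partitions.
public class PartitionSizes {

    static int[] split(int docCount, int sliceCount, int minPartitionSize) {
        if (docCount < 0 || sliceCount <= 0 || minPartitionSize <= 0) {
            throw new IllegalArgumentException("docCount must be >= 0, other arguments > 0");
        }
        // Assumption: never create more partitions than slices, and never
        // create a partition smaller than minPartitionSize.
        int byMinSize = docCount / minPartitionSize;
        int partitions = Math.max(1, Math.min(sliceCount, byMinSize));

        int base = docCount / partitions;
        int remainder = docCount % partitions;
        int[] sizes = new int[partitions];
        for (int i = 0; i < partitions; i++) {
            // The first `remainder` partitions take one extra doc.
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(31_000, 3, 10_000))); // [10334, 10333, 10333]
        System.out.println(Arrays.toString(split(5_000, 3, 10_000)));  // [5000]
    }
}
```

Under this assumed rule, a segment smaller than the partition size is left whole.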

EDIT: Please see the comment below, #18851 (comment), for the new proposed approach. The one described below is stale and no longer under consideration.

To achieve equal distribution with intra-segment slicing, the following needs to be done:

  • Given all the leaves List<LeafReaderContext>, split any leaf that is eligible based on search.concurrent_intra_segment.partition_size.

  • Generate a List<LeafReaderContextPartition> containing both the whole segments and the partitions of segments that were split.

  • Sort the above list by doc count and distribute it equally among the leaf slices using a PriorityQueue.

  • A slice could end up with 2 partitions of the same leaf/segment, in which case IndexSearcher would enforce distinct partitions and throw an exception. We would need to merge multiple partitions of the same leaf within a slice to avoid the exception.

  • Merging 2 partitions within a slice would be easy, but we would need to make another pass to ensure the overall doc count is still distributed properly. It could also be that, for equal distribution, 2 partitions that are not contiguous fall within the same slice.
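The steps of this earlier proposal can be sketched roughly as below. This is a standalone model under stated assumptions, not Lucene or OpenSearch code: `Partition` and `Slice` are hypothetical stand-ins for LeafReaderContextPartition and a leaf slice, the sort is assumed to be largest-first, and each partition goes to the currently least-loaded slice taken from a PriorityQueue. The sketch only detects the duplicate-leaf conflict; it does not attempt the merge or the rebalancing pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical model of greedy partition-to-slice assignment.
public class GreedySliceAssignment {

    // Stand-in for a leaf partition covering doc IDs [minDocId, maxDocId).
    static final class Partition {
        final int leafOrd;
        final int minDocId;
        final int maxDocId;

        Partition(int leafOrd, int minDocId, int maxDocId) {
            this.leafOrd = leafOrd;
            this.minDocId = minDocId;
            this.maxDocId = maxDocId;
        }

        int docCount() {
            return maxDocId - minDocId;
        }
    }

    // Stand-in for a slice: the partitions it holds and their total doc count.
    static final class Slice {
        final List<Partition> partitions = new ArrayList<>();
        long docCount;
    }

    // Assigns partitions, largest first, to whichever slice currently has the fewest docs.
    static List<Slice> assign(List<Partition> partitions, int sliceCount) {
        List<Partition> sorted = new ArrayList<>(partitions);
        sorted.sort(Comparator.comparingInt(Partition::docCount).reversed());

        PriorityQueue<Slice> queue = new PriorityQueue<>(Comparator.comparingLong((Slice s) -> s.docCount));
        List<Slice> slices = new ArrayList<>();
        for (int i = 0; i < sliceCount; i++) {
            Slice slice = new Slice();
            slices.add(slice);
            queue.add(slice);
        }
        for (Partition p : sorted) {
            Slice least = queue.poll();
            least.partitions.add(p);
            least.docCount += p.docCount();
            queue.add(least); // re-insert so the queue reflects the new total
        }
        return slices;
    }

    // True if the slice holds more than one partition of the same leaf.
    static boolean hasDuplicateLeaf(Slice slice) {
        Set<Integer> seen = new HashSet<>();
        for (Partition p : slice.partitions) {
            if (!seen.add(p.leafOrd)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Partition> parts = new ArrayList<>();
        parts.add(new Partition(0, 0, 10_000));      // leaf 0, split into 3 partitions
        parts.add(new Partition(0, 10_000, 20_000));
        parts.add(new Partition(0, 20_000, 30_000));
        parts.add(new Partition(1, 0, 12_000));      // leaf 1, whole segment
        parts.add(new Partition(2, 0, 8_000));       // leaf 2, whole segment

        for (Slice slice : assign(parts, 2)) {
            System.out.println("docs=" + slice.docCount + " duplicateLeaf=" + hasDuplicateLeaf(slice));
        }
    }
}
```

In this sample input, one slice receives two partitions of leaf 0, which is the situation the last two bullets describe.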

Related component

Search

Labels

Search, enhancement
