Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
e6ee1f6
Add initial support for intra-segment query concurrency
javanna Sep 2, 2024
b8e72f3
missing javadocs
javanna Sep 2, 2024
c07ae4b
iter
javanna Sep 2, 2024
79b0f10
iter and spotless
javanna Sep 2, 2024
543d9cd
iter
javanna Sep 3, 2024
cf7399f
Merge branch 'main' into intra_segment_concurrency
javanna Sep 3, 2024
d480868
Less contention on TotalHitCountCollectorManager
javanna Sep 3, 2024
c9d52af
Limit max num partitions per segment
javanna Sep 3, 2024
7401e7c
iter
javanna Sep 3, 2024
542e654
iter
javanna Sep 3, 2024
f0f4436
Merge branch 'main' into intra_segment_concurrency
javanna Sep 4, 2024
d44ceac
Address TestTopDocsCollector failure
javanna Sep 4, 2024
3be1d4f
Remove LeafReaderContextPartition#createFrom
javanna Sep 4, 2024
69ce4f8
Address TestRangeFacet failure
javanna Sep 4, 2024
c8d9d07
Remove static LeafReaderContextPartition creation methods
javanna Sep 4, 2024
0d6e6d0
Fix TestBooleanQueryVisitSubscorers failure
javanna Sep 4, 2024
426c4cf
Revert "Fix TestBooleanQueryVisitSubscorers failure"
javanna Sep 4, 2024
f6de93e
Revert "Remove static LeafReaderContextPartition creation methods"
javanna Sep 4, 2024
c1f69e8
iter
javanna Sep 4, 2024
78bf5a4
expand comment on slicesWithPartitions method
javanna Sep 5, 2024
ea86d50
reword changes entry
javanna Sep 5, 2024
51283d2
Merge branch 'main' into intra_segment_concurrency
javanna Sep 5, 2024
c26db4b
switch migrate entries
javanna Sep 5, 2024
aa696ce
iter
javanna Sep 6, 2024
e9c11ac
javadocs
javanna Sep 6, 2024
70f82c8
Merge branch 'main' into intra_segment_concurrency
javanna Sep 6, 2024
d3ebb4d
Merge branch 'main' into intra_segment_concurrency
javanna Sep 7, 2024
b9cfb8e
iter
javanna Sep 7, 2024
7d63f49
Adjust TestForTooMuchCloning
javanna Sep 9, 2024
d42e9c4
adjust facet test
javanna Sep 9, 2024
df87326
clean up facets reduce methods
javanna Sep 9, 2024
ddad4f0
fix logical OR
javanna Sep 9, 2024
1d66416
Merge branch 'main' into intra_segment_concurrency
javanna Sep 9, 2024
df9c7fc
support keepScores merging, re-enable concurrency in those tests
javanna Sep 9, 2024
e1ee42b
fix changes entry
javanna Sep 9, 2024
ff19f64
update migrate entry
javanna Sep 9, 2024
e7205c8
clarify javadocs
javanna Sep 9, 2024
0691c8b
iter
javanna Sep 9, 2024
16b477b
review comments
javanna Sep 9, 2024
4908c58
Merge branch 'main' into intra_segment_concurrency
javanna Sep 9, 2024
ce2d3f2
hit count collector manager
javanna Sep 9, 2024
e546206
typo
javanna Sep 9, 2024
fd295ec
iter
javanna Sep 9, 2024
21aa355
Merge branch 'main' into intra_segment_concurrency
javanna Sep 10, 2024
d446359
last clarifications to changes and migrate entries
javanna Sep 10, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions lucene/CHANGES.txt
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,14 @@ New Features
* GITHUB#13592: Take advantage of the doc value skipper when it is primary sort in SortedNumericDocValuesRangeQuery
and SortedSetDocValuesRangeQuery. (Ignacio Vera)

* GITHUB#13542: Add initial support for intra-segment concurrency. IndexSearcher now supports searching across leaf
reader partitions concurrently. This is useful to max out available resource usage especially with force merged
indices or big segments. There is still a performance penalty for queries that require segment-level computation
ahead of time, such as points/range queries. This is an implementation limitation that we expect to improve in
future releases, ad that's why intra-segment slicing is not enabled by default, but leveraged in tests when the
searcher is created via LuceneTestCase#newSearcher. Users may override IndexSearcher#slices(List) to optionally
create slices that target segment partitions. (Luca Cavanna)

Improvements
---------------------

Expand Down
47 changes: 45 additions & 2 deletions lucene/MIGRATE.md
Original file line number Diff line number Diff line change
Expand Up @@ -815,5 +815,48 @@ both `TopDocs` as well as facets results included in a reduced `FacetsCollector`

### `SearchWithCollectorTask` no longer supports the `collector.class` config parameter

`collector.class` used to allow users to load a custom collector implementation. `collector.manager.class`
replaces it by allowing users to load a custom collector manager instead.
`collector.class` used to allow users to load a custom collector implementation. `collector.manager.class`
replaces it by allowing users to load a custom collector manager instead.

### BulkScorer#score(LeafCollector collector, Bits acceptDocs) removed

Use `BulkScorer#score(LeafCollector collector, Bits acceptDocs, int min, int max)` instead. In order to score the
entire leaf, provide `0` as min and `DocIdSetIterator.NO_MORE_DOCS` as max. `BulkScorer` subclasses that override
such method need to instead override the method variant that takes the range of doc ids as well as arguments.

### CollectorManager#newCollector and Collector#getLeafCollector contract

With the introduction of intra-segment query concurrency support, multiple `LeafCollector`s may be requested for the
same `LeafReaderContext` via `Collector#getLeafCollector(LeafReaderContext)` across the different `Collector` instances
returned by multiple `CollectorManager#newCollector` calls. Any logic or computation that needs to happen
once per segment requires specific handling in the collector manager implementation. See `TotalHitCountCollectorManager`
as an example. Individual collectors don't need to be adapted as a specific `Collector` instance will still see a given
`LeafReaderContext` once, given that it is not possible to add more than one partition of the same segment to the same
leaf slice.

### Weight#scorer, Weight#bulkScorer and Weight#scorerSupplier contract

With the introduction of intra-segment query concurrency support, multiple `Scorer`s, `ScorerSupplier`s or `BulkScorer`s
may be requested for the same `LeafReaderContext` instance as part of a single search call. That may happen concurrently
from separate threads each searching a specific doc id range of the segment. `Weight` implementations that rely on the
assumption that a scorer, bulk scorer or scorer supplier for a given `LeafReaderContext` is requested once per search
need updating.

### Signature of IndexSearcher#searchLeaf changed

With the introduction of intra-segment query concurrency support, the `IndexSearcher#searchLeaf(LeafReaderContext ctx, Weight weight, Collector collector)`
method now accepts two additional int arguments to identify the min/max range of doc ids that will be searched in this
leaf partition`: IndexSearcher#searchLeaf(LeafReaderContext ctx, int minDocId, int maxDocId, Weight weight, Collector collector)`.
Subclasses of `IndexSearcher` that call or override the `searchLeaf` method need to be updated accordingly.

### Signature of static IndexSearch#slices method changed

The static `IndexSearcher#sslices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice)`
method now supports an additional 4th and last argument to optionally enable creating segment partitions:
`IndexSearcher#slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice, boolean allowSegmentPartitions)`

### TotalHitCountCollectorManager constructor

`TotalHitCountCollectorManager` now requires that an array of `LeafSlice`s, retrieved via `IndexSearcher#getSlices`,
is provided to its constructor. Depending on whether segment partitions are present among slices, the manager can
optimize the type of collectors it creates and exposes via `newCollector`.
12 changes: 0 additions & 12 deletions lucene/core/src/java/org/apache/lucene/search/BulkScorer.java
Original file line number Diff line number Diff line change
Expand Up @@ -27,18 +27,6 @@
*/
public abstract class BulkScorer {

/**
* Scores and collects all matching documents.
*
* @param collector The collector to which all matching documents are passed.
* @param acceptDocs {@link Bits} that represents the allowed documents to match, or {@code null}
* if they are all allowed to match.
*/
public void score(LeafCollector collector, Bits acceptDocs) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we agree that we should remove this in main, I am going to deprecate it in branch_9x .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am not too sure it makes sense to deprecate in 9.x, because it is still heavily used and despite there is a variant of it that takes min and max, it is not a replacement yet until intra-segment concurrency is introduced with this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpountz I've been going back and forth on this a few times. Ideally we would deprecate the method in 9x and give users some notice. It is rather easy to remove usages and deprecate, I only worry about possible custom implementations of the method, because it is not final and subclasses can entirely override its behaviour. Such overrides would be a no-op if we replace all the usages of the deprecated method in 9x, which feels like it breaks stuff potentially. Users would need to immediately switch to overriding the other method instead. If we leave the main usage (IndexSearcher#searchLeaf) behind though, it feels odd that we use a deprecated method in such a core part of search.

I am a bit torn on how to proceed, do you have an opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened this after all: #13794 . It deprecates the method but leaves its usages untouched to avoid breaking users that override it.

final int next = score(collector, acceptDocs, 0, DocIdSetIterator.NO_MORE_DOCS);
assert next == DocIdSetIterator.NO_MORE_DOCS;
}

/**
* Collects matching documents in a range and return an estimation of the next matching document
* which is on or after {@code max}.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@

import java.io.IOException;
import java.util.Collection;
import org.apache.lucene.index.LeafReaderContext;

/**
* A manager of collectors. This class is useful to parallelize execution of search requests and has
Expand All @@ -31,6 +32,12 @@
* fully collected.
* </ul>
*
* <p><strong>Note:</strong> Multiple {@link LeafCollector}s may be requested for the same {@link
* LeafReaderContext} via {@link Collector#getLeafCollector(LeafReaderContext)} across the different
* {@link Collector}s returned by {@link #newCollector()}. Any computation or logic that needs to
* happen once per segment requires specific handling in the collector manager implementation,
* because the collection of an entire segment may be split across threads.
*
* @see IndexSearcher#search(Query, CollectorManager)
* @lucene.experimental
*/
Expand Down
Loading