apache · javanna · Sep 10, 2024 · Sep 2, 2024
diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
@@ -149,6 +149,14 @@ New Features
 * GITHUB#13592: Take advantage of the doc value skipper when it is primary sort in SortedNumericDocValuesRangeQuery
   and SortedSetDocValuesRangeQuery. (Ignacio Vera)
 
+* GITHUB#13542: Add initial support for intra-segment concurrency. IndexSearcher now supports searching across leaf
+  reader partitions concurrently. This is useful to make the most of available resources, especially with force-merged
+  indices or big segments. There is still a performance penalty for queries that require segment-level computation
+  ahead of time, such as points/range queries. This is an implementation limitation that we expect to improve in
+  future releases, and that's why intra-segment slicing is not enabled by default, but leveraged in tests when the
+  searcher is created via LuceneTestCase#newSearcher. Users may override IndexSearcher#slices(List) to optionally
+  create slices that target segment partitions. (Luca Cavanna)
+
 Improvements
 ---------------------
 

diff --git a/lucene/MIGRATE.md b/lucene/MIGRATE.md
@@ -815,5 +815,48 @@ both `TopDocs` as well as facets results included in a reduced `FacetsCollector`
 
 ### `SearchWithCollectorTask` no longer supports the `collector.class` config parameter 
 
-`collector.class` used to allow users to load a custom collector implementation. `collector.manager.class` 
-replaces it by allowing users to load a custom collector manager instead.
+`collector.class` used to allow users to load a custom collector implementation. `collector.manager.class`
+replaces it by allowing users to load a custom collector manager instead.
+
+### BulkScorer#score(LeafCollector collector, Bits acceptDocs) removed
+
+Use `BulkScorer#score(LeafCollector collector, Bits acceptDocs, int min, int max)` instead. To score the
+entire leaf, pass `0` as `min` and `DocIdSetIterator.NO_MORE_DOCS` as `max`. `BulkScorer` subclasses that overrode
+the removed method need to override the variant that also takes the range of doc ids as arguments instead.
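The range-based contract can be illustrated with a small JDK-only model. `RangeScorer`, `ArrayRangeScorer` and `ScoreMigration` are hypothetical stand-ins, not Lucene API; the sketch assumes, as the `BulkScorer` javadoc states, that `score` collects matches in `[min, max)` and returns the next match on or after `max`. `scoreAll` mirrors the removed two-argument method, while `scoreInWindows` shows why the return value matters when a leaf is scored in pieces.

```java
import java.util.function.IntConsumer;

/** Hypothetical stand-in for BulkScorer that only has the range-based scoring method. */
interface RangeScorer {
  /** Mirrors DocIdSetIterator.NO_MORE_DOCS, which is Integer.MAX_VALUE in Lucene. */
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Collects matches in [min, max) and returns the first match on or after max, or NO_MORE_DOCS. */
  int score(IntConsumer collector, int min, int max);
}

/** Illustrative scorer over a sorted array of distinct matching doc ids. */
final class ArrayRangeScorer implements RangeScorer {
  private final int[] matches;

  ArrayRangeScorer(int[] matches) {
    this.matches = matches.clone();
  }

  @Override
  public int score(IntConsumer collector, int min, int max) {
    for (int doc : matches) {
      if (doc < min) {
        continue; // before the requested window
      }
      if (doc >= max) {
        return doc; // first match on or after max
      }
      collector.accept(doc);
    }
    return NO_MORE_DOCS;
  }
}

final class ScoreMigration {
  /** Equivalent of the removed two-argument score(): one call over the whole doc id space. */
  static void scoreAll(RangeScorer scorer, IntConsumer collector) {
    int next = scorer.score(collector, 0, RangeScorer.NO_MORE_DOCS);
    if (next != RangeScorer.NO_MORE_DOCS) {
      throw new IllegalStateException("scoring [0, NO_MORE_DOCS) must exhaust the leaf");
    }
  }

  /** Scores consecutive windows, using the returned value to skip ranges without matches. */
  static void scoreInWindows(RangeScorer scorer, IntConsumer collector, int windowSize) {
    if (windowSize <= 0) {
      throw new IllegalArgumentException("windowSize must be > 0");
    }
    int min = 0;
    while (min != RangeScorer.NO_MORE_DOCS) {
      int max = (int) Math.min((long) min + windowSize, RangeScorer.NO_MORE_DOCS);
      min = scorer.score(collector, min, max); // always >= max, so no match is skipped
    }
  }
}
```

Both calls collect the same documents; windowed scoring is roughly what happens when different threads each handle one doc id range of a segment.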
+
+### CollectorManager#newCollector and Collector#getLeafCollector contract
+
+With the introduction of intra-segment query concurrency support, multiple `LeafCollector`s may be requested for the 
+same `LeafReaderContext` via `Collector#getLeafCollector(LeafReaderContext)` across the different `Collector` instances 
+returned by multiple `CollectorManager#newCollector` calls. Any logic or computation that needs to happen
+once per segment requires specific handling in the collector manager implementation. See `TotalHitCountCollectorManager` 
+as an example. Individual collectors don't need to be adapted, because a given `Collector` instance still sees each
+`LeafReaderContext` at most once: a leaf slice cannot contain more than one partition of the same segment.
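One way to keep a segment-level computation "once per segment" is to share state across all collectors a manager creates. The sketch below is hypothetical (`OncePerSegmentHitCount`, `onPartition` and the integer segment ordinal are assumptions, and this is not how `TotalHitCountCollectorManager` is necessarily implemented): a per-segment hit count that was precomputed for the whole segment is added only by the first partition of that segment to report it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical state shared by all collectors that a hit-counting collector manager creates.
 * A segment-level shortcut (a count precomputed for the whole segment) must be applied once per
 * segment, even if partitions of that segment are collected by different collectors and threads.
 */
final class OncePerSegmentHitCount {
  private final Set<Integer> countedSegments = ConcurrentHashMap.newKeySet();
  private final LongAdder totalHits = new LongAdder();

  /**
   * Called once per leaf partition a collector sees. segmentOrd identifies the segment and
   * segmentHitCount is the count for the whole segment, not just this partition.
   */
  void onPartition(int segmentOrd, int segmentHitCount) {
    // Set.add returns true only for the first caller, so each segment is counted exactly once.
    if (countedSegments.add(segmentOrd)) {
      totalHits.add(segmentHitCount);
    }
  }

  long total() {
    return totalHits.sum();
  }
}
```

Without the shared set, every partition of a split segment would add the full segment count and the total would be overcounted.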
+
+### Weight#scorer, Weight#bulkScorer and Weight#scorerSupplier contract
+
+With the introduction of intra-segment query concurrency support, multiple `Scorer`s, `ScorerSupplier`s or `BulkScorer`s 
+may be requested for the same `LeafReaderContext` instance as part of a single search call. That may happen concurrently 
+from separate threads, each searching a specific doc id range of the segment. `Weight` implementations that assume a
+scorer, bulk scorer or scorer supplier is requested only once per `LeafReaderContext` per search need to be updated.
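A `Weight` that computes per-segment state ahead of time can no longer assume a single request per segment, and those requests may be concurrent. A minimal sketch, assuming the per-segment state can be keyed by some segment identifier: `PerSegmentStateCache` is a hypothetical helper, not Lucene API. It relies on `ConcurrentHashMap#computeIfAbsent`, which applies the loader at most once per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-search cache a Weight-like object might keep for per-segment state (for
 * example, a bitset computed ahead of time). Scorers for the same segment may now be requested
 * several times, concurrently, so the state is computed once per segment key and then shared.
 */
final class PerSegmentStateCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  PerSegmentStateCache(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Safe to call from several threads; the loader runs at most once per segment key. */
  V get(K segmentKey) {
    return cache.computeIfAbsent(segmentKey, loader);
  }
}
```

Whether caching is appropriate depends on the `Weight`; recomputing the state per partition is also correct, just potentially slower.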
+
+### Signature of IndexSearcher#searchLeaf changed
+
+With the introduction of intra-segment query concurrency support, the `IndexSearcher#searchLeaf(LeafReaderContext ctx, Weight weight, Collector collector)` 
+method now accepts two additional int arguments to identify the min/max range of doc ids that will be searched in this
+leaf partition: `IndexSearcher#searchLeaf(LeafReaderContext ctx, int minDocId, int maxDocId, Weight weight, Collector collector)`.
+Subclasses of `IndexSearcher` that call or override the `searchLeaf` method need to be updated accordingly.
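An override of `searchLeaf` has to restrict its work to the requested range. The JDK-only model below is hypothetical: `PartitionedLeafSearch` represents a segment's matches as a sorted array of doc ids, and it assumes `maxDocId` is exclusive, as with the `BulkScorer` range.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;

/**
 * Hypothetical model of a searchLeaf-style method after the signature change: only matches in
 * [minDocId, maxDocId) are collected, so several calls can cover one segment without overlap.
 */
final class PartitionedLeafSearch {
  static void searchLeaf(int[] sortedMatches, int minDocId, int maxDocId, IntConsumer collector) {
    // Locate the first match >= minDocId; binarySearch returns -(insertionPoint) - 1 when absent.
    int i = Arrays.binarySearch(sortedMatches, minDocId);
    if (i < 0) {
      i = -i - 1;
    }
    for (; i < sortedMatches.length && sortedMatches[i] < maxDocId; i++) {
      collector.accept(sortedMatches[i]);
    }
  }
}
```

Calling it for `[0, 5)` and then `[5, 10)` collects the same documents as a single call over `[0, 10)`.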
+
+### Signature of static IndexSearcher#slices method changed
+
+The static `IndexSearcher#slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice)`
+method now accepts an additional fourth and last argument to optionally enable creating segment partitions:
+`IndexSearcher#slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice, boolean allowSegmentPartitions)`.
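What a segment partition amounts to can be sketched as a split of one segment's doc id space into contiguous ranges. This is an illustration only, not Lucene's slicing algorithm: `DocRange`, `SegmentPartitioner` and `maxDocsPerPartition` are hypothetical names, and the real entry point is the four-argument static `IndexSearcher#slices` shown above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical doc id range [minDoc, maxDoc) of the segment identified by segmentOrd. */
record DocRange(int segmentOrd, int minDoc, int maxDoc) {}

final class SegmentPartitioner {
  /**
   * Splits one segment of maxDoc documents into contiguous, non-overlapping ranges of at most
   * maxDocsPerPartition docs each. Uses long arithmetic so the bounds cannot overflow.
   */
  static List<DocRange> partition(int segmentOrd, int maxDoc, int maxDocsPerPartition) {
    if (maxDocsPerPartition <= 0) {
      throw new IllegalArgumentException("maxDocsPerPartition must be > 0");
    }
    List<DocRange> ranges = new ArrayList<>();
    for (long min = 0; min < maxDoc; min += maxDocsPerPartition) {
      long max = Math.min(maxDoc, min + maxDocsPerPartition);
      ranges.add(new DocRange(segmentOrd, (int) min, (int) max));
    }
    return ranges;
  }
}
```

For example, a 10-document segment with at most 4 docs per partition yields `[0, 4)`, `[4, 8)` and `[8, 10)`. Each range would then go to a different slice, since one slice cannot hold two partitions of the same segment.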
+
+### TotalHitCountCollectorManager constructor
+
+`TotalHitCountCollectorManager` now requires that an array of `LeafSlice`s, retrieved via `IndexSearcher#getSlices`, 
+is provided to its constructor. Depending on whether segment partitions are present among slices, the manager can 
+optimize the type of collectors it creates and exposes via `newCollector`.
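The kind of check a manager could run on the slices it receives is sketched below: if no partition covers only part of its segment, per-segment shortcuts are safe and simpler collectors can be used. `SlicePartition`, `PartitionDetector` and the `segmentMaxDocs` array are hypothetical and do not reflect how `TotalHitCountCollectorManager` inspects `LeafSlice`s internally.

```java
import java.util.List;

/** Hypothetical doc id range [minDoc, maxDoc) of the segment identified by segmentOrd. */
record SlicePartition(int segmentOrd, int minDoc, int maxDoc) {}

final class PartitionDetector {
  /**
   * Returns true if any partition covers only part of its segment, i.e. the segment was split.
   * segmentMaxDocs[ord] holds the maxDoc of segment ord.
   */
  static boolean hasSegmentPartitions(List<List<SlicePartition>> slices, int[] segmentMaxDocs) {
    for (List<SlicePartition> slice : slices) {
      for (SlicePartition p : slice) {
        if (p.minDoc() != 0 || p.maxDoc() != segmentMaxDocs[p.segmentOrd()]) {
          return true;
        }
      }
    }
    return false;
  }
}
```

Computing this once, when the manager is constructed, avoids repeating the check in every `newCollector` call.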
diff --git a/lucene/core/src/java/org/apache/lucene/search/BulkScorer.java b/lucene/core/src/java/org/apache/lucene/search/BulkScorer.java
@@ -27,18 +27,6 @@
  */
 public abstract class BulkScorer {
 
-  /**
-   * Scores and collects all matching documents.
-   *
-   * @param collector The collector to which all matching documents are passed.
-   * @param acceptDocs {@link Bits} that represents the allowed documents to match, or {@code null}
-   *     if they are all allowed to match.
-   */
-  public void score(LeafCollector collector, Bits acceptDocs) throws IOException {
-    final int next = score(collector, acceptDocs, 0, DocIdSetIterator.NO_MORE_DOCS);
-    assert next == DocIdSetIterator.NO_MORE_DOCS;
-  }
-
   /**
    * Collects matching documents in a range and returns an estimation of the next matching document
    * which is on or after {@code max}.

diff --git a/lucene/core/src/java/org/apache/lucene/search/CollectorManager.java b/lucene/core/src/java/org/apache/lucene/search/CollectorManager.java
@@ -18,6 +18,7 @@
 
 import java.io.IOException;
 import java.util.Collection;
+import org.apache.lucene.index.LeafReaderContext;
 
 /**
  * A manager of collectors. This class is useful to parallelize execution of search requests and has
@@ -31,6 +32,12 @@
  *       fully collected.
  * </ul>
  *
+ * <p><strong>Note:</strong> Multiple {@link LeafCollector}s may be requested for the same {@link
+ * LeafReaderContext} via {@link Collector#getLeafCollector(LeafReaderContext)} across the different
+ * {@link Collector}s returned by {@link #newCollector()}. Any computation or logic that needs to
+ * happen once per segment requires specific handling in the collector manager implementation,
+ * because the collection of an entire segment may be split across threads.
+ *
  * @see IndexSearcher#search(Query, CollectorManager)
  * @lucene.experimental
  */