Use branchless way to speedup filterCompetitiveHits #14906

HUSTERGS · 2025-07-07T00:57:06Z

Description

This PR is part of #14896, following up on the comment there. It proposes rewriting filterCompetitiveHits in a branchless way, in the hope that the loop can be at least partially auto-vectorized.
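The sketch below shows what "branchless" means here. It is a minimal standalone illustration, not the PR's code. It uses hypothetical helpers `filterBranchy` and `filterBranchless` on parallel `docs`/`scores` arrays instead of whatever structure Lucene actually passes; the real implementation is in ScorerUtil.java in the diff. The branchless version always copies the current entry to the next output slot. It then advances that slot by 0 or 1, computed here from the sign bit of `score - minRequiredScore`, so the loop body has no data-dependent jump. The sketch assumes a positive, finite threshold and non-NaN scores.

```java
import java.util.Arrays;

/** Hypothetical standalone sketch; names and signatures are illustrative, not Lucene's API. */
final class BranchlessFilterSketch {

  /** Branchy form: copy and advance only when the hit is competitive. */
  static int filterBranchy(int[] docs, double[] scores, int size, double minRequiredScore) {
    int newSize = 0;
    for (int i = 0; i < size; ++i) {
      if (scores[i] >= minRequiredScore) { // data-dependent branch
        docs[newSize] = docs[i];
        scores[newSize] = scores[i];
        newSize++;
      }
    }
    return newSize;
  }

  /**
   * Branchless form: always copy to the current write position, then advance it by 0 or 1.
   * A non-competitive entry is simply overwritten by the next one.
   * Assumes minRequiredScore is positive and finite and that scores are not NaN.
   */
  static int filterBranchless(int[] docs, double[] scores, int size, double minRequiredScore) {
    int newSize = 0;
    for (int i = 0; i < size; ++i) {
      int doc = docs[i];
      double score = scores[i];
      docs[newSize] = doc;
      scores[newSize] = score;
      // Under the assumptions above, the sign bit of (score - minRequiredScore) is 1
      // exactly when score < minRequiredScore, so competitive hits add 1, others add 0.
      long signBit = Double.doubleToRawLongBits(score - minRequiredScore) >>> 63;
      newSize += 1 - (int) signBit;
    }
    return newSize;
  }

  public static void main(String[] args) {
    int[] docs = {10, 11, 12, 13, 14};
    double[] scores = {0.9, 0.1, 0.5, 0.3, 0.7};
    int[] docsCopy = docs.clone();
    double[] scoresCopy = scores.clone();

    int n1 = filterBranchy(docs, scores, docs.length, 0.5);
    int n2 = filterBranchless(docsCopy, scoresCopy, docsCopy.length, 0.5);
    System.out.println(Arrays.toString(Arrays.copyOf(docs, n1))); // [10, 12, 14]
    System.out.println(Arrays.toString(Arrays.copyOf(docsCopy, n2))); // [10, 12, 14]
  }
}
```

Both methods keep the same hits. The branchless form does extra stores but replaces a branch whose outcome depends on the data. Branches like that are typically hardest to predict when the outcome is close to a coin flip.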

Here are the luceneutil results on wikimediumall (searchConcurrency=0, taskCountPerCat=5, taskRepeatCount=50) after 20 iterations:

                            Task    QPS baseline    StdDev    QPS my_modified_version    StdDev                Pct diff p-value
      FilteredOr2Terms2StopWords       67.28      (6.6%)       65.96      (6.8%)   -2.0% ( -14% -   12%) 0.355
                       OrHighMed       92.82      (3.8%)       91.23      (4.1%)   -1.7% (  -9% -    6%) 0.171
               CombinedOrHighMed       28.37      (6.3%)       27.89      (5.8%)   -1.7% ( -13% -   11%) 0.376
              CombinedAndHighMed       29.11      (5.4%)       28.61      (4.5%)   -1.7% ( -11% -    8%) 0.280
                 AndHighOrMedMed       18.44      (3.3%)       18.14      (2.8%)   -1.6% (  -7% -    4%) 0.089
                 DismaxOrHighMed       67.66      (3.7%)       66.60      (3.9%)   -1.6% (  -8% -    6%) 0.194
                         TermB1M      595.71      (3.6%)      586.50      (4.9%)   -1.5% (  -9% -    7%) 0.257
                          Term1M      596.06      (3.6%)      586.93      (5.0%)   -1.5% (  -9% -    7%) 0.266
                            Term      595.68      (3.6%)      586.74      (5.0%)   -1.5% (  -9% -    7%) 0.274
                         Term10K      594.97      (3.8%)      586.18      (4.9%)   -1.5% (  -9% -    7%) 0.283
                       TermB1M1P      595.60      (3.7%)      586.84      (4.9%)   -1.5% (  -9% -    7%) 0.286
                         Term100      595.04      (3.6%)      586.49      (5.0%)   -1.4% (  -9% -    7%) 0.294
                       CountTerm     7970.70      (5.6%)     7856.82      (5.6%)   -1.4% ( -12% -   10%) 0.422
                    FilteredTerm       87.37      (4.3%)       86.13      (4.6%)   -1.4% (  -9% -    7%) 0.314
               FilteredOrHighMed       54.11      (5.0%)       53.35      (5.3%)   -1.4% ( -11% -    9%) 0.388
                      OrHighRare      121.49      (3.2%)      119.92      (3.4%)   -1.3% (  -7% -    5%) 0.220
                 CountOrHighHigh       64.26      (2.6%)       63.43      (1.9%)   -1.3% (  -5% -    3%) 0.075
                DismaxOrHighHigh       47.55      (2.5%)       46.94      (2.9%)   -1.3% (  -6% -    4%) 0.135
            FilteredAndStopWords       12.03      (2.8%)       11.89      (3.7%)   -1.2% (  -7% -    5%) 0.270
                FilteredOr3Terms       59.60      (4.5%)       58.92      (4.7%)   -1.2% (  -9% -    8%) 0.426
                   TermTitleSort       73.24      (4.2%)       72.44      (4.2%)   -1.1% (  -9% -    7%) 0.410
                     CountOrMany        6.48      (2.3%)        6.41      (1.9%)   -1.1% (  -5% -    3%) 0.108
             CountFilteredOrMany        6.08      (1.9%)        6.02      (1.5%)   -1.0% (  -4% -    2%) 0.061
                  CountOrHighMed       97.07      (2.8%)       96.11      (2.6%)   -1.0% (  -6% -    4%) 0.250
                      TermDTSort      193.19      (2.7%)      191.42      (2.4%)   -0.9% (  -5% -    4%) 0.252
                   TermMonthSort     2982.24      (3.7%)     2958.62      (3.4%)   -0.8% (  -7% -    6%) 0.482
             FilteredAndHighHigh       15.23      (2.8%)       15.12      (3.6%)   -0.8% (  -7% -    5%) 0.446
                          Fuzzy1       50.54      (3.2%)       50.15      (3.7%)   -0.8% (  -7% -    6%) 0.473
              FilteredOrHighHigh       18.16      (3.2%)       18.02      (3.2%)   -0.8% (  -6% -    5%) 0.447
                 CountAndHighMed       94.00      (2.8%)       93.32      (2.8%)   -0.7% (  -6% -    5%) 0.416
                      DismaxTerm      657.56      (3.5%)      652.99      (3.8%)   -0.7% (  -7% -    6%) 0.547
                CountAndHighHigh       62.08      (1.8%)       61.67      (1.1%)   -0.7% (  -3% -    2%) 0.161
                          Fuzzy2       45.36      (1.8%)       45.11      (2.4%)   -0.6% (  -4% -    3%) 0.399
                  FilteredPhrase       12.66      (2.3%)       12.59      (2.3%)   -0.6% (  -5% -    4%) 0.433
                     AndHighHigh       28.01      (3.5%)       27.85      (4.2%)   -0.5% (  -8% -    7%) 0.658
                    CombinedTerm       14.62      (3.8%)       14.55      (4.9%)   -0.5% (  -8% -    8%) 0.705
                          Phrase        9.91      (2.1%)        9.86      (2.2%)   -0.5% (  -4% -    3%) 0.453
              Or2Terms2StopWords       79.00      (7.2%)       78.61      (7.4%)   -0.5% ( -14% -   15%) 0.829
                IntervalsOrdered        2.96      (2.4%)        2.94      (2.6%)   -0.5% (  -5% -    4%) 0.564
               FilteredAnd3Terms      132.86      (3.0%)      132.29      (3.4%)   -0.4% (  -6% -    6%) 0.670
             FilteredOrStopWords       11.15      (2.2%)       11.11      (2.4%)   -0.4% (  -4% -    4%) 0.591
             And2Terms2StopWords       76.63      (8.2%)       76.34      (8.3%)   -0.4% ( -15% -   17%) 0.882
         CountFilteredOrHighHigh       25.24      (1.3%)       25.16      (1.3%)   -0.3% (  -2% -    2%) 0.451
          CountFilteredOrHighMed       29.64      (1.4%)       29.56      (1.3%)   -0.2% (  -2% -    2%) 0.570
                          IntNRQ       48.50      (2.2%)       48.38      (2.3%)   -0.2% (  -4% -    4%) 0.741
                  FilteredIntNRQ       48.16      (2.2%)       48.10      (2.3%)   -0.1% (  -4% -    4%) 0.868
             CountFilteredIntNRQ       22.17      (1.3%)       22.14      (1.8%)   -0.1% (  -3% -    3%) 0.829
                  FilteredOrMany        5.11      (3.0%)        5.11      (2.7%)    0.0% (  -5% -    5%) 0.993
     FilteredAnd2Terms2StopWords       77.42      (6.3%)       77.46      (6.5%)    0.0% ( -12% -   13%) 0.984
                    SloppyPhrase        1.47      (4.1%)        1.47      (4.6%)    0.1% (  -8% -    9%) 0.928
                AndMedOrHighHigh       21.15      (1.9%)       21.19      (1.8%)    0.2% (  -3% -    3%) 0.719
                          IntSet      401.08      (4.4%)      402.07      (3.9%)    0.2% (  -7% -    8%) 0.851
               TermDayOfYearSort      356.48      (1.1%)      357.37      (1.2%)    0.2% (  -2% -    2%) 0.490
                        SpanNear        3.07      (4.9%)        3.08      (4.9%)    0.3% (  -9% -   10%) 0.858
                      AndHighMed       70.36      (3.1%)       70.60      (3.0%)    0.3% (  -5% -    6%) 0.722
                         Respell       43.66      (2.9%)       43.81      (2.3%)    0.3% (  -4% -    5%) 0.674
              FilteredAndHighMed       45.14      (2.9%)       45.33      (3.3%)    0.4% (  -5% -    6%) 0.671
             CountFilteredPhrase       11.53      (2.0%)       11.59      (2.2%)    0.5% (  -3% -    4%) 0.435
                 FilteredPrefix3       93.08      (3.5%)       93.69      (2.6%)    0.7% (  -5% -    6%) 0.494
              CombinedOrHighHigh        7.09      (5.4%)        7.14      (5.8%)    0.7% (  -9% -   12%) 0.700
                      OrHighHigh       26.79      (2.4%)       26.97      (4.2%)    0.7% (  -5% -    7%) 0.530
                         Prefix3       99.61      (3.8%)      100.33      (2.5%)    0.7% (  -5% -    7%) 0.483
             CombinedAndHighHigh        7.30      (1.8%)        7.36      (1.9%)    0.8% (  -2% -    4%) 0.161
                        Wildcard       58.26      (3.3%)       58.86      (2.9%)    1.0% (  -4% -    7%) 0.287
                     OrStopWords       10.76      (5.4%)       10.88      (7.2%)    1.1% ( -10% -   14%) 0.574
                     CountPhrase        3.30      (3.6%)        3.34      (2.1%)    1.4% (  -4% -    7%) 0.131
                          OrMany        5.87      (3.0%)        6.03      (4.3%)    2.7% (  -4% -   10%) 0.019
                        Or3Terms       83.12      (2.8%)       85.65      (4.4%)    3.0% (  -4% -   10%) 0.009
                       And3Terms       91.23      (3.3%)       94.32      (4.2%)    3.4% (  -3% -   11%) 0.004
                    AndStopWords       10.01      (3.7%)       10.48      (6.1%)    4.7% (  -4% -   15%) 0.003

github-actions · 2025-07-07T00:58:01Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java

ChrisHegarty

LGTM

jpountz

I suggested an alternative impl as a follow-up to Robert's comment, that uses conditional moves and looks faster: https://github.com/apache/lucene/pull/14906/files#r2189422844
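For illustration, one common way to invite a conditional move from the JIT is to compute both candidate values for the write index and pick one with a ternary. This is a hedged sketch with hypothetical names (`CmovFilterSketch`, `filterSelect`, parallel `docs`/`scores` arrays). The exact code of the suggestion is in the linked review comment. Whether HotSpot's C2 actually emits cmov (x86) or csel (ARM) here depends on the JIT and the branch profile it collects.

```java
import java.util.Arrays;

/** Hypothetical standalone sketch of a cmov-friendly filter; not Lucene's actual code. */
final class CmovFilterSketch {

  /**
   * Keeps entries with score >= minRequiredScore, compacting them to the front of the
   * arrays, and returns how many were kept.
   */
  static int filterSelect(int[] docs, double[] scores, int size, double minRequiredScore) {
    int newSize = 0;
    for (int i = 0; i < size; ++i) {
      int doc = docs[i];
      double score = scores[i];
      // Unconditional stores: a non-competitive entry is overwritten by the next one.
      docs[newSize] = doc;
      scores[newSize] = score;
      // Select between two already-known values. The JIT may lower this to a
      // conditional move instead of a conditional jump.
      newSize = score >= minRequiredScore ? newSize + 1 : newSize;
    }
    return newSize;
  }

  public static void main(String[] args) {
    int[] docs = {10, 11, 12, 13, 14};
    double[] scores = {0.9, 0.1, 0.5, 0.3, 0.7};
    int n = filterSelect(docs, scores, docs.length, 0.5);
    System.out.println(n + " " + Arrays.toString(Arrays.copyOf(docs, n))); // 3 [10, 12, 14]
  }
}
```

This version uses an ordinary comparison, so it keeps Java's usual `>=` semantics. For example, a NaN score is never kept.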

HUSTERGS · 2025-07-07T10:13:22Z

> I suggested an alternative impl as a follow-up to Robert's comment, that uses conditional moves and looks faster: https://github.com/apache/lucene/pull/14906/files#r2189422844

Thank you! That is amazing! I pushed another commit following your suggestion.

jpountz · 2025-07-07T11:17:57Z

The new impl is slightly slower on my ARM Mac, but it's still a similar order of magnitude and much faster than the baseline, and the code looks more idiomatic, so I think we're good to merge.

Benchmark                                     (minScoreInclusive)  (size)   Mode  Cnt      Score      Error   Units
CompetitiveBenchmark.baseline                                   0     128  thrpt    5  40218,516 ± 5935,656  ops/ms
CompetitiveBenchmark.baseline                                 0.2     128  thrpt    5   4247,052 ±  216,541  ops/ms
CompetitiveBenchmark.baseline                                 0.4     128  thrpt    5   2401,350 ±   32,013  ops/ms
CompetitiveBenchmark.baseline                                 0.5     128  thrpt    5   2117,681 ±   14,486  ops/ms
CompetitiveBenchmark.baseline                                 0.8     128  thrpt    5   4630,407 ±  235,055  ops/ms
CompetitiveBenchmark.branchlessCandidate                        0     128  thrpt    5  39799,572 ± 8695,960  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.2     128  thrpt    5  11255,718 ±  579,988  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.4     128  thrpt    5  11181,453 ±  418,377  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.5     128  thrpt    5  11238,359 ±  367,859  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.8     128  thrpt    5  11120,788 ±  263,541  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                    0     128  thrpt    5  43470,171 ± 4588,605  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.2     128  thrpt    5  10343,082 ±  324,145  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.4     128  thrpt    5  10286,804 ±   77,779  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.5     128  thrpt    5  10256,737 ±  123,636  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.8     128  thrpt    5  10306,218 ±  169,973  ops/ms

jpountz · 2025-07-08T15:04:35Z

Nightly benchmarks caught up with this change. Good speedups on many queries. https://benchmarks.mikemccandless.com/OrMany.html

jpountz · 2025-07-08T18:16:38Z

I pushed an annotation.

msokolov · 2025-07-08T18:29:21Z

it's crazy that this helps. I have to think about things in a new way. I mean my old way of thinking is that the fewer things you ask the computer to do, the less work it has to do :)

dweiss · 2025-07-08T18:53:11Z

> it's crazy that this helps. I have to think about things in a new way. I mean my old way of thinking is that the fewer things you ask the computer to do, the less work it has to do :)

I'm the same way... but I think this stopped being true somewhere around the mid-90s. :)

init

c8058ec

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Jul 7, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Jul 7, 2025

github-actions bot added the module:core/search label Jul 7, 2025

add change

eb66417

github-actions bot added this to the 10.3.0 milestone Jul 7, 2025

HUSTERGS mentioned this pull request Jul 7, 2025

Vectorize filterCompetitiveHits #14896

Open

rmuir reviewed Jul 7, 2025

lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java (outdated)

gesong.samuel added 2 commits July 7, 2025 10:19

resolve comment

4c352a4

Revert "resolve comment"

eb09f1e

This reverts commit 4c352a4.

ChrisHegarty approved these changes Jul 7, 2025

jpountz reviewed Jul 7, 2025

opt

e1e724c

rmuir approved these changes Jul 7, 2025

jpountz approved these changes Jul 7, 2025

jpountz merged commit 5e771d8 into apache:main Jul 7, 2025
8 checks passed

github-project-automation bot moved this from Open to Merged in OpenSearch Lucene & Core Performance Tracking Jul 7, 2025

jpountz pushed a commit that referenced this pull request Jul 7, 2025

Use branchless way to speedup filterCompetitiveHits (#14906)

db11ae7

Co-authored-by: gesong.samuel <gesong.samuel@bytedance.com>

gf2121 mentioned this pull request Jul 12, 2025

Optimize bitset to array #14935

Merged
