Use branchless way to speedup filterCompetitiveHits #14906

Merged on Jul 7, 2025 (5 commits)

Conversation

@HUSTERGS (Contributor) commented Jul 7, 2025

Description

This PR is part of #14896, following a comment there. It proposes rewriting filterCompetitiveHits in a branchless way, in the hope that the loop can be at least partially auto-vectorized.
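To make the idea concrete, here is a minimal sketch of what "branchless" can mean for a filtering loop like this one. It is not the code from this PR: the class name, the method names, and the simplified signature (plain `int[]` docs, `float[]` scores, and a `minScoreInclusive` threshold, echoing the benchmark parameters further down the thread) are assumptions for illustration only. The branchy version copies a hit only when it is competitive; the branchless version always copies and then advances the write cursor by 0 or 1.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; names and signatures are illustrative, not Lucene's API.
class CompetitiveFilterSketch {

    // Baseline shape: the copy happens inside a data-dependent branch.
    static int filterBranchy(int[] docs, float[] scores, int size, float minScoreInclusive) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            if (scores[i] >= minScoreInclusive) {
                docs[newSize] = docs[i];
                scores[newSize] = scores[i];
                newSize++;
            }
        }
        return newSize;
    }

    // Branchless shape: always copy, then advance the write cursor by 0 or 1.
    // newSize never exceeds i, so the unconditional store cannot overwrite
    // an entry that has not been read yet.
    static int filterBranchless(int[] docs, float[] scores, int size, float minScoreInclusive) {
        int newSize = 0;
        for (int i = 0; i < size; i++) {
            int doc = docs[i];
            float score = scores[i];
            docs[newSize] = doc;
            scores[newSize] = score;
            // A JIT may compile this without a jump; that is not guaranteed.
            newSize += (score >= minScoreInclusive) ? 1 : 0;
        }
        return newSize;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int size = 128;
        int[] docs = new int[size];
        float[] scores = new float[size];
        for (int i = 0; i < size; i++) {
            docs[i] = i;
            scores[i] = random.nextFloat();
        }
        int[] docs2 = docs.clone();
        float[] scores2 = scores.clone();

        int a = filterBranchy(docs, scores, size, 0.5f);
        int b = filterBranchless(docs2, scores2, size, 0.5f);
        boolean same = a == b
            && Arrays.equals(docs, 0, a, docs2, 0, b)
            && Arrays.equals(scores, 0, a, scores2, 0, b);
        System.out.println("kept " + a + " of " + size + ", outputs match: " + same);
    }
}
```

The branchless loop does more stores than the branchy one, but its control flow no longer depends on the data, which is what gives the CPU (and possibly the JIT's vectorizer) more room to work.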

Here is the luceneutil result on wikimediumall with searchConcurrency=0, taskCountPerCat=5, taskRepeatCount=50 after 20 iterations:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
      FilteredOr2Terms2StopWords       67.28      (6.6%)       65.96      (6.8%)   -2.0% ( -14% -   12%) 0.355
                       OrHighMed       92.82      (3.8%)       91.23      (4.1%)   -1.7% (  -9% -    6%) 0.171
               CombinedOrHighMed       28.37      (6.3%)       27.89      (5.8%)   -1.7% ( -13% -   11%) 0.376
              CombinedAndHighMed       29.11      (5.4%)       28.61      (4.5%)   -1.7% ( -11% -    8%) 0.280
                 AndHighOrMedMed       18.44      (3.3%)       18.14      (2.8%)   -1.6% (  -7% -    4%) 0.089
                 DismaxOrHighMed       67.66      (3.7%)       66.60      (3.9%)   -1.6% (  -8% -    6%) 0.194
                         TermB1M      595.71      (3.6%)      586.50      (4.9%)   -1.5% (  -9% -    7%) 0.257
                          Term1M      596.06      (3.6%)      586.93      (5.0%)   -1.5% (  -9% -    7%) 0.266
                            Term      595.68      (3.6%)      586.74      (5.0%)   -1.5% (  -9% -    7%) 0.274
                         Term10K      594.97      (3.8%)      586.18      (4.9%)   -1.5% (  -9% -    7%) 0.283
                       TermB1M1P      595.60      (3.7%)      586.84      (4.9%)   -1.5% (  -9% -    7%) 0.286
                         Term100      595.04      (3.6%)      586.49      (5.0%)   -1.4% (  -9% -    7%) 0.294
                       CountTerm     7970.70      (5.6%)     7856.82      (5.6%)   -1.4% ( -12% -   10%) 0.422
                    FilteredTerm       87.37      (4.3%)       86.13      (4.6%)   -1.4% (  -9% -    7%) 0.314
               FilteredOrHighMed       54.11      (5.0%)       53.35      (5.3%)   -1.4% ( -11% -    9%) 0.388
                      OrHighRare      121.49      (3.2%)      119.92      (3.4%)   -1.3% (  -7% -    5%) 0.220
                 CountOrHighHigh       64.26      (2.6%)       63.43      (1.9%)   -1.3% (  -5% -    3%) 0.075
                DismaxOrHighHigh       47.55      (2.5%)       46.94      (2.9%)   -1.3% (  -6% -    4%) 0.135
            FilteredAndStopWords       12.03      (2.8%)       11.89      (3.7%)   -1.2% (  -7% -    5%) 0.270
                FilteredOr3Terms       59.60      (4.5%)       58.92      (4.7%)   -1.2% (  -9% -    8%) 0.426
                   TermTitleSort       73.24      (4.2%)       72.44      (4.2%)   -1.1% (  -9% -    7%) 0.410
                     CountOrMany        6.48      (2.3%)        6.41      (1.9%)   -1.1% (  -5% -    3%) 0.108
             CountFilteredOrMany        6.08      (1.9%)        6.02      (1.5%)   -1.0% (  -4% -    2%) 0.061
                  CountOrHighMed       97.07      (2.8%)       96.11      (2.6%)   -1.0% (  -6% -    4%) 0.250
                      TermDTSort      193.19      (2.7%)      191.42      (2.4%)   -0.9% (  -5% -    4%) 0.252
                   TermMonthSort     2982.24      (3.7%)     2958.62      (3.4%)   -0.8% (  -7% -    6%) 0.482
             FilteredAndHighHigh       15.23      (2.8%)       15.12      (3.6%)   -0.8% (  -7% -    5%) 0.446
                          Fuzzy1       50.54      (3.2%)       50.15      (3.7%)   -0.8% (  -7% -    6%) 0.473
              FilteredOrHighHigh       18.16      (3.2%)       18.02      (3.2%)   -0.8% (  -6% -    5%) 0.447
                 CountAndHighMed       94.00      (2.8%)       93.32      (2.8%)   -0.7% (  -6% -    5%) 0.416
                      DismaxTerm      657.56      (3.5%)      652.99      (3.8%)   -0.7% (  -7% -    6%) 0.547
                CountAndHighHigh       62.08      (1.8%)       61.67      (1.1%)   -0.7% (  -3% -    2%) 0.161
                          Fuzzy2       45.36      (1.8%)       45.11      (2.4%)   -0.6% (  -4% -    3%) 0.399
                  FilteredPhrase       12.66      (2.3%)       12.59      (2.3%)   -0.6% (  -5% -    4%) 0.433
                     AndHighHigh       28.01      (3.5%)       27.85      (4.2%)   -0.5% (  -8% -    7%) 0.658
                    CombinedTerm       14.62      (3.8%)       14.55      (4.9%)   -0.5% (  -8% -    8%) 0.705
                          Phrase        9.91      (2.1%)        9.86      (2.2%)   -0.5% (  -4% -    3%) 0.453
              Or2Terms2StopWords       79.00      (7.2%)       78.61      (7.4%)   -0.5% ( -14% -   15%) 0.829
                IntervalsOrdered        2.96      (2.4%)        2.94      (2.6%)   -0.5% (  -5% -    4%) 0.564
               FilteredAnd3Terms      132.86      (3.0%)      132.29      (3.4%)   -0.4% (  -6% -    6%) 0.670
             FilteredOrStopWords       11.15      (2.2%)       11.11      (2.4%)   -0.4% (  -4% -    4%) 0.591
             And2Terms2StopWords       76.63      (8.2%)       76.34      (8.3%)   -0.4% ( -15% -   17%) 0.882
         CountFilteredOrHighHigh       25.24      (1.3%)       25.16      (1.3%)   -0.3% (  -2% -    2%) 0.451
          CountFilteredOrHighMed       29.64      (1.4%)       29.56      (1.3%)   -0.2% (  -2% -    2%) 0.570
                          IntNRQ       48.50      (2.2%)       48.38      (2.3%)   -0.2% (  -4% -    4%) 0.741
                  FilteredIntNRQ       48.16      (2.2%)       48.10      (2.3%)   -0.1% (  -4% -    4%) 0.868
             CountFilteredIntNRQ       22.17      (1.3%)       22.14      (1.8%)   -0.1% (  -3% -    3%) 0.829
                  FilteredOrMany        5.11      (3.0%)        5.11      (2.7%)    0.0% (  -5% -    5%) 0.993
     FilteredAnd2Terms2StopWords       77.42      (6.3%)       77.46      (6.5%)    0.0% ( -12% -   13%) 0.984
                    SloppyPhrase        1.47      (4.1%)        1.47      (4.6%)    0.1% (  -8% -    9%) 0.928
                AndMedOrHighHigh       21.15      (1.9%)       21.19      (1.8%)    0.2% (  -3% -    3%) 0.719
                          IntSet      401.08      (4.4%)      402.07      (3.9%)    0.2% (  -7% -    8%) 0.851
               TermDayOfYearSort      356.48      (1.1%)      357.37      (1.2%)    0.2% (  -2% -    2%) 0.490
                        SpanNear        3.07      (4.9%)        3.08      (4.9%)    0.3% (  -9% -   10%) 0.858
                      AndHighMed       70.36      (3.1%)       70.60      (3.0%)    0.3% (  -5% -    6%) 0.722
                         Respell       43.66      (2.9%)       43.81      (2.3%)    0.3% (  -4% -    5%) 0.674
              FilteredAndHighMed       45.14      (2.9%)       45.33      (3.3%)    0.4% (  -5% -    6%) 0.671
             CountFilteredPhrase       11.53      (2.0%)       11.59      (2.2%)    0.5% (  -3% -    4%) 0.435
                 FilteredPrefix3       93.08      (3.5%)       93.69      (2.6%)    0.7% (  -5% -    6%) 0.494
              CombinedOrHighHigh        7.09      (5.4%)        7.14      (5.8%)    0.7% (  -9% -   12%) 0.700
                      OrHighHigh       26.79      (2.4%)       26.97      (4.2%)    0.7% (  -5% -    7%) 0.530
                         Prefix3       99.61      (3.8%)      100.33      (2.5%)    0.7% (  -5% -    7%) 0.483
             CombinedAndHighHigh        7.30      (1.8%)        7.36      (1.9%)    0.8% (  -2% -    4%) 0.161
                        Wildcard       58.26      (3.3%)       58.86      (2.9%)    1.0% (  -4% -    7%) 0.287
                     OrStopWords       10.76      (5.4%)       10.88      (7.2%)    1.1% ( -10% -   14%) 0.574
                     CountPhrase        3.30      (3.6%)        3.34      (2.1%)    1.4% (  -4% -    7%) 0.131
                          OrMany        5.87      (3.0%)        6.03      (4.3%)    2.7% (  -4% -   10%) 0.019
                        Or3Terms       83.12      (2.8%)       85.65      (4.4%)    3.0% (  -4% -   10%) 0.009
                       And3Terms       91.23      (3.3%)       94.32      (4.2%)    3.4% (  -3% -   11%) 0.004
                    AndStopWords       10.01      (3.7%)       10.48      (6.1%)    4.7% (  -4% -   15%) 0.003

github-actions bot commented Jul 7, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions github-actions bot added this to the 10.3.0 milestone Jul 7, 2025
gesong.samuel added 2 commits July 7, 2025 10:19

@ChrisHegarty (Contributor) left a comment

LGTM

@jpountz (Contributor) left a comment

I suggested an alternative impl as a follow-up to Robert's comment, that uses conditional moves and looks faster: https://github.com/apache/lucene/pull/14906/files#r2189422844

@HUSTERGS (Contributor, Author) commented Jul 7, 2025

> I suggested an alternative impl as a follow-up to Robert's comment, that uses conditional moves and looks faster: https://github.com/apache/lucene/pull/14906/files#r2189422844

Thank you!! That is amazing! I pushed another commit according to your suggestion.

@jpountz (Contributor) commented Jul 7, 2025

The new impl is slightly slower on my ARM Mac, but it's still a similar order of magnitude and much faster than the baseline, and the code looks more idiomatic, so I think we're good to merge.

Benchmark                                     (minScoreInclusive)  (size)   Mode  Cnt      Score      Error   Units
CompetitiveBenchmark.baseline                                   0     128  thrpt    5  40218,516 ± 5935,656  ops/ms
CompetitiveBenchmark.baseline                                 0.2     128  thrpt    5   4247,052 ±  216,541  ops/ms
CompetitiveBenchmark.baseline                                 0.4     128  thrpt    5   2401,350 ±   32,013  ops/ms
CompetitiveBenchmark.baseline                                 0.5     128  thrpt    5   2117,681 ±   14,486  ops/ms
CompetitiveBenchmark.baseline                                 0.8     128  thrpt    5   4630,407 ±  235,055  ops/ms
CompetitiveBenchmark.branchlessCandidate                        0     128  thrpt    5  39799,572 ± 8695,960  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.2     128  thrpt    5  11255,718 ±  579,988  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.4     128  thrpt    5  11181,453 ±  418,377  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.5     128  thrpt    5  11238,359 ±  367,859  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.8     128  thrpt    5  11120,788 ±  263,541  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                    0     128  thrpt    5  43470,171 ± 4588,605  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.2     128  thrpt    5  10343,082 ±  324,145  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.4     128  thrpt    5  10286,804 ±   77,779  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.5     128  thrpt    5  10256,737 ±  123,636  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.8     128  thrpt    5  10306,218 ±  169,973  ops/ms
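To put numbers on "slightly slower" and "much faster than the baseline", the sketch below computes throughput ratios from the table above for the minScoreInclusive = 0.5 rows, where the baseline is slowest. The scores are copied from the table with the decimal commas written as dots; the class and method names are hypothetical, and reading "the new impl" as the branchlessCandidateCmov rows is an assumption.

```java
// Hypothetical helper; values are the Score column (ops/ms) at
// minScoreInclusive = 0.5, size = 128, taken from the JMH table above.
class SpeedupFromJmh {
    static final double BASELINE = 2117.681;
    static final double BRANCHLESS = 11238.359;
    static final double CMOV = 10256.737;

    // Throughput ratio: values above 1 mean the candidate is faster.
    static double ratio(double candidate, double reference) {
        return candidate / reference;
    }

    public static void main(String[] args) {
        System.out.printf("branchlessCandidate vs baseline: %.2fx%n", ratio(BRANCHLESS, BASELINE));
        System.out.printf("branchlessCandidateCmov vs baseline: %.2fx%n", ratio(CMOV, BASELINE));
        System.out.printf("branchlessCandidateCmov vs branchlessCandidate: %.2fx%n", ratio(CMOV, BRANCHLESS));
    }
}
```

On these rows both branchless variants run at roughly five times the baseline throughput, and the cmov variant comes in a little under a tenth below the first branchless candidate. These ratios ignore the reported error bars, so they are only a rough guide.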

@jpountz jpountz merged commit 5e771d8 into apache:main Jul 7, 2025
8 checks passed
jpountz pushed a commit that referenced this pull request Jul 7, 2025
Co-authored-by: gesong.samuel <gesong.samuel@bytedance.com>

@jpountz (Contributor) commented Jul 8, 2025

Nightly benchmarks caught up with this change. Good speedups on many queries. https://benchmarks.mikemccandless.com/OrMany.html

@jpountz (Contributor) commented Jul 8, 2025

I pushed an annotation.

@msokolov (Contributor) commented Jul 8, 2025

it's crazy that this helps. I have to think about things in a new way. I mean my old way of thinking is that the fewer things you ask the computer to do, the less work it has to do :)

@dweiss (Contributor) commented Jul 8, 2025

> it's crazy that this helps. I have to think about things in a new way. I mean my old way of thinking is that the fewer things you ask the computer to do, the less work it has to do :)

I'm the same way... but I think this stopped being true somewhere around the mid-90s. :)
