Vectorize filterCompetitiveHits #14896

Open
wants to merge 22 commits into
base: main

HUSTERGS
Contributor

@HUSTERGS HUSTERGS commented Jul 3, 2025

Description

This PR is a follow-up to the comment on #14827, trying to vectorize the filterCompetitiveHits function by using (Int|Float)Vector#compress.
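For readers unfamiliar with the compress operation, here is a minimal scalar sketch of the mask-then-compress idea. It is not the code in this PR: the class name, method signature, `LANES` constant, and the `int[]`/`float[]` element types are all assumptions made for illustration, and plain loops stand in for the vector compare and compress steps.

```java
import java.util.Arrays;

/** Scalar model of mask-and-compress filtering; all names here are hypothetical. */
final class CompressFilterSketch {

  /** Stand-in for the vector lane count; a real vector species would supply this. */
  static final int LANES = 8;

  /**
   * Keeps only the (doc, score) pairs whose score is >= minScoreInclusive,
   * packing the survivors to the front of both arrays. Returns the new size.
   */
  static int filter(int[] docs, float[] scores, int size, float minScoreInclusive) {
    int newSize = 0;
    int i = 0;
    // Main loop: handle LANES entries at a time, as a vector loop would.
    for (; i + LANES <= size; i += LANES) {
      // Step 1: compare every lane against the threshold to build a mask.
      boolean[] mask = new boolean[LANES];
      for (int lane = 0; lane < LANES; lane++) {
        mask[lane] = scores[i + lane] >= minScoreInclusive;
      }
      // Step 2: "compress", i.e. pack the selected lanes contiguously at newSize.
      int kept = 0;
      for (int lane = 0; lane < LANES; lane++) {
        if (mask[lane]) {
          docs[newSize + kept] = docs[i + lane];
          scores[newSize + kept] = scores[i + lane];
          kept++;
        }
      }
      // Step 3: advance the write position by the number of set mask bits.
      newSize += kept;
    }
    // Scalar tail for the entries that do not fill a whole chunk.
    for (; i < size; i++) {
      if (scores[i] >= minScoreInclusive) {
        docs[newSize] = docs[i];
        scores[newSize] = scores[i];
        newSize++;
      }
    }
    return newSize;
  }

  public static void main(String[] args) {
    int[] docs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    float[] scores = {0.1f, 0.9f, 0.3f, 0.7f, 0.5f, 0.2f, 0.8f, 0.4f, 0.6f, 0.05f};
    int newSize = filter(docs, scores, docs.length, 0.5f);
    System.out.println(Arrays.toString(Arrays.copyOf(docs, newSize)));
  }
}
```

In a vectorized version, steps 1 to 3 would each map to roughly one vector operation per chunk (compare, compress, count of set mask bits), which is where a speedup over the per-element branch would come from.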

I'm still working on it: tests are not added yet and the code is not stable. Comments and suggestions are welcome!

But I did do a quick run of luceneutil based on 62e0276032189deee9559327cc53ac3f59f354a9 with wikimediumall with searchConcurrency=0, taskCountPerCat=1, taskRepeatCount=20. Here is the result after 20 iterations, which seems promising (I hope I didn't get anything wrong). I will do another run with a different setup.

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                   TermMonthSort     1715.92      (5.2%)     1677.87      (6.7%)   -2.2% ( -13% -   10%) 0.245
                      DismaxTerm      687.90      (3.6%)      672.66      (4.8%)   -2.2% ( -10% -    6%) 0.099
                 FilteredPrefix3      145.56      (3.6%)      142.51      (5.1%)   -2.1% ( -10% -    6%) 0.134
                       OrHighMed      124.23      (5.3%)      122.03     (12.4%)   -1.8% ( -18% -   16%) 0.556
                        Wildcard       89.60      (2.7%)       88.11      (3.4%)   -1.7% (  -7% -    4%) 0.087
                          Fuzzy2       25.52      (3.2%)       25.24      (4.1%)   -1.1% (  -8% -    6%) 0.342
                         Respell       38.38      (2.6%)       37.99      (2.1%)   -1.0% (  -5% -    3%) 0.172
               TermDayOfYearSort      252.39      (3.3%)      250.17      (4.3%)   -0.9% (  -8% -    6%) 0.463
                          Phrase        4.58      (2.5%)        4.54      (3.3%)   -0.8% (  -6% -    5%) 0.378
                     CountPhrase        3.21      (2.1%)        3.19      (2.7%)   -0.7% (  -5% -    4%) 0.352
                 DismaxOrHighMed       65.05      (3.6%)       64.59      (7.7%)   -0.7% ( -11% -   11%) 0.709
             FilteredOrStopWords        8.51      (2.8%)        8.46      (2.6%)   -0.6% (  -5% -    4%) 0.507
                   TermTitleSort       63.32      (5.6%)       63.00      (5.0%)   -0.5% ( -10% -   10%) 0.761
               FilteredOrHighMed       21.50      (3.7%)       21.41      (3.7%)   -0.4% (  -7% -    7%) 0.722
                  FilteredIntNRQ      292.41      (7.3%)      291.22      (7.9%)   -0.4% ( -14% -   15%) 0.866
                        SpanNear        3.32      (3.5%)        3.31      (2.8%)   -0.4% (  -6% -    6%) 0.703
                 CountOrHighHigh       68.53      (2.5%)       68.32      (2.3%)   -0.3% (  -4% -    4%) 0.687
                    SloppyPhrase        0.61      (6.3%)        0.61      (5.0%)   -0.3% ( -10% -   11%) 0.871
                            Term      575.10      (4.6%)      573.49      (8.2%)   -0.3% ( -12% -   13%) 0.895
                         TermB1M      571.23      (5.4%)      569.72      (7.8%)   -0.3% ( -12% -   13%) 0.901
              FilteredOrHighHigh       17.20      (3.2%)       17.16      (3.5%)   -0.2% (  -6% -    6%) 0.834
                          IntSet      339.77      (5.5%)      339.44      (6.1%)   -0.1% ( -11% -   12%) 0.958
                FilteredOr3Terms       43.39      (4.4%)       43.36      (4.2%)   -0.1% (  -8% -    8%) 0.961
                  CountOrHighMed       98.19      (4.6%)       98.13      (4.2%)   -0.1% (  -8% -    9%) 0.968
                         Prefix3       80.76      (5.2%)       80.73      (6.3%)   -0.0% ( -10% -   12%) 0.985
                    CombinedTerm       17.50      (3.6%)       17.50      (4.8%)    0.0% (  -8% -    8%) 0.984
               CombinedOrHighMed       48.08      (7.5%)       48.09      (8.3%)    0.0% ( -14% -   17%) 0.990
                          IntNRQ       28.63      (2.2%)       28.65      (2.1%)    0.1% (  -4% -    4%) 0.930
                    FilteredTerm       69.87      (5.9%)       69.91      (5.3%)    0.1% ( -10% -   11%) 0.972
                  FilteredOrMany        7.62      (2.4%)        7.62      (2.3%)    0.1% (  -4% -    4%) 0.927
             CountFilteredIntNRQ       26.19      (2.5%)       26.21      (2.7%)    0.1% (  -4% -    5%) 0.921
                          Term1M      641.82      (5.9%)      642.71      (8.5%)    0.1% ( -13% -   15%) 0.953
                      TermDTSort      189.34      (3.4%)      189.71      (2.7%)    0.2% (  -5% -    6%) 0.838
                         Term10K      573.66      (5.0%)      574.80      (7.6%)    0.2% ( -11% -   13%) 0.922
      FilteredOr2Terms2StopWords       70.27      (5.8%)       70.41      (5.5%)    0.2% ( -10% -   12%) 0.909
                       And3Terms       98.82      (3.2%)       99.02      (7.1%)    0.2% (  -9% -   10%) 0.906
                       TermB1M1P      569.87      (4.9%)      571.05      (7.8%)    0.2% ( -11% -   13%) 0.920
             And2Terms2StopWords      162.85     (13.1%)      163.32     (12.2%)    0.3% ( -22% -   29%) 0.942
                  FilteredPhrase        6.13      (2.8%)        6.15      (3.4%)    0.3% (  -5% -    6%) 0.769
                      OrHighRare       53.22      (9.8%)       53.38      (9.1%)    0.3% ( -17% -   21%) 0.921
                          Fuzzy1       34.42      (2.7%)       34.53      (5.8%)    0.3% (  -7% -    9%) 0.828
                         Term100      634.74      (5.1%)      636.93      (7.7%)    0.3% ( -11% -   13%) 0.867
          CountFilteredOrHighMed       30.06      (1.4%)       30.17      (1.4%)    0.4% (  -2% -    3%) 0.401
                CountAndHighHigh       57.86      (1.6%)       58.09      (1.6%)    0.4% (  -2% -    3%) 0.445
               FilteredAnd3Terms       91.01      (2.5%)       91.37      (2.6%)    0.4% (  -4% -    5%) 0.625
         CountFilteredOrHighHigh       25.71      (1.6%)       25.83      (1.4%)    0.5% (  -2% -    3%) 0.307
             CountFilteredPhrase       10.11      (3.5%)       10.17      (3.2%)    0.6% (  -5% -    7%) 0.602
                IntervalsOrdered        3.73      (3.2%)        3.75      (2.5%)    0.6% (  -4% -    6%) 0.526
                 CountAndHighMed      111.65      (5.1%)      112.38      (5.2%)    0.6% (  -9% -   11%) 0.692
                     CountOrMany        8.24      (2.1%)        8.30      (2.1%)    0.7% (  -3% -    5%) 0.285
             CountFilteredOrMany        5.95      (2.6%)        6.00      (1.7%)    0.8% (  -3% -    5%) 0.246
                DismaxOrHighHigh       67.00      (5.7%)       67.55      (5.5%)    0.8% (  -9% -   12%) 0.645
                 AndHighOrMedMed       10.11      (2.8%)       10.19      (3.0%)    0.9% (  -4% -    6%) 0.351
              FilteredAndHighMed       90.50      (2.2%)       91.38      (4.1%)    1.0% (  -5% -    7%) 0.354
             FilteredAndHighHigh       14.62      (3.0%)       14.77      (3.1%)    1.1% (  -4% -    7%) 0.265
     FilteredAnd2Terms2StopWords      100.48      (7.1%)      101.62      (7.4%)    1.1% ( -12% -   16%) 0.618
              CombinedOrHighHigh        8.54      (4.2%)        8.64      (3.8%)    1.2% (  -6% -    9%) 0.352
              CombinedAndHighMed       46.87      (7.6%)       47.55      (9.2%)    1.5% ( -14% -   19%) 0.584
            FilteredAndStopWords       14.46      (3.7%)       14.71      (3.8%)    1.7% (  -5% -    9%) 0.138
                      OrHighHigh       24.60      (3.7%)       25.08     (14.6%)    1.9% ( -15% -   21%) 0.565
              Or2Terms2StopWords      165.15     (10.1%)      168.56     (11.6%)    2.1% ( -17% -   26%) 0.550
             CombinedAndHighHigh        7.29      (1.4%)        7.44      (2.9%)    2.1% (  -2% -    6%) 0.003
                AndMedOrHighHigh       32.71      (2.3%)       33.47      (3.8%)    2.3% (  -3% -    8%) 0.021
                       CountTerm     3847.80      (6.9%)     3959.24     (11.0%)    2.9% ( -14% -   22%) 0.320
                     AndHighHigh       28.59      (3.8%)       29.49     (11.0%)    3.2% ( -11% -   18%) 0.225
                     OrStopWords        6.27      (7.1%)        6.57     (13.2%)    4.7% ( -14% -   26%) 0.165
                      AndHighMed       64.75      (3.5%)       69.44      (9.1%)    7.2% (  -5% -   20%) 0.001
                          OrMany        4.60      (4.3%)        5.01      (5.7%)    8.9% (  -1% -   19%) 0.000
                        Or3Terms       48.08      (4.3%)       52.48     (12.6%)    9.1% (  -7% -   27%) 0.002
                    AndStopWords        5.72      (5.0%)        6.34      (9.9%)   11.0% (  -3% -   27%) 0.000

(BTW, the latest luceneutil has a constructor problem since #14873 was introduced; it fails with an error like the one below.)
image

github-actions bot commented Jul 3, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 3, 2025

BTW, the benchmark results shown above were collected on a machine with the following CPU:
Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz
and flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid md_clear arch_capabilities

@jpountz
Contributor

jpountz commented Jul 3, 2025

Wow, this is a big speedup! I'd like to get opinions on INT_FOR_DOUBLE_SPECIES from folks who are familiar with the vector API, maybe @uschindler or @ChrisHegarty. I know we've had to be careful with the vector API at times as playing some tricks may get even slower than scalar code on some hardware.

taskCountPerCat=1

FWIW I like higher values of this parameter better. Otherwise a change may look like it's a speedup when in fact it is overfitted for a particular query, and other similar queries don't get the same speedup. I usually keep it at 5 like nightly benchmarks.

Separately I wonder if we should keep accumulating scores in doubles to contain the accuracy loss. We don't do it 100% consistently (e.g. if you mix SHOULD and MUST clauses, or if you have duplicate clauses that get rewritten as a boosted query) and it complicates optimizations like this one.
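To make the accuracy trade-off concrete, here is a small sketch comparing float and double accumulation of per-clause scores. The helper names are hypothetical and the input values are deliberately extreme so that the rounding is visible; real scores are far smaller, so the loss in practice would be much less dramatic.

```java
/** Float vs. double accumulation of clause scores; names and values are illustrative. */
final class ScoreAccumulationSketch {

  /** Sums in float: every intermediate result is rounded to float precision. */
  static float sumInFloat(float[] clauseScores) {
    float sum = 0f;
    for (float s : clauseScores) {
      sum += s;
    }
    return sum;
  }

  /** Sums in double and rounds to float only once, at the end. */
  static float sumInDouble(float[] clauseScores) {
    double sum = 0d;
    for (float s : clauseScores) {
      sum += s;
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    // Near 1e8 adjacent floats are 8 apart, so adding 4 is a rounding tie each time.
    float[] clauseScores = {1e8f, 4f, 4f};
    System.out.println("float accumulation:  " + sumInFloat(clauseScores));
    System.out.println("double accumulation: " + sumInDouble(clauseScores));
  }
}
```

With float accumulation both small addends are rounded away, while the double accumulator retains them until the final cast. Keeping doubles therefore buys accuracy at the cost of wider lanes, which is the tension with optimizations like this one.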

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 4, 2025

I know we've had to be careful with the vector API at times as playing some tricks may get even slower than scalar code on some hardware.

Yeah, I agree with that too. I've seen the ENABLE_FIND_NEXT_GEQ_VECTOR_OPTO variable, which is used to make sure we have enough lanes; I'm wondering whether we should do a similar thing here.

FWIW I like higher values of this parameter better.

Of course! I ran luceneutil on wikimediumall with searchConcurrency=0, taskCountPerCat=5, taskRepeatCount=50; here is the result after 20 iterations (it takes some time to finish):

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                    CombinedTerm       14.73      (4.3%)       14.48      (5.6%)   -1.7% ( -11% -    8%) 0.277
                        Wildcard       60.18      (3.3%)       59.48      (3.0%)   -1.2% (  -7% -    5%) 0.242
                          IntSet      411.04      (4.4%)      408.11      (4.2%)   -0.7% (  -8% -    8%) 0.601
                 FilteredPrefix3       95.00      (2.7%)       94.50      (2.8%)   -0.5% (  -5% -    5%) 0.542
                         Respell       44.49      (2.7%)       44.27      (3.1%)   -0.5% (  -6% -    5%) 0.586
                  FilteredIntNRQ       48.90      (1.6%)       48.67      (1.7%)   -0.5% (  -3% -    2%) 0.379
                          IntNRQ       49.17      (1.7%)       48.95      (1.9%)   -0.4% (  -3% -    3%) 0.432
                         Prefix3      101.69      (3.0%)      101.28      (3.0%)   -0.4% (  -6% -    5%) 0.666
             CountFilteredIntNRQ       22.52      (2.0%)       22.45      (2.0%)   -0.3% (  -4% -    3%) 0.648
                      DismaxTerm      659.35      (4.8%)      658.31      (5.4%)   -0.2% (  -9% -   10%) 0.922
                IntervalsOrdered        3.03      (2.3%)        3.03      (2.3%)   -0.1% (  -4% -    4%) 0.919
                    FilteredTerm       88.58      (4.9%)       88.58      (4.9%)   -0.0% (  -9% -   10%) 0.999
                     CountPhrase        3.32      (2.2%)        3.32      (3.5%)    0.0% (  -5% -    5%) 0.993
                      TermDTSort      194.75      (2.5%)      194.77      (2.4%)    0.0% (  -4% -    5%) 0.987
             CountFilteredPhrase       11.57      (2.0%)       11.58      (2.4%)    0.0% (  -4% -    4%) 0.952
                   TermTitleSort       72.55      (4.0%)       72.59      (4.3%)    0.1% (  -7% -    8%) 0.969
                          Fuzzy2       46.06      (2.3%)       46.10      (2.2%)    0.1% (  -4% -    4%) 0.904
                        SpanNear        3.10      (4.6%)        3.10      (5.0%)    0.1% (  -9% -   10%) 0.945
               TermDayOfYearSort      358.78      (1.1%)      359.21      (1.0%)    0.1% (  -1% -    2%) 0.719
                         TermB1M      593.44      (6.4%)      594.37      (7.0%)    0.2% ( -12% -   14%) 0.941
          CountFilteredOrHighMed       29.96      (2.0%)       30.00      (1.6%)    0.2% (  -3% -    3%) 0.771
         CountFilteredOrHighHigh       25.46      (1.9%)       25.51      (1.5%)    0.2% (  -3% -    3%) 0.731
                          Term1M      594.33      (6.3%)      595.69      (6.9%)    0.2% ( -12% -   14%) 0.913
                      OrHighRare      121.93      (3.7%)      122.22      (4.5%)    0.2% (  -7% -    8%) 0.857
                         Term100      593.84      (6.4%)      595.52      (6.7%)    0.3% ( -12% -   14%) 0.892
                         Term10K      594.26      (6.5%)      595.97      (6.9%)    0.3% ( -12% -   14%) 0.892
                            Term      594.63      (6.4%)      596.36      (6.9%)    0.3% ( -12% -   14%) 0.890
                       TermB1M1P      592.84      (6.4%)      595.12      (6.9%)    0.4% ( -12% -   14%) 0.855
                  FilteredPhrase       12.68      (2.0%)       12.74      (2.5%)    0.4% (  -3% -    5%) 0.531
               FilteredAnd3Terms      132.94      (3.8%)      133.61      (2.9%)    0.5% (  -5% -    7%) 0.636
               CombinedOrHighMed       28.73      (4.6%)       28.89      (5.9%)    0.6% (  -9% -   11%) 0.734
               FilteredOrHighMed       54.21      (5.0%)       54.52      (4.7%)    0.6% (  -8% -   10%) 0.711
                CountAndHighHigh       62.06      (1.2%)       62.42      (1.2%)    0.6% (  -1% -    3%) 0.142
              FilteredOrHighHigh       18.16      (3.3%)       18.27      (3.1%)    0.6% (  -5% -    7%) 0.566
                       CountTerm     8145.28      (5.9%)     8192.70      (4.8%)    0.6% (  -9% -   11%) 0.731
                FilteredOr3Terms       59.77      (4.6%)       60.13      (4.3%)    0.6% (  -7% -    9%) 0.662
                          Fuzzy1       51.29      (3.3%)       51.61      (3.6%)    0.6% (  -6% -    7%) 0.567
             CountFilteredOrMany        6.09      (1.6%)        6.13      (1.4%)    0.7% (  -2% -    3%) 0.153
                 AndHighOrMedMed       18.53      (2.9%)       18.67      (2.6%)    0.7% (  -4% -    6%) 0.392
      FilteredOr2Terms2StopWords       67.51      (6.9%)       68.02      (6.5%)    0.8% ( -11% -   15%) 0.720
                          Phrase        9.79      (2.0%)        9.87      (2.5%)    0.8% (  -3% -    5%) 0.278
                  FilteredOrMany        5.11      (2.7%)        5.15      (2.2%)    0.8% (  -3% -    5%) 0.308
                 CountOrHighHigh       63.91      (1.8%)       64.45      (2.2%)    0.8% (  -3% -    4%) 0.193
             FilteredOrStopWords       11.12      (2.5%)       11.21      (2.6%)    0.8% (  -4% -    6%) 0.298
                  CountOrHighMed       96.39      (3.3%)       97.23      (2.3%)    0.9% (  -4% -    6%) 0.335
             FilteredAndHighHigh       15.28      (3.3%)       15.42      (3.5%)    0.9% (  -5% -    7%) 0.408
            FilteredAndStopWords       12.04      (3.3%)       12.15      (3.8%)    0.9% (  -5% -    8%) 0.418
                   TermMonthSort     2994.17      (3.2%)     3026.80      (3.1%)    1.1% (  -5% -    7%) 0.277
              CombinedAndHighMed       29.21      (4.6%)       29.55      (5.4%)    1.2% (  -8% -   11%) 0.459
                     CountOrMany        6.42      (1.5%)        6.50      (1.9%)    1.2% (  -2% -    4%) 0.029
                 CountAndHighMed       93.40      (3.9%)       94.53      (2.1%)    1.2% (  -4% -    7%) 0.227
                 DismaxOrHighMed       66.64      (5.1%)       67.51      (6.0%)    1.3% (  -9% -   13%) 0.457
                    SloppyPhrase        1.44      (4.8%)        1.46      (3.5%)    1.4% (  -6% -   10%) 0.287
                DismaxOrHighHigh       46.64      (5.1%)       47.42      (6.4%)    1.7% (  -9% -   13%) 0.354
              CombinedOrHighHigh        7.18      (3.4%)        7.31      (4.8%)    1.8% (  -6% -   10%) 0.162
             CombinedAndHighHigh        7.29      (1.5%)        7.51      (2.2%)    3.0% (   0% -    6%) 0.000
                AndMedOrHighHigh       20.84      (5.3%)       21.59      (4.0%)    3.6% (  -5% -   13%) 0.015
              FilteredAndHighMed       44.67      (4.9%)       46.31      (4.4%)    3.7% (  -5% -   13%) 0.013
                       OrHighMed       90.74      (8.5%)       94.19     (10.5%)    3.8% ( -14% -   24%) 0.208
     FilteredAnd2Terms2StopWords       76.59      (8.0%)       80.57      (7.9%)    5.2% (  -9% -   22%) 0.039
              Or2Terms2StopWords       77.93      (8.6%)       82.68     (10.3%)    6.1% ( -11% -   27%) 0.042
                     OrStopWords       10.37      (7.9%)       11.02     (10.1%)    6.3% ( -10% -   26%) 0.029
             And2Terms2StopWords       75.13     (11.2%)       80.21     (11.0%)    6.8% ( -13% -   32%) 0.053
                      OrHighHigh       25.96      (9.1%)       27.94     (12.1%)    7.6% ( -12% -   31%) 0.025
                     AndHighHigh       26.53     (13.5%)       28.56     (12.5%)    7.6% ( -16% -   38%) 0.062
                      AndHighMed       67.66     (12.1%)       73.09     (10.9%)    8.0% ( -13% -   35%) 0.027
                          OrMany        5.81      (5.0%)        6.34      (5.4%)    9.0% (  -1% -   20%) 0.000
                        Or3Terms       81.64      (6.7%)       89.03      (9.0%)    9.0% (  -6% -   26%) 0.000
                       And3Terms       88.70      (9.5%)       97.50      (8.6%)    9.9% (  -7% -   30%) 0.001
                    AndStopWords        9.61      (9.3%)       10.68      (9.3%)   11.1% (  -6% -   32%) 0.000

And it shows a similar speedup!

@HUSTERGS HUSTERGS changed the title from "[WIP] Vectorize filterCompetitiveHits" to "Vectorize filterCompetitiveHits" Jul 4, 2025

@ChrisHegarty
Contributor

ChrisHegarty commented Jul 4, 2025

Wow, this is a big speedup! I'd like to get opinions on INT_FOR_DOUBLE_SPECIES from folks who are familiar with the vector API, maybe @uschindler or @ChrisHegarty. I know we've had to be careful with the vector API at times as playing some tricks may get even slower than scalar code on some hardware.

This looks good to me. The usage of the vector API, and the structuring of the code and test look good. Let's add a micro benchmark to lucene/benchmark-jmh (though the in-place modification of the arrays complicates writing a micro benchmark! :-( ).
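One common workaround for the in-place problem is to restore the inputs from a pristine copy before every invocation. The sketch below shows that pattern in plain Java rather than JMH; it is not necessarily how this PR's benchmark is written, and every name in it is hypothetical.

```java
import java.util.Random;

/** Plain-Java sketch of timing an in-place method; not JMH, all names are hypothetical. */
final class InPlaceBenchmarkSketch {

  /** The method under test: compacts competitive hits to the front, in place. */
  static int filter(int[] docs, float[] scores, int size, float minScoreInclusive) {
    int newSize = 0;
    for (int i = 0; i < size; i++) {
      if (scores[i] >= minScoreInclusive) {
        docs[newSize] = docs[i];
        scores[newSize] = scores[i];
        newSize++;
      }
    }
    return newSize;
  }

  /** Runs the filter repeatedly, restoring the inputs first. Returns the total hits kept. */
  static long runWithReset(
      int[] pristineDocs, float[] pristineScores, float minScoreInclusive, int iterations) {
    int[] docs = new int[pristineDocs.length];
    float[] scores = new float[pristineScores.length];
    long totalKept = 0;
    for (int it = 0; it < iterations; it++) {
      // Restore the working arrays so every invocation sees identical input.
      System.arraycopy(pristineDocs, 0, docs, 0, docs.length);
      System.arraycopy(pristineScores, 0, scores, 0, scores.length);
      // Accumulate the result so the call cannot be optimized away.
      totalKept += filter(docs, scores, docs.length, minScoreInclusive);
    }
    return totalKept;
  }

  public static void main(String[] args) {
    int size = 128;
    Random random = new Random(42);
    int[] docs = new int[size];
    float[] scores = new float[size];
    for (int i = 0; i < size; i++) {
      docs[i] = i;
      scores[i] = random.nextFloat();
    }
    long start = System.nanoTime();
    long totalKept = runWithReset(docs, scores, 0.5f, 1_000_000);
    long elapsedNanos = System.nanoTime() - start;
    System.out.println("kept=" + totalKept + " elapsedMs=" + elapsedNanos / 1_000_000);
  }
}
```

The copy cost is included in the measurement, but baseline and candidate pay it equally. Without the copy, the second call would see an already-compacted prefix, so the branch pattern being measured would differ from the first call.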

Contributor

@ChrisHegarty ChrisHegarty left a comment

LGTM

Contributor

@jpountz jpountz left a comment

I left very minor comments, otherwise it looks good to me.

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 5, 2025

Let's add a micro benchmark to lucene/benchmark-jmh (though the in-place modification of the arrays complicates writing a micro benchmark! :-( ).

I've added a JMH benchmark; here is the result on my machine:

Benchmark                       (size)   Mode  Cnt      Score      Error   Units
CompetitiveBenchmark.baseline      128  thrpt    5   2024.463 ±  152.721  ops/ms
CompetitiveBenchmark.candidate     128  thrpt    5  15732.680 ± 1064.968  ops/ms

@github-actions github-actions bot added this to the 10.3.0 milestone Jul 5, 2025
Contributor

@jpountz jpountz left a comment

I left very minor comments, otherwise LGTM.

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 5, 2025

After the newest commit, the benchmark results are as follows (I added a case where minScoreInclusive=0 out of curiosity, although we will never get an input equal to zero, at least for now):

Benchmark                       (minScoreInclusive)  (size)   Mode  Cnt      Score      Error   Units
CompetitiveBenchmark.baseline                     0     128  thrpt    5  12959.008 ± 1350.676  ops/ms
CompetitiveBenchmark.baseline                   0.2     128  thrpt    5   2507.018 ±  149.741  ops/ms
CompetitiveBenchmark.baseline                   0.4     128  thrpt    5   1506.977 ±   28.579  ops/ms
CompetitiveBenchmark.baseline                   0.5     128  thrpt    5   1435.951 ±   51.081  ops/ms
CompetitiveBenchmark.baseline                   0.8     128  thrpt    5   2672.585 ±   26.704  ops/ms
CompetitiveBenchmark.candidate                    0     128  thrpt    5  16484.722 ±  563.235  ops/ms
CompetitiveBenchmark.candidate                  0.2     128  thrpt    5  16277.810 ±  390.714  ops/ms
CompetitiveBenchmark.candidate                  0.4     128  thrpt    5  15902.733 ± 1703.628  ops/ms
CompetitiveBenchmark.candidate                  0.5     128  thrpt    5  15823.521 ± 2488.243  ops/ms
CompetitiveBenchmark.candidate                  0.8     128  thrpt    5  15964.216 ± 1898.512  ops/ms

@uschindler
Contributor

Hi, I am out of the house this weekend.

In general, the code and the usual VectorUtil abstraction look fine. About the risks:

  • We should also have benchmarks on AMD Ryzen/Threadripper CPUs as well as ARM.
  • We should be careful about when to enable the optimization, based on the various constants derived from the enabled HotSpot flags (which is a way to detect CPUID flags).
  • @rmuir should have a look, too. He knows all CPU types and risks better than all of us. 🤪

@uschindler uschindler requested a review from rmuir July 5, 2025 08:06
@jpountz
Contributor

jpountz commented Jul 5, 2025

we should have benchmarks also on AMD ryzen/threadripper CPUs as well as ARM.

I get the following results on an AMD Ryzen 9 3900X (AVX2 support, but no AVX-512 support):

Benchmark                       (minScoreInclusive)  (size)   Mode  Cnt      Score     Error   Units
CompetitiveBenchmark.baseline                     0     128  thrpt    5  17146.485 ± 371.713  ops/ms
CompetitiveBenchmark.baseline                   0.2     128  thrpt    5   3146.653 ± 214.168  ops/ms
CompetitiveBenchmark.baseline                   0.4     128  thrpt    5   1967.109 ±  42.828  ops/ms
CompetitiveBenchmark.baseline                   0.5     128  thrpt    5   1837.372 ±   7.027  ops/ms
CompetitiveBenchmark.baseline                   0.8     128  thrpt    5   3522.881 ±  28.854  ops/ms
CompetitiveBenchmark.candidate                    0     128  thrpt    5   9759.642 ± 492.437  ops/ms
CompetitiveBenchmark.candidate                  0.2     128  thrpt    5   9887.365 ± 586.618  ops/ms
CompetitiveBenchmark.candidate                  0.4     128  thrpt    5   9867.487 ± 223.382  ops/ms
CompetitiveBenchmark.candidate                  0.5     128  thrpt    5   9855.217 ±  26.162  ops/ms
CompetitiveBenchmark.candidate                  0.8     128  thrpt    5   9872.613 ± 134.375  ops/ms

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 6, 2025

If the micro benchmark is right and the branchless impl performs better than the current impl (we should compare with luceneutil) then we should look into merging this change first, and then see if / how much explicit vectorization can further help.

Agreed. I'm running luceneutil on my machine against a temporary branch to verify the branchless approach, and will share the result once it finishes its iterations.
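For context, a branchless variant of this kind of filter can be sketched as below, next to a conventional branchy version. This is an assumption about the general technique, not the code on the temporary branch; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Branchy vs. branchless in-place filtering; all names are hypothetical. */
final class BranchlessFilterSketch {

  /** Conventional version: the copy happens only inside the taken branch. */
  static int filterBranchy(int[] docs, float[] scores, int size, float minScoreInclusive) {
    int newSize = 0;
    for (int i = 0; i < size; i++) {
      if (scores[i] >= minScoreInclusive) {
        docs[newSize] = docs[i];
        scores[newSize] = scores[i];
        newSize++;
      }
    }
    return newSize;
  }

  /** Branchless version: always copy, then advance the write index by 0 or 1. */
  static int filterBranchless(int[] docs, float[] scores, int size, float minScoreInclusive) {
    int newSize = 0;
    for (int i = 0; i < size; i++) {
      // 1 when the hit is competitive, 0 otherwise. A JIT may compile this to a
      // conditional set or move instead of a jump, though that is not guaranteed.
      int inc = scores[i] >= minScoreInclusive ? 1 : 0;
      // Unconditional copy: a non-competitive entry is overwritten by the next copy.
      docs[newSize] = docs[i];
      scores[newSize] = scores[i];
      newSize += inc;
    }
    return newSize;
  }

  public static void main(String[] args) {
    int[] docs = {1, 2, 3, 4, 5, 6};
    float[] scores = {0.9f, 0.1f, 0.6f, 0.2f, 0.5f, 0.3f};
    int newSize = filterBranchless(docs, scores, docs.length, 0.5f);
    System.out.println(Arrays.toString(Arrays.copyOf(docs, newSize)));
  }
}
```

The branchless form does more stores but removes a data-dependent branch that is hard to predict when roughly half of the hits are competitive, which would be consistent with the baseline being slowest around minScoreInclusive=0.5 in the micro benchmark.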

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 6, 2025

Here is the result comparing the branchless approach (candidate) against the main branch (baseline) under an identical setup:

                            Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
      FilteredOr2Terms2StopWords       67.28      (6.6%)       65.96      (6.8%)   -2.0% ( -14% -   12%) 0.355
                       OrHighMed       92.82      (3.8%)       91.23      (4.1%)   -1.7% (  -9% -    6%) 0.171
               CombinedOrHighMed       28.37      (6.3%)       27.89      (5.8%)   -1.7% ( -13% -   11%) 0.376
              CombinedAndHighMed       29.11      (5.4%)       28.61      (4.5%)   -1.7% ( -11% -    8%) 0.280
                 AndHighOrMedMed       18.44      (3.3%)       18.14      (2.8%)   -1.6% (  -7% -    4%) 0.089
                 DismaxOrHighMed       67.66      (3.7%)       66.60      (3.9%)   -1.6% (  -8% -    6%) 0.194
                         TermB1M      595.71      (3.6%)      586.50      (4.9%)   -1.5% (  -9% -    7%) 0.257
                          Term1M      596.06      (3.6%)      586.93      (5.0%)   -1.5% (  -9% -    7%) 0.266
                            Term      595.68      (3.6%)      586.74      (5.0%)   -1.5% (  -9% -    7%) 0.274
                         Term10K      594.97      (3.8%)      586.18      (4.9%)   -1.5% (  -9% -    7%) 0.283
                       TermB1M1P      595.60      (3.7%)      586.84      (4.9%)   -1.5% (  -9% -    7%) 0.286
                         Term100      595.04      (3.6%)      586.49      (5.0%)   -1.4% (  -9% -    7%) 0.294
                       CountTerm     7970.70      (5.6%)     7856.82      (5.6%)   -1.4% ( -12% -   10%) 0.422
                    FilteredTerm       87.37      (4.3%)       86.13      (4.6%)   -1.4% (  -9% -    7%) 0.314
               FilteredOrHighMed       54.11      (5.0%)       53.35      (5.3%)   -1.4% ( -11% -    9%) 0.388
                      OrHighRare      121.49      (3.2%)      119.92      (3.4%)   -1.3% (  -7% -    5%) 0.220
                 CountOrHighHigh       64.26      (2.6%)       63.43      (1.9%)   -1.3% (  -5% -    3%) 0.075
                DismaxOrHighHigh       47.55      (2.5%)       46.94      (2.9%)   -1.3% (  -6% -    4%) 0.135
            FilteredAndStopWords       12.03      (2.8%)       11.89      (3.7%)   -1.2% (  -7% -    5%) 0.270
                FilteredOr3Terms       59.60      (4.5%)       58.92      (4.7%)   -1.2% (  -9% -    8%) 0.426
                   TermTitleSort       73.24      (4.2%)       72.44      (4.2%)   -1.1% (  -9% -    7%) 0.410
                     CountOrMany        6.48      (2.3%)        6.41      (1.9%)   -1.1% (  -5% -    3%) 0.108
             CountFilteredOrMany        6.08      (1.9%)        6.02      (1.5%)   -1.0% (  -4% -    2%) 0.061
                  CountOrHighMed       97.07      (2.8%)       96.11      (2.6%)   -1.0% (  -6% -    4%) 0.250
                      TermDTSort      193.19      (2.7%)      191.42      (2.4%)   -0.9% (  -5% -    4%) 0.252
                   TermMonthSort     2982.24      (3.7%)     2958.62      (3.4%)   -0.8% (  -7% -    6%) 0.482
             FilteredAndHighHigh       15.23      (2.8%)       15.12      (3.6%)   -0.8% (  -7% -    5%) 0.446
                          Fuzzy1       50.54      (3.2%)       50.15      (3.7%)   -0.8% (  -7% -    6%) 0.473
              FilteredOrHighHigh       18.16      (3.2%)       18.02      (3.2%)   -0.8% (  -6% -    5%) 0.447
                 CountAndHighMed       94.00      (2.8%)       93.32      (2.8%)   -0.7% (  -6% -    5%) 0.416
                      DismaxTerm      657.56      (3.5%)      652.99      (3.8%)   -0.7% (  -7% -    6%) 0.547
                CountAndHighHigh       62.08      (1.8%)       61.67      (1.1%)   -0.7% (  -3% -    2%) 0.161
                          Fuzzy2       45.36      (1.8%)       45.11      (2.4%)   -0.6% (  -4% -    3%) 0.399
                  FilteredPhrase       12.66      (2.3%)       12.59      (2.3%)   -0.6% (  -5% -    4%) 0.433
                     AndHighHigh       28.01      (3.5%)       27.85      (4.2%)   -0.5% (  -8% -    7%) 0.658
                    CombinedTerm       14.62      (3.8%)       14.55      (4.9%)   -0.5% (  -8% -    8%) 0.705
                          Phrase        9.91      (2.1%)        9.86      (2.2%)   -0.5% (  -4% -    3%) 0.453
              Or2Terms2StopWords       79.00      (7.2%)       78.61      (7.4%)   -0.5% ( -14% -   15%) 0.829
                IntervalsOrdered        2.96      (2.4%)        2.94      (2.6%)   -0.5% (  -5% -    4%) 0.564
               FilteredAnd3Terms      132.86      (3.0%)      132.29      (3.4%)   -0.4% (  -6% -    6%) 0.670
             FilteredOrStopWords       11.15      (2.2%)       11.11      (2.4%)   -0.4% (  -4% -    4%) 0.591
             And2Terms2StopWords       76.63      (8.2%)       76.34      (8.3%)   -0.4% ( -15% -   17%) 0.882
         CountFilteredOrHighHigh       25.24      (1.3%)       25.16      (1.3%)   -0.3% (  -2% -    2%) 0.451
          CountFilteredOrHighMed       29.64      (1.4%)       29.56      (1.3%)   -0.2% (  -2% -    2%) 0.570
                          IntNRQ       48.50      (2.2%)       48.38      (2.3%)   -0.2% (  -4% -    4%) 0.741
                  FilteredIntNRQ       48.16      (2.2%)       48.10      (2.3%)   -0.1% (  -4% -    4%) 0.868
             CountFilteredIntNRQ       22.17      (1.3%)       22.14      (1.8%)   -0.1% (  -3% -    3%) 0.829
                  FilteredOrMany        5.11      (3.0%)        5.11      (2.7%)    0.0% (  -5% -    5%) 0.993
     FilteredAnd2Terms2StopWords       77.42      (6.3%)       77.46      (6.5%)    0.0% ( -12% -   13%) 0.984
                    SloppyPhrase        1.47      (4.1%)        1.47      (4.6%)    0.1% (  -8% -    9%) 0.928
                AndMedOrHighHigh       21.15      (1.9%)       21.19      (1.8%)    0.2% (  -3% -    3%) 0.719
                          IntSet      401.08      (4.4%)      402.07      (3.9%)    0.2% (  -7% -    8%) 0.851
               TermDayOfYearSort      356.48      (1.1%)      357.37      (1.2%)    0.2% (  -2% -    2%) 0.490
                        SpanNear        3.07      (4.9%)        3.08      (4.9%)    0.3% (  -9% -   10%) 0.858
                      AndHighMed       70.36      (3.1%)       70.60      (3.0%)    0.3% (  -5% -    6%) 0.722
                         Respell       43.66      (2.9%)       43.81      (2.3%)    0.3% (  -4% -    5%) 0.674
              FilteredAndHighMed       45.14      (2.9%)       45.33      (3.3%)    0.4% (  -5% -    6%) 0.671
             CountFilteredPhrase       11.53      (2.0%)       11.59      (2.2%)    0.5% (  -3% -    4%) 0.435
                 FilteredPrefix3       93.08      (3.5%)       93.69      (2.6%)    0.7% (  -5% -    6%) 0.494
              CombinedOrHighHigh        7.09      (5.4%)        7.14      (5.8%)    0.7% (  -9% -   12%) 0.700
                      OrHighHigh       26.79      (2.4%)       26.97      (4.2%)    0.7% (  -5% -    7%) 0.530
                         Prefix3       99.61      (3.8%)      100.33      (2.5%)    0.7% (  -5% -    7%) 0.483
             CombinedAndHighHigh        7.30      (1.8%)        7.36      (1.9%)    0.8% (  -2% -    4%) 0.161
                        Wildcard       58.26      (3.3%)       58.86      (2.9%)    1.0% (  -4% -    7%) 0.287
                     OrStopWords       10.76      (5.4%)       10.88      (7.2%)    1.1% ( -10% -   14%) 0.574
                     CountPhrase        3.30      (3.6%)        3.34      (2.1%)    1.4% (  -4% -    7%) 0.131
                          OrMany        5.87      (3.0%)        6.03      (4.3%)    2.7% (  -4% -   10%) 0.019
                        Or3Terms       83.12      (2.8%)       85.65      (4.4%)    3.0% (  -4% -   10%) 0.009
                       And3Terms       91.23      (3.3%)       94.32      (4.2%)    3.4% (  -3% -   11%) 0.004
                    AndStopWords       10.01      (3.7%)       10.48      (6.1%)    4.7% (  -4% -   15%) 0.003

I think this change shows a good enough speedup. I will run another luceneutil comparison of the explicit vectorization against the branchless approach.

BTW, should I keep the vectorized code, or keep only the branchless approach if we are about to merge this?

( IMHO, it might be beneficial to figure out a way to enable these complex vectorized operations (of course, not in this PR) without slowing down machines that don’t support the underlying instructions (or where they are not enabled in the JVM), because there may be other places where we could benefit from vectorization. )

@rmuir
Member

rmuir commented Jul 6, 2025

( IMHO, it might be beneficial to figure out a way to enable these complex vectorized operations (of course, not in this PR) without slowing down machines that don’t support the underlying instructions (or where they are not enabled in the JVM), because there may be other places where we could benefit from vectorization. )

We can try to improve it and make more checks available, but the developer has to remember to use them. e.g. we have HAS_FAST_INTEGER_VECTORS but nobody ever remembers to use it. I will clean that one up in a separate PR.

In cases like this one, I'm afraid it gets no simpler: SVE must be checked for on ARM, and AVX-512 should likely be avoided too. AVX-512 is a mess of smaller instruction sets, and maybe doing this compress requires VBMI or VBMI2 or something (I haven't looked yet, this is just an example). So it is better to use 256-bit vectors only, since OpenJDK has a basic AVX2 algorithm. https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX-512_CPU_compatibility_table

@rmuir
Member

rmuir commented Jul 6, 2025

honestly, to make it reasonable and prevent traps like this, a good approach would be to support less stuff: e.g. only support 256-bit SVE on ARM and 256-bit AVX2 on x86.

by ditching NEON and AVX-512, it would be simpler for us to maintain and reduce traps. Of course I imagine this would be controversial, I'm just discussing ways we can make our lives easier.

at least we can start by ditching SSE2/AVX-128, I figure this shouldn't be controversial: #14901

@jpountz
Contributor

jpountz commented Jul 6, 2025

Thanks for running macro benchmarks with the branchless approach!

BTW, should I keep the vectorized code, or keep only the branchless approach if we are about to merge this?

I was thinking of creating a new PR with the branchless approach, merging it, then rebasing this PR and continuing discussions wrt how/when to use explicit vectorization depending on how performance compares with our new (branchless) baseline. Would you like to open a PR that switches main to the branchless impl?
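
For readers following along, the branchless approach mentioned here can be sketched roughly as below. This is a simplified illustration and not the code from either PR: the class name `BranchlessFilterSketch`, the method name `filterCompetitive`, and the plain `int[]`/`float[]` parameters are assumptions made for the example. It only shows the pattern of storing unconditionally and advancing the write position conditionally.

```java
/** Hypothetical sketch of a branchless in-place filter; all names are illustrative only. */
final class BranchlessFilterSketch {

  /**
   * Keeps only the entries whose score is >= minScoreInclusive, compacting both
   * arrays in place. Returns the number of surviving entries.
   */
  static int filterCompetitive(int[] docs, float[] scores, int size, float minScoreInclusive) {
    int newSize = 0;
    for (int i = 0; i < size; i++) {
      int doc = docs[i];
      float score = scores[i];
      // Always write to the current output slot. Since newSize <= i, this never
      // overwrites an entry that has not been read yet.
      docs[newSize] = doc;
      scores[newSize] = score;
      // Advance the output slot only for competitive hits. The JIT may compile this
      // without a jump (for example as a conditional move), but that is not guaranteed.
      newSize += score >= minScoreInclusive ? 1 : 0;
    }
    return newSize;
  }
}
```

Calling `filterCompetitive(docs, scores, size, min)` returns the new logical size; entries past that index are leftovers and should be ignored by the caller.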

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 7, 2025

honestly, to make it reasonable and prevent traps like this, a good approach would be to support less stuff: e.g. only support 256-bit SVE on ARM and 256-bit AVX2 on x86.

Good idea, I think. Sometimes the preferred bit size is still 256-bit even on a machine that supports AVX-512, and 512-bit vectors also carry a risk of thermal throttling.

Would you like to open a PR that switches main to the branchless impl?

Of course! I've opened a new PR: #14906

and then see if / how much explicit vectorization can further help.

I also ran luceneutil with the branchless impl as baseline and the explicit vectorization as candidate, with an identical setup. Here is the result, if it helps:

result
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                       CountTerm     7866.13      (4.7%)     7645.58      (3.8%)   -2.8% ( -10% -    5%) 0.038
      FilteredOr2Terms2StopWords       65.25      (6.6%)       63.67      (4.7%)   -2.4% ( -12% -    9%) 0.180
                      OrHighRare      120.97      (7.1%)      118.20      (6.3%)   -2.3% ( -14% -   11%) 0.283
                    CombinedTerm       14.54      (3.5%)       14.24      (4.4%)   -2.0% (  -9% -    6%) 0.108
                       TermB1M1P      585.82      (5.3%)      574.47      (4.8%)   -1.9% ( -11% -    8%) 0.222
                      DismaxTerm      648.69      (4.1%)      636.40      (3.8%)   -1.9% (  -9% -    6%) 0.131
               FilteredOrHighMed       52.82      (5.1%)       51.83      (3.4%)   -1.9% (  -9% -    7%) 0.175
                         Term10K      584.69      (5.4%)      574.22      (4.8%)   -1.8% ( -11% -    8%) 0.271
                          Term1M      585.51      (5.3%)      575.04      (5.0%)   -1.8% ( -11% -    8%) 0.272
                            Term      585.35      (5.2%)      575.06      (4.9%)   -1.8% ( -11% -    8%) 0.273
                         Term100      585.37      (5.3%)      575.21      (4.9%)   -1.7% ( -11% -    8%) 0.285
                         TermB1M      585.01      (5.2%)      574.89      (4.9%)   -1.7% ( -11% -    8%) 0.276
                    FilteredTerm       85.08      (4.6%)       83.72      (3.7%)   -1.6% (  -9% -    7%) 0.227
                          Fuzzy1       50.46      (3.7%)       49.68      (2.9%)   -1.5% (  -7% -    5%) 0.143
             And2Terms2StopWords       76.00      (9.0%)       74.86      (7.2%)   -1.5% ( -16% -   16%) 0.561
                   TermMonthSort     2924.72      (3.3%)     2885.32      (3.1%)   -1.3% (  -7% -    5%) 0.188
                FilteredOr3Terms       58.33      (4.6%)       57.56      (2.9%)   -1.3% (  -8% -    6%) 0.273
               CombinedOrHighMed       27.77      (5.0%)       27.40      (6.2%)   -1.3% ( -11% -   10%) 0.457
              CombinedAndHighMed       28.30      (5.2%)       27.95      (4.9%)   -1.3% ( -10% -    9%) 0.426
              FilteredOrHighHigh       17.87      (3.6%)       17.66      (2.3%)   -1.2% (  -6% -    4%) 0.202
     FilteredAnd2Terms2StopWords       77.04      (6.6%)       76.25      (5.4%)   -1.0% ( -12% -   11%) 0.590
               FilteredAnd3Terms      132.17      (2.8%)      130.83      (1.7%)   -1.0% (  -5% -    3%) 0.166
                          Fuzzy2       45.46      (2.6%)       45.07      (2.0%)   -0.9% (  -5% -    3%) 0.234
             FilteredAndHighHigh       14.97      (2.4%)       14.84      (3.2%)   -0.9% (  -6% -    4%) 0.326
            FilteredAndStopWords       11.75      (2.6%)       11.65      (3.4%)   -0.9% (  -6% -    5%) 0.374
                   TermTitleSort       71.29      (5.1%)       70.68      (3.4%)   -0.9% (  -8% -    8%) 0.534
                         Prefix3      100.81      (3.0%)      100.06      (2.9%)   -0.7% (  -6% -    5%) 0.428
             FilteredOrStopWords       11.02      (2.9%)       10.94      (2.1%)   -0.7% (  -5% -    4%) 0.374
                    SloppyPhrase        1.45      (4.4%)        1.44      (5.1%)   -0.7% (  -9% -    9%) 0.655
                  FilteredPhrase       12.59      (1.8%)       12.51      (2.1%)   -0.7% (  -4% -    3%) 0.272
                 FilteredPrefix3       94.17      (2.9%)       93.55      (2.7%)   -0.7% (  -6% -    5%) 0.460
                          Phrase        9.84      (2.7%)        9.78      (3.8%)   -0.6% (  -6% -    6%) 0.556
              Or2Terms2StopWords       78.52      (8.2%)       78.09      (6.5%)   -0.5% ( -14% -   15%) 0.815
                  CountOrHighMed       96.16      (2.8%)       95.66      (2.2%)   -0.5% (  -5% -    4%) 0.514
                      TermDTSort      192.18      (2.6%)      191.26      (2.0%)   -0.5% (  -4% -    4%) 0.516
                     CountPhrase        3.26      (5.5%)        3.25      (4.5%)   -0.4% (  -9% -   10%) 0.802
                          IntNRQ       49.15      (1.4%)       48.96      (1.8%)   -0.4% (  -3% -    2%) 0.458
                 CountAndHighMed       93.17      (3.1%)       92.86      (2.6%)   -0.3% (  -5% -    5%) 0.716
                IntervalsOrdered        2.95      (3.1%)        2.94      (3.8%)   -0.3% (  -7% -    6%) 0.770
                  FilteredOrMany        5.08      (2.6%)        5.07      (1.5%)   -0.3% (  -4% -    3%) 0.633
                 DismaxOrHighMed       66.40      (4.1%)       66.21      (2.9%)   -0.3% (  -7% -    7%) 0.798
                 CountOrHighHigh       64.28      (2.2%)       64.11      (2.1%)   -0.3% (  -4% -    4%) 0.700
              FilteredAndHighMed       45.07      (3.0%)       44.96      (3.0%)   -0.3% (  -6% -    5%) 0.789
                  FilteredIntNRQ       48.82      (1.4%)       48.76      (2.0%)   -0.1% (  -3% -    3%) 0.799
                        SpanNear        3.08      (3.9%)        3.08      (4.5%)    0.0% (  -8% -    8%) 0.985
                CountAndHighHigh       62.29      (1.5%)       62.33      (1.4%)    0.1% (  -2% -    3%) 0.869
          CountFilteredOrHighMed       29.65      (1.7%)       29.70      (1.7%)    0.2% (  -3% -    3%) 0.732
         CountFilteredOrHighHigh       25.21      (1.4%)       25.27      (1.6%)    0.2% (  -2% -    3%) 0.633
                     CountOrMany        6.49      (1.9%)        6.50      (2.5%)    0.2% (  -4% -    4%) 0.735
                         Respell       44.51      (2.0%)       44.62      (2.0%)    0.3% (  -3% -    4%) 0.683
             CountFilteredOrMany        6.09      (1.6%)        6.11      (2.0%)    0.3% (  -3% -    3%) 0.612
                          IntSet      405.88      (4.3%)      407.55      (4.7%)    0.4% (  -8% -    9%) 0.773
                 AndHighOrMedMed       18.11      (2.9%)       18.18      (2.9%)    0.4% (  -5% -    6%) 0.640
              CombinedOrHighHigh        7.13      (4.0%)        7.17      (6.0%)    0.5% (  -9% -   10%) 0.771
                DismaxOrHighHigh       47.02      (3.7%)       47.26      (3.1%)    0.5% (  -6% -    7%) 0.633
                        Wildcard       59.04      (3.5%)       59.35      (3.3%)    0.5% (  -6% -    7%) 0.623
             CountFilteredIntNRQ       22.32      (1.7%)       22.46      (1.9%)    0.6% (  -2% -    4%) 0.290
               TermDayOfYearSort      358.17      (0.8%)      360.43      (1.2%)    0.6% (  -1% -    2%) 0.052
             CountFilteredPhrase       11.64      (2.0%)       11.71      (1.5%)    0.7% (  -2% -    4%) 0.246
             CombinedAndHighHigh        7.28      (2.8%)        7.35      (3.1%)    1.0% (  -4% -    7%) 0.286
                AndMedOrHighHigh       21.26      (2.3%)       21.48      (2.2%)    1.0% (  -3% -    5%) 0.148
                     OrStopWords       11.13      (5.3%)       11.27      (6.1%)    1.3% (  -9% -   13%) 0.476
                       And3Terms       94.41      (4.5%)       95.72      (3.5%)    1.4% (  -6% -    9%) 0.281
                    AndStopWords       10.63      (4.4%)       10.79      (5.0%)    1.6% (  -7% -   11%) 0.287
                       OrHighMed       91.11      (4.8%)       92.89      (3.7%)    2.0% (  -6% -   10%) 0.145
                        Or3Terms       86.02      (4.3%)       87.73      (3.4%)    2.0% (  -5% -   10%) 0.103
                          OrMany        6.09      (4.9%)        6.25      (4.3%)    2.5% (  -6% -   12%) 0.081
                      AndHighMed       70.55      (3.9%)       72.50      (3.1%)    2.8% (  -4% -   10%) 0.013
                     AndHighHigh       28.24      (4.1%)       29.23      (4.1%)    3.5% (  -4% -   12%) 0.007
                      OrHighHigh       27.26      (4.3%)       28.43      (3.4%)    4.3% (  -3% -   12%) 0.000

@HUSTERGS
Contributor Author

HUSTERGS commented Jul 7, 2025

I want to share the luceneutil result against the latest code:

result
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                     OrStopWords        8.38      (9.8%)        8.15      (4.8%)   -2.7% ( -15% -   13%) 0.272
                    CombinedTerm       11.24      (3.9%)       11.05      (5.1%)   -1.7% ( -10% -    7%) 0.235
                 FilteredPrefix3       71.86      (2.7%)       71.52      (2.9%)   -0.5% (  -5% -    5%) 0.594
                         Prefix3       76.67      (2.9%)       76.31      (2.9%)   -0.5% (  -6% -    5%) 0.614
                        SpanNear        2.52      (3.9%)        2.52      (3.3%)   -0.3% (  -7% -    7%) 0.817
                        Wildcard       47.88      (2.7%)       47.79      (3.4%)   -0.2% (  -6% -    6%) 0.847
                          Phrase        7.76      (2.8%)        7.75      (3.1%)   -0.1% (  -5% -    5%) 0.881
                   TermMonthSort     2436.04      (2.6%)     2438.89      (3.3%)    0.1% (  -5% -    6%) 0.901
                       CountTerm     7406.29      (3.5%)     7416.13      (3.7%)    0.1% (  -6% -    7%) 0.907
                    SloppyPhrase        1.14      (5.4%)        1.14      (5.8%)    0.2% ( -10% -   12%) 0.918
                      OrHighRare       95.84      (5.4%)       96.05      (4.9%)    0.2% (  -9% -   11%) 0.891
          CountFilteredOrHighMed       17.98      (0.5%)       18.03      (0.6%)    0.2% (   0% -    1%) 0.158
                          IntSet      296.49      (4.0%)      297.26      (4.3%)    0.3% (  -7% -    8%) 0.845
             CountFilteredIntNRQ       16.36      (1.0%)       16.42      (1.0%)    0.3% (  -1% -    2%) 0.326
         CountFilteredOrHighHigh       15.89      (0.6%)       15.95      (0.7%)    0.4% (   0% -    1%) 0.065
                IntervalsOrdered        2.43      (3.6%)        2.44      (3.3%)    0.4% (  -6% -    7%) 0.696
                  FilteredPhrase        9.98      (1.9%)       10.03      (2.0%)    0.5% (  -3% -    4%) 0.452
                     CountPhrase        2.66      (4.8%)        2.67      (4.5%)    0.5% (  -8% -   10%) 0.743
                CountAndHighHigh       48.95      (1.7%)       49.26      (2.2%)    0.6% (  -3% -    4%) 0.300
                   TermTitleSort       51.66      (5.4%)       52.01      (5.9%)    0.7% ( -10% -   12%) 0.697
             CountFilteredPhrase        9.07      (2.4%)        9.14      (3.6%)    0.7% (  -5% -    6%) 0.447
            FilteredAndStopWords        8.43      (1.9%)        8.50      (2.5%)    0.8% (  -3% -    5%) 0.275
               TermDayOfYearSort      271.55      (1.9%)      273.63      (2.5%)    0.8% (  -3% -    5%) 0.277
                 CountOrHighHigh       50.37      (2.0%)       50.78      (2.5%)    0.8% (  -3% -    5%) 0.264
                    FilteredTerm       66.33      (2.9%)       66.88      (2.8%)    0.8% (  -4% -    6%) 0.356
                  CountOrHighMed       77.72      (2.0%)       78.38      (2.1%)    0.9% (  -3% -    4%) 0.182
             FilteredAndHighHigh       10.41      (1.9%)       10.51      (2.5%)    0.9% (  -3% -    5%) 0.212
                 CountAndHighMed       75.17      (2.4%)       75.84      (2.5%)    0.9% (  -3% -    5%) 0.247
              CombinedOrHighHigh        5.53      (3.3%)        5.58      (4.0%)    0.9% (  -6% -    8%) 0.442
                          IntNRQ       42.23      (2.7%)       42.62      (2.8%)    0.9% (  -4% -    6%) 0.288
                     CountOrMany        5.03      (2.3%)        5.08      (2.9%)    0.9% (  -4% -    6%) 0.261
                  FilteredIntNRQ       41.95      (2.7%)       42.35      (2.7%)    0.9% (  -4% -    6%) 0.273
             FilteredOrStopWords        8.26      (1.6%)        8.34      (1.6%)    1.0% (  -2% -    4%) 0.047
              FilteredOrHighHigh       13.22      (1.5%)       13.36      (1.7%)    1.1% (  -2% -    4%) 0.034
             CountFilteredOrMany        4.46      (1.9%)        4.51      (2.5%)    1.1% (  -3% -    5%) 0.120
                          Fuzzy1       40.56      (3.8%)       41.03      (3.5%)    1.2% (  -5% -    8%) 0.319
                FilteredOr3Terms       44.37      (2.4%)       44.91      (2.6%)    1.2% (  -3% -    6%) 0.121
                          Fuzzy2       36.81      (3.3%)       37.27      (3.5%)    1.2% (  -5% -    8%) 0.247
               FilteredOrHighMed       39.33      (2.3%)       39.85      (2.9%)    1.3% (  -3% -    6%) 0.106
                  FilteredOrMany        4.03      (2.8%)        4.09      (2.3%)    1.3% (  -3% -    6%) 0.098
                      TermDTSort      143.16      (3.4%)      145.47      (4.6%)    1.6% (  -6% -    9%) 0.208
               CombinedOrHighMed       20.80      (4.0%)       21.13      (5.2%)    1.6% (  -7% -   11%) 0.267
      FilteredOr2Terms2StopWords       50.41      (3.0%)       51.22      (3.7%)    1.6% (  -4% -    8%) 0.128
             CombinedAndHighHigh        5.67      (1.9%)        5.77      (2.3%)    1.7% (  -2% -    5%) 0.010
                DismaxOrHighHigh       34.94      (5.7%)       35.56      (2.8%)    1.8% (  -6% -   10%) 0.213
                         Respell       36.11      (3.2%)       36.75      (3.5%)    1.8% (  -4% -    8%) 0.092
                      DismaxTerm      523.84      (5.8%)      533.26      (2.9%)    1.8% (  -6% -   11%) 0.213
                AndMedOrHighHigh       16.66      (3.9%)       16.97      (2.2%)    1.9% (  -4% -    8%) 0.065
              FilteredAndHighMed       31.47      (3.2%)       32.12      (2.0%)    2.0% (  -3% -    7%) 0.014
                    AndStopWords        8.13      (8.8%)        8.30      (3.9%)    2.1% (  -9% -   16%) 0.329
               FilteredAnd3Terms      101.09      (3.0%)      103.25      (3.3%)    2.1% (  -4% -    8%) 0.033
              CombinedAndHighMed       21.25      (3.1%)       21.71      (4.1%)    2.2% (  -4% -    9%) 0.058
                 AndHighOrMedMed       14.11      (1.6%)       14.43      (2.3%)    2.3% (  -1% -    6%) 0.000
                            Term      475.06      (8.0%)      486.72      (4.2%)    2.5% (  -8% -   15%) 0.224
                          Term1M      474.61      (7.9%)      486.87      (4.1%)    2.6% (  -8% -   15%) 0.195
                         Term100      474.49      (7.9%)      486.82      (4.1%)    2.6% (  -8% -   15%) 0.194
                         Term10K      474.11      (7.9%)      486.63      (4.1%)    2.6% (  -8% -   15%) 0.185
                         TermB1M      474.30      (7.9%)      486.95      (4.2%)    2.7% (  -8% -   16%) 0.182
                       TermB1M1P      474.36      (7.8%)      487.12      (4.0%)    2.7% (  -8% -   15%) 0.171
                 DismaxOrHighMed       50.32      (6.0%)       51.78      (3.2%)    2.9% (  -5% -   12%) 0.057
                     AndHighHigh       20.71     (10.4%)       21.43      (3.6%)    3.5% (  -9% -   19%) 0.156
     FilteredAnd2Terms2StopWords       58.48      (5.0%)       60.57      (4.5%)    3.6% (  -5% -   13%) 0.017
                        Or3Terms       63.26      (8.8%)       65.91      (3.7%)    4.2% (  -7% -   18%) 0.048
                       OrHighMed       67.43      (9.6%)       70.50      (3.5%)    4.6% (  -7% -   19%) 0.046
                      OrHighHigh       20.16     (10.3%)       21.10      (2.6%)    4.7% (  -7% -   19%) 0.048
              Or2Terms2StopWords       59.67      (7.4%)       62.50      (6.2%)    4.7% (  -8% -   19%) 0.028
                       And3Terms       69.96      (7.7%)       73.31      (3.4%)    4.8% (  -5% -   17%) 0.011
             And2Terms2StopWords       57.56      (7.4%)       60.42      (6.4%)    5.0% (  -8% -   20%) 0.023
                      AndHighMed       52.50      (9.1%)       55.25      (2.5%)    5.2% (  -5% -   18%) 0.013
                          OrMany        4.54      (5.2%)        4.79      (5.0%)    5.4% (  -4% -   16%) 0.001

There’s still about a 5% margin for improvement.
(I had some problems with the previous machine, so there is an overall drop in absolute performance, but the relative percentage differences are still meaningful.)

Contributor

@jpountz jpountz left a comment

Thank you for running benchmarks again. 5% is not a small speedup, we should definitely keep looking into this!

I left some minor comments, but to me the biggest question left is how the optimization should be disabled on architectures that do not support compress() efficiently such as Neon. The current check based on the number of lanes feels fragile. @rmuir may have ideas.
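
To make the concern about compress() concrete, here is a scalar model of what a compress-based filter does for each vector-sized chunk. It is only a sketch under stated assumptions: the class name `CompressFilterModel`, the method name `filter`, and the lane count of 8 (a 256-bit vector of 32-bit lanes) are illustrative and not taken from the PR. With the Vector API, step 1 would roughly correspond to `FloatVector#compare`, step 2 to `IntVector#compress`/`FloatVector#compress`, and step 3 to `VectorMask#trueCount`.

```java
/** Hypothetical scalar model of a compress-based filter; not the PR's implementation. */
final class CompressFilterModel {

  /** Assumed lane count: a 256-bit vector holding 32-bit ints or floats. */
  static final int LANES = 8;

  /** Compacts docs/scores in place, keeping entries with score >= minScoreInclusive. */
  static int filter(int[] docs, float[] scores, int size, float minScoreInclusive) {
    int newSize = 0;
    for (int base = 0; base < size; base += LANES) {
      int end = Math.min(base + LANES, size);

      // Step 1: compare every lane against the threshold and record a bit mask.
      int mask = 0;
      for (int lane = 0; base + lane < end; lane++) {
        if (scores[base + lane] >= minScoreInclusive) {
          mask |= 1 << lane;
        }
      }

      // Step 2: "compress", i.e. pack the selected lanes to the front of the output.
      // The output index never exceeds base + lane, so unread input is not overwritten.
      int out = newSize;
      for (int lane = 0; base + lane < end; lane++) {
        if ((mask & (1 << lane)) != 0) {
          docs[out] = docs[base + lane];
          scores[out] = scores[base + lane];
          out++;
        }
      }

      // Step 3: advance the output position by the number of selected lanes.
      newSize += Integer.bitCount(mask);
    }
    return newSize;
  }
}
```

Step 2 is the part that depends on hardware support to be fast, which is why guarding the vectorized path by lane count alone is a weak proxy for "compress is efficient here".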

I ran the microbenchmark on my AMD Ryzen 3900X; it suggests performance similar to the branchless impl:

Benchmark                                     (minScoreInclusive)  (size)   Mode  Cnt      Score     Error   Units
CompetitiveBenchmark.baseline                                   0     128  thrpt    5  16964.805 ± 811.642  ops/ms
CompetitiveBenchmark.baseline                                 0.2     128  thrpt    5   3349.967 ± 374.600  ops/ms
CompetitiveBenchmark.baseline                                 0.4     128  thrpt    5   1945.372 ±  41.630  ops/ms
CompetitiveBenchmark.baseline                                 0.5     128  thrpt    5   1849.745 ± 114.880  ops/ms
CompetitiveBenchmark.baseline                                 0.8     128  thrpt    5   3520.931 ± 142.512  ops/ms
CompetitiveBenchmark.branchlessCandidate                        0     128  thrpt    5  17189.332 ± 935.697  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.2     128  thrpt    5   8868.229 ± 315.512  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.4     128  thrpt    5   8844.621 ± 213.875  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.5     128  thrpt    5   8788.696 ± 305.367  ops/ms
CompetitiveBenchmark.branchlessCandidate                      0.8     128  thrpt    5   8783.131 ± 639.710  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                    0     128  thrpt    5  17920.213 ± 629.212  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.2     128  thrpt    5  10342.089 ± 343.342  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.4     128  thrpt    5  10493.824 ± 638.798  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.5     128  thrpt    5  10339.155 ± 314.769  ops/ms
CompetitiveBenchmark.branchlessCandidateCmov                  0.8     128  thrpt    5  10461.349 ± 399.232  ops/ms
CompetitiveBenchmark.vectorizedCandidate                        0     128  thrpt    5  10103.773 ± 193.359  ops/ms
CompetitiveBenchmark.vectorizedCandidate                      0.2     128  thrpt    5  10136.886 ± 311.288  ops/ms
CompetitiveBenchmark.vectorizedCandidate                      0.4     128  thrpt    5  10236.376 ± 255.335  ops/ms
CompetitiveBenchmark.vectorizedCandidate                      0.5     128  thrpt    5  10100.938 ± 196.165  ops/ms
CompetitiveBenchmark.vectorizedCandidate                      0.8     128  thrpt    5  10085.442 ± 358.695  ops/ms

However, luceneutil seems to see a speedup on wikibigall, so it's promising that it doesn't only help with AVX-512.

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                   TermTitleSort       87.00      (4.2%)       85.96      (5.6%)   -1.2% ( -10% -    9%) 0.447
                 AndHighOrMedMed       50.93      (2.1%)       50.74      (2.5%)   -0.4% (  -4% -    4%) 0.610
        FilteredDismaxOrHighHigh       70.96      (2.6%)       70.73      (2.7%)   -0.3% (  -5% -    5%) 0.697
               TermDayOfYearSort      286.00      (1.6%)      285.38      (1.3%)   -0.2% (  -3% -    2%) 0.630
             CountFilteredOrMany       27.14      (1.3%)       27.09      (1.9%)   -0.2% (  -3% -    3%) 0.728
               FilteredOrHighMed      154.55      (1.0%)      154.35      (1.0%)   -0.1% (  -2% -    1%) 0.672
         FilteredDismaxOrHighMed      131.33      (2.1%)      131.17      (2.0%)   -0.1% (  -4% -    4%) 0.848
                 CountAndHighMed      309.76      (1.2%)      309.54      (1.7%)   -0.1% (  -2% -    2%) 0.876
                  FilteredIntNRQ      297.68      (1.0%)      297.50      (0.7%)   -0.1% (  -1% -    1%) 0.823
                   TermMonthSort     3565.51      (1.7%)     3564.45      (2.0%)   -0.0% (  -3% -    3%) 0.960
                FilteredOr3Terms      167.89      (0.9%)      167.88      (1.0%)   -0.0% (  -1% -    1%) 0.978
         CountFilteredOrHighHigh      137.15      (0.7%)      137.15      (0.9%)   -0.0% (  -1% -    1%) 0.984
          CountFilteredOrHighMed      149.10      (0.6%)      149.15      (0.6%)    0.0% (  -1% -    1%) 0.867
      FilteredOr2Terms2StopWords      148.16      (1.0%)      148.22      (1.0%)    0.0% (  -1% -    2%) 0.895
                CountAndHighHigh      357.92      (2.0%)      358.18      (2.4%)    0.1% (  -4% -    4%) 0.918
              CombinedAndHighMed       89.28      (1.3%)       89.35      (1.8%)    0.1% (  -2% -    3%) 0.879
                     CountOrMany       29.08      (1.4%)       29.11      (1.8%)    0.1% (  -3% -    3%) 0.853
                  FilteredPhrase       32.36      (1.5%)       32.39      (1.0%)    0.1% (  -2% -    2%) 0.790
                     CountPhrase        4.22      (1.7%)        4.23      (1.5%)    0.2% (  -2% -    3%) 0.732
             FilteredOrStopWords       45.94      (1.9%)       46.03      (1.5%)    0.2% (  -3% -    3%) 0.716
               CombinedOrHighMed       88.00      (2.4%)       88.20      (1.9%)    0.2% (  -3% -    4%) 0.731
             CountFilteredPhrase       25.42      (1.8%)       25.48      (1.0%)    0.2% (  -2% -    3%) 0.601
                    CombinedTerm       38.45      (2.6%)       38.54      (2.6%)    0.2% (  -4% -    5%) 0.776
                 FilteredPrefix3      151.00      (1.5%)      151.51      (1.8%)    0.3% (  -3% -    3%) 0.528
                 CountOrHighHigh      341.83      (1.9%)      343.02      (2.4%)    0.3% (  -3% -    4%) 0.617
              FilteredOrHighHigh       67.43      (1.8%)       67.67      (1.6%)    0.3% (  -2% -    3%) 0.514
                  FilteredOrMany       16.53      (1.7%)       16.59      (2.1%)    0.4% (  -3% -    4%) 0.520
                  CountOrHighMed      361.72      (1.4%)      363.32      (2.3%)    0.4% (  -3% -    4%) 0.465
     FilteredAnd2Terms2StopWords      213.67      (1.3%)      215.21      (1.1%)    0.7% (  -1% -    3%) 0.058
             FilteredAndHighHigh       78.46      (1.9%)       79.03      (2.3%)    0.7% (  -3% -    5%) 0.280
                       CountTerm     9673.38      (2.7%)     9743.81      (2.5%)    0.7% (  -4% -    6%) 0.377
                 DismaxOrHighMed      188.33      (4.9%)      189.73      (5.1%)    0.7% (  -8% -   11%) 0.638
                        PKLookup      312.42      (5.1%)      314.78      (4.6%)    0.8% (  -8% -   11%) 0.624
                      TermDTSort      386.11      (2.0%)      389.12      (2.3%)    0.8% (  -3% -    5%) 0.257
            FilteredAndStopWords       64.83      (2.0%)       65.36      (2.4%)    0.8% (  -3% -    5%) 0.247
                       OrHighMed      251.13      (8.3%)      253.57      (8.1%)    1.0% ( -14% -   19%) 0.709
                      DismaxTerm      737.97      (7.0%)      745.84      (6.3%)    1.1% ( -11% -   15%) 0.612
               FilteredAnd3Terms      187.22      (2.1%)      189.33      (2.2%)    1.1% (  -3% -    5%) 0.091
              FilteredDismaxTerm      161.96      (3.1%)      163.96      (1.3%)    1.2% (  -3% -    5%) 0.101
                    FilteredTerm      161.87      (3.5%)      163.99      (1.4%)    1.3% (  -3% -    6%) 0.117
             CombinedAndHighHigh       23.01      (1.6%)       23.33      (1.7%)    1.4% (  -1% -    4%) 0.007
                DismaxOrHighHigh      127.57      (5.7%)      129.55      (5.1%)    1.6% (  -8% -   13%) 0.364
              Or2Terms2StopWords      200.65      (5.9%)      203.99      (5.9%)    1.7% (  -9% -   14%) 0.371
             And2Terms2StopWords      199.17      (5.7%)      202.67      (5.5%)    1.8% (  -8% -   13%) 0.323
              CombinedOrHighHigh       22.75      (3.3%)       23.16      (1.8%)    1.8% (  -3% -    7%) 0.030
                     OrStopWords       44.74     (11.7%)       45.62     (11.1%)    2.0% ( -18% -   28%) 0.589
                            Term      623.54      (9.7%)      635.98      (8.5%)    2.0% ( -14% -   22%) 0.489
                      OrHighRare      290.94      (8.7%)      297.45      (5.2%)    2.2% ( -10% -   17%) 0.325
                      OrHighHigh       71.01     (12.3%)       72.76     (11.9%)    2.5% ( -19% -   30%) 0.519
                      AndHighMed      190.36      (9.8%)      196.24      (9.8%)    3.1% ( -15% -   25%) 0.320
              FilteredAndHighMed      150.51      (3.4%)      156.00      (3.1%)    3.6% (  -2% -   10%) 0.000
                AndMedOrHighHigh       82.89      (5.2%)       86.16      (4.9%)    4.0% (  -5% -   14%) 0.013
                     AndHighHigh       61.20     (12.9%)       63.68     (13.1%)    4.0% ( -19% -   34%) 0.326
                    AndStopWords       41.77     (11.0%)       44.06     (11.0%)    5.5% ( -14% -   30%) 0.114
                        Or3Terms      216.89      (7.5%)      229.58      (7.0%)    5.8% (  -8% -   21%) 0.011
                       And3Terms      224.44      (7.3%)      237.93      (7.0%)    6.0% (  -7% -   21%) 0.008
                          OrMany       22.31      (4.9%)       23.66      (4.5%)    6.1% (  -3% -   16%) 0.000

@jpountz
Contributor

jpountz commented Jul 11, 2025

To move this forward, I believe that we should try to add detection of UseSVE and AVX2 to Constants.java, then a HAS_FAST_COMPRESS constant (akin to HAS_FAST_VECTOR_FMA), and finally replace the check on the number of lanes in PanamaVectorUtil with a check on this new constant?

@uschindler
Contributor

> To move this forward, I believe that we should try to add detection of UseSVE and AVX2 to Constants.java, then a HAS_FAST_COMPRESS constant (akin to HAS_FAST_VECTOR_FMA), and finally replace the check on the number of lanes in PanamaVectorUtil with a check on this new constant?

UseSVE should be easy to detect, in the same way as this existing check:

/** true for an AMD cpu with SSE4a instructions. */
private static final boolean HAS_SSE4A =
    HotspotVMOptions.get("UseXmmI2F").map(Boolean::valueOf).orElse(false);

For AVX, I think you have to parse the option as an integer and check that it is >= 2, or something like that.
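A minimal sketch of the flag parsing described above, for illustration only. The class `VectorFlagSketch` and its methods are hypothetical, and a plain `Map` stands in for the HotSpot option lookup so the example is self-contained; it is not the actual Constants.java code. The thresholds (UseSVE > 0, UseAVX >= 2) are the ones suggested in this thread and are still under discussion.

```java
import java.util.Map;
import java.util.Optional;

public class VectorFlagSketch {

  // Hypothetical stand-in for a HotSpot option lookup: returns the raw
  // string value of a VM flag, or empty if the flag is not known.
  static Optional<String> option(Map<String, String> vmOptions, String name) {
    return Optional.ofNullable(vmOptions.get(name));
  }

  // Parses an integer-valued flag. A missing or malformed value is treated
  // as 0, i.e. "feature not available" (an assumption of this sketch).
  static int intFlag(Map<String, String> vmOptions, String name) {
    try {
      return option(vmOptions, name).map(String::trim).map(Integer::parseInt).orElse(0);
    } catch (NumberFormatException e) {
      return 0;
    }
  }

  // UseSVE is an integer flag; any non-zero value means SVE is enabled.
  static boolean hasSve(Map<String, String> vmOptions) {
    return intFlag(vmOptions, "UseSVE") > 0;
  }

  // UseAVX is an integer level; 2 or higher means AVX2 is enabled.
  static boolean hasAvx2(Map<String, String> vmOptions) {
    return intFlag(vmOptions, "UseAVX") >= 2;
  }

  public static void main(String[] args) {
    System.out.println(hasAvx2(Map.of("UseAVX", "3")));
    System.out.println(hasSve(Map.of("UseSVE", "0")));
  }
}
```

Falling back to 0 keeps the check conservative: on a JVM that does not expose the flag, the vectorized path would simply not be selected.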

@rmuir
Member

rmuir commented Jul 11, 2025

> To move this forward, I believe that we should try to add detection of UseSVE and AVX2 to Constants.java, then a HAS_FAST_COMPRESS constant (akin to HAS_FAST_VECTOR_FMA), and finally replace the check on the number of lanes in PanamaVectorUtil with a check on this new constant?

+1: I like that idea, of having checks for each "feature". It makes this haphazard situation amenable to static analysis to prevent problems: I will separately look into that.

Note, this PR also uses some other features such as VectorMask.cast() and so on. I don't know off the top of my head what CPU instructions they use. Maybe nothing scary or special, I just don't know.

@HUSTERGS
Contributor Author

Sorry for the late reply, I've been a little busy these days.
I looked into the JDK source code. As Robert said before, on ARM, compress is guarded by UseSVE, which is an integer ranging from 0 to 2.
We need to parse the flag and check that it is non-zero; I think this will be enough for ARM:
https://github.com/openjdk/jdk/blob/99c299f0985c8be63b9b60e589db520d83fd8033/src/hotspot/cpu/aarch64/globals_aarch64.hpp#L104-L106

As for AVX, I'm a little bit confused by the source code. It seems we only need to check whether the preferred vector size is greater than or equal to 256 bits?

https://github.com/openjdk/jdk/blob/99c299f0985c8be63b9b60e589db520d83fd8033/src/hotspot/cpu/x86/x86.ad#L9488-L9528

I hope I'm not getting anything wrong. If this is correct, I'll try to implement this logic and check whether VectorMask.cast causes any extra trouble.

@HUSTERGS
Contributor Author

It seems VectorMask.cast is also guarded by UseSVE on ARM:
https://github.com/openjdk/jdk/blob/5edd546585d66f52c2e894ed212ee67945fe0785/src/hotspot/cpu/aarch64/aarch64_vector_ad.m4#L3949-L3957

As for amd64, do we still only need to check the vector bit size? (And maybe also that the UseAVX flag is >= 2, as Uwe said before?)
https://github.com/openjdk/jdk/blob/bcd86d575fe0682a234228c18b0c2e817d3816da/src/hotspot/cpu/x86/x86.ad#L1424-L1427

and
https://github.com/openjdk/jdk/blob/bcd86d575fe0682a234228c18b0c2e817d3816da/src/hotspot/cpu/x86/x86.ad#L1735-L1738
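To make the open question concrete, here is a rough sketch of how the per-architecture conditions floated in this thread could be combined into a single decision. Everything here is an assumption for illustration: the class and method names are hypothetical, the inputs are passed in as plain values rather than read from the JVM, and the rule itself (UseSVE non-zero on aarch64; UseAVX >= 2 and at least 256-bit vectors on x86-64) is the hypothesis being discussed, not a confirmed reading of the HotSpot sources.

```java
public class FastCompressSketch {

  // Hypothetical decision function. Parameters:
  //   osArch        - value of the "os.arch" system property
  //   useSve        - parsed UseSVE flag (0 when absent)
  //   useAvx        - parsed UseAVX flag (0 when absent)
  //   vectorBitSize - preferred vector size in bits
  static boolean hasFastCompress(String osArch, int useSve, int useAvx, int vectorBitSize) {
    switch (osArch) {
      case "aarch64":
        // On ARM, compress (and VectorMask.cast) appear to be guarded by UseSVE.
        return useSve > 0;
      case "amd64":
      case "x86_64":
        // On x86-64, assume AVX2 plus at least 256-bit vectors is required.
        return useAvx >= 2 && vectorBitSize >= 256;
      default:
        // Unknown architecture: stay on the scalar path.
        return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(hasFastCompress("aarch64", 0, 0, 128));
    System.out.println(hasFastCompress("amd64", 0, 3, 512));
  }
}
```

Keeping the rule in one pure function makes it easy to adjust once the exact conditions are settled, and to unit test without depending on the host CPU.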

@rmuir
Member

rmuir commented Jul 14, 2025

If you find more slow vector methods (such as this VectorMask.cast that you found here), can you add them to this list? It may help the next PR.

# Some vector APIs are only fast on specific hardware, and fallback to very slow
# pure-java implementations. List them here, to prevent traps.
@defaultMessage Potentially slow on some CPUs, please check Constants.HAS_FAST_VECTOR_FMA: FMA may fallback to BigDecimal
jdk.incubator.vector.Float16#fma(**)
jdk.incubator.vector.FloatVector#fma(**)
jdk.incubator.vector.DoubleVector#fma(**)
jdk.incubator.vector.VectorOperators#FMA
@defaultMessage Potentially slow on some CPUs, please check the CPU has feature: Unsupported on NEON
jdk.incubator.vector.ByteVector#compress(**)
jdk.incubator.vector.IntVector#compress(**)
jdk.incubator.vector.ShortVector#compress(**)
jdk.incubator.vector.LongVector#compress(**)
jdk.incubator.vector.VectorOperators#COMPRESS_BITS
jdk.incubator.vector.ByteVector#expand(**)
jdk.incubator.vector.IntVector#expand(**)
jdk.incubator.vector.ShortVector#expand(**)
jdk.incubator.vector.LongVector#expand(**)
jdk.incubator.vector.VectorOperators#EXPAND_BITS

@uschindler
Contributor

Can you fix the message in forbidden-apis for compress:

@defaultMessage Potentially slow on some CPUs, please check the CPU has feature: Unsupported on NEON

There you should add your new message and then combine both sections.

@HUSTERGS
Contributor Author

> Can you fix the message in forbidden-apis for compress:

I didn't really check the expand operation, but I assume it has the same requirements as compress? So I left them in the same group as before.
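For context on why compress and expand are grouped together, here is a scalar sketch of what the two lane operations compute. It uses plain arrays instead of the Vector API so it runs without incubator modules; the class and method names are hypothetical. compress packs the mask-selected lanes contiguously into the lowest lanes and zeroes the rest, while expand is the inverse: it takes lanes from the low end and scatters them into the mask-selected positions.

```java
import java.util.Arrays;

public class CompressExpandSketch {

  // Scalar model of compress: selected lanes are packed to the front,
  // remaining lanes are zero.
  static int[] compress(int[] lanes, boolean[] mask) {
    int[] out = new int[lanes.length];
    int next = 0;
    for (int i = 0; i < lanes.length; i++) {
      if (mask[i]) {
        out[next++] = lanes[i];
      }
    }
    return out;
  }

  // Scalar model of expand: lanes are read contiguously from the front and
  // written to the positions selected by the mask; other lanes are zero.
  static int[] expand(int[] lanes, boolean[] mask) {
    int[] out = new int[lanes.length];
    int next = 0;
    for (int i = 0; i < lanes.length; i++) {
      if (mask[i]) {
        out[i] = lanes[next++];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int[] lanes = {10, 20, 30, 40};
    boolean[] mask = {false, true, false, true};
    int[] packed = compress(lanes, mask);
    System.out.println(Arrays.toString(packed));
    System.out.println(Arrays.toString(expand(packed, mask)));
  }
}
```

When the hardware lacks a matching instruction, the JDK falls back to a loop of roughly this shape per vector, which is why both operations are listed as potentially slow.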
