8278114: New addnode ideal optimization: converting "x + x" into "x << 1" #6675


Closed
wants to merge 17 commits into master

Conversation

CptGit
Contributor

@CptGit CptGit commented Dec 2, 2021

A new ideal optimization can be introduced for addnode: converting "x + x" into "x << 1".

// Convert "x + x" into "x << 1"
if (in1 == in2) {
  return new LShiftINode(in1, phase->intcon(1));
}
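
At the Java source level, the pattern being targeted is simply doubling a value; a hypothetical illustration (not code from this PR):

    // Hypothetical illustration only: under the proposed idealization,
    // C2 would canonicalize the addition below to a left shift by one.
    static int doubled(int x) {
        return x + x; // becomes x << 1 in the ideal graph
    }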

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8278114: New addnode ideal optimization: converting "x + x" into "x << 1"

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6675/head:pull/6675
$ git checkout pull/6675

Update a local copy of the PR:
$ git checkout pull/6675
$ git pull https://git.openjdk.java.net/jdk pull/6675/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6675

View PR using the GUI difftool:
$ git pr show -t 6675

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6675.diff

@bridgekeeper

bridgekeeper bot commented Dec 2, 2021

👋 Welcome back CptGit! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 2, 2021
@openjdk

openjdk bot commented Dec 2, 2021

@CptGit The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Dec 2, 2021
@CptGit
Contributor Author

CptGit commented Dec 16, 2021

@merykitty If you get a chance, could you please take a quick look at this PR? I was wondering whether this conversion makes sense. I am concerned that this obvious optimization may already be done somewhere else that I was not able to find. Thank you very much.

@merykitty
Member

Tbh I don't find this transformation necessary; x + x is a very cheap operation and is generally easier for the optimiser to work with than x << 1.
Cheers.

@cl4es
Member

cl4es commented Dec 16, 2021

Tbh I don't find this transformation necessary; x + x is a very cheap operation and is generally easier for the optimiser to work with than x << 1. Cheers.

I agree this looks to be of dubious value on the face of it. A microbenchmark to prove it's beneficial in some scenario feels like a requirement here.

A targeted microbenchmark could explore whether we already do, or could better allow, constant folding of expressions that do subsequent shifts, e.g. turning (x + x) << 3 into x << 4.
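
A minimal sketch of what such a targeted microbenchmark might look like (hypothetical and illustrative only; class and method names are made up, and the benchmark eventually added in this PR differs):

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class AddThenConstShift {
        private int x;

        @Setup(Level.Iteration)
        public void setup() {
            x = ThreadLocalRandom.current().nextInt();
        }

        @Benchmark
        public int addThenShift() {
            // Should fold to a single x << 4 if C2 combines the add with the constant shift.
            return (x + x) << 3;
        }
    }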

@CptGit
Contributor Author

CptGit commented Dec 17, 2021

Tbh I don't find this transformation necessary; x + x is a very cheap operation and is generally easier for the optimiser to work with than x << 1. Cheers.

I agree this looks to be of dubious value on the face of it. A microbenchmark to prove it's beneficial in some scenario feels like a requirement here.

A targeted microbenchmark could explore whether we already do, or could better allow, constant folding of expressions that do subsequent shifts, e.g. turning (x + x) << 3 into x << 4.

Thank you both for your comments and suggestions! @merykitty @cl4es

I moved the transformation into LShiftNode, i.e., to convert (x + x) << 3 into x << 4, and I observed a 5% performance improvement (percentage of average time reduced) with the microbenchmark I wrote. I am not sure how much improvement we usually expect from a transformation; is this one beneficial enough? Thank you!

@State(Scope.Thread)
@Warmup(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3 , jvmArgsAppend = {"-XX:-TieredCompilation"})
Member


Is -TieredCompilation necessary to demonstrate an effect? Settings that are not strictly necessary - such as tuning - should generally be avoided (someone might want to set up separate runs with TieredCompilation enabled and disabled on a higher level, for example)

@cl4es
Member

cl4es commented Dec 18, 2021

Thank you both for your comments and suggestions! @merykitty @cl4es

I moved the transformation into LShiftNode, i.e., to convert (x + x) << 3 into x << 4, and I observed a 5% performance improvement (percentage of average time reduced) with the microbenchmark I wrote. I am not sure how much improvement we usually expect from a transformation; is this one beneficial enough? Thank you!

For such a small win you might be looking at noise and should inspect the generated assembly to see whether there's any real difference in the emitted code.

Running your microbenchmark with -prof perfasm on my Linux workstation, the relevant code generated for the helper method with your patch looks like this:

  4.40%     0x00007f2b0d6d720c:   mov    %esi,%eax
            0x00007f2b0d6d720e:   and    $0x1fffff,%eax               ;*iushr {reexecute=0 rethrow=0 return_oop=0}
                                                                      ; - org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@8 (line 88)

i.e., ((x + x) << 10) >>> 11 is fully transformed down into x & 0x1fffff, which seems like a pretty optimal result for this code (it zeroes out the top 11 bits).

Baseline build generates this:

  4.46%      0x00007f93e16d988c:   add    %esi,%esi
             0x00007f93e16d988e:   shl    $0xa,%esi
             0x00007f93e16d9891:   mov    %esi,%eax
             0x00007f93e16d9893:   shr    $0xb,%eax                    ;*iushr {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@8 (line 88)

Which is a pretty direct transformation of the Java code into assembly. So it seems your patch does enable more optimal transformations!

Only a few percent of the benchmark time seems to be spent in the relevant code, though. As written, I can't establish a significant result between a baseline VM and a patched one, as the variance from run to run is more than a few percent. A narrowed-down benchmark that only tests a single value seems to better demonstrate the low-level effect:

    @Benchmark
    public int testInt() {
        return helper(ints_a[4711]);
    }
    
    @Benchmark
    public long testLong() {
        return helper(longs_a[4711]);
    }

Not using the hand-rolled sink blackhole means we rely on JMH blackholes. Those traditionally had a bit more overhead than what you get with sink, but since recently they can cooperate with the VM to remove the overhead almost completely if you run with -Djmh.blackhole.autoDetect=true (soon to be enabled by default). I also removed -TieredCompilation from the @Fork and got slightly better and more stable scores overall, though a bit more warmup is needed.

Baseline:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt   15  8.925 ± 0.019  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt   15  8.969 ± 0.107  ns/op

Patch:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt   15  7.650 ± 0.024  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt   15  7.655 ± 0.087  ns/op

A ~1.2ns improvement - or 1.15x speed-up after removing as much infrastructural overhead as possible. I think this looks in line with the observable change in the generated assembly (while we get rid of more than one instruction, they are likely pretty nicely pipelined on my Intel CPU). I think this looks like a reasonable improvement for such a straightforward change to C2.

@CptGit
Contributor Author

CptGit commented Dec 18, 2021

Thank you @cl4es so much for taking the time to review my pull request and investigate the microbenchmark in depth! I learned a lot about writing better JMH microbenchmarks from reading your comment.

Thank you both for your comments and suggestions! @merykitty @cl4es
I moved the transformation into LShiftNode, i.e., to convert (x + x) << 3 into x << 4, and I observed a 5% performance improvement (percentage of average time reduced) with the microbenchmark I wrote. I am not sure how much improvement we usually expect from a transformation; is this one beneficial enough? Thank you!

For such a small win you might be looking at noise and should inspect the generated assembly to see whether there's any real difference in the emitted code.

Great to know we can print assembly! Thanks.

Running your microbenchmark with -prof perfasm on my Linux workstation, the relevant code generated for the helper method with your patch looks like this:

  4.40%     0x00007f2b0d6d720c:   mov    %esi,%eax
            0x00007f2b0d6d720e:   and    $0x1fffff,%eax               ;*iushr {reexecute=0 rethrow=0 return_oop=0}
                                                                      ; - org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@8 (line 88)

i.e., ((x + x) << 10) >>> 11 is fully transformed down into x & 0x1fffff, which seems like a pretty optimal result for this code (it zeroes out the top 11 bits).

Baseline build generates this:

  4.46%      0x00007f93e16d988c:   add    %esi,%esi
             0x00007f93e16d988e:   shl    $0xa,%esi
             0x00007f93e16d9891:   mov    %esi,%eax
             0x00007f93e16d9893:   shr    $0xb,%eax                    ;*iushr {reexecute=0 rethrow=0 return_oop=0}
                                                                       ; - org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@8 (line 88)

Which is a pretty direct transformation of the Java code into assembly. So it seems your patch does enable more optimal transformations!

Only a few percent of the benchmark time seems to be spent in the relevant code, though. As written, I can't establish a significant result between a baseline VM and a patched one, as the variance from run to run is more than a few percent. A narrowed-down benchmark that only tests a single value seems to better demonstrate the low-level effect:

    @Benchmark
    public int testInt() {
        return helper(ints_a[4711]);
    }
    
    @Benchmark
    public long testLong() {
        return helper(longs_a[4711]);
    }

Thanks for pointing out this new feature. I did use Blackhole but found it brought too much infrastructure overhead, so I referred to the JMH examples for a sink method. I'll start using Blackhole with -Djmh.blackhole.autoDetect=true from now on.

Not using the hand-rolled sink blackhole means we rely on JMH blackholes. Those traditionally had a bit more overhead than what you get with sink, but since recently they can cooperate with the VM to remove the overhead almost completely if you run with -Djmh.blackhole.autoDetect=true (soon to be enabled by default). I also removed -TieredCompilation from the @Fork and got slightly better and more stable scores overall, though a bit more warmup is needed.

Baseline:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt   15  8.925 ± 0.019  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt   15  8.969 ± 0.107  ns/op

Patch:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt   15  7.650 ± 0.024  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt   15  7.655 ± 0.087  ns/op

If you get a chance, could you please share the modifications you made to the microbenchmark (if you made changes other than removing -TieredCompilation and testing only a single value), so I can test on my end for more data points? Also, the examples in the JMH repository seem out of date (e.g., regarding the Blackhole auto-detect feature), so I think your example will be good study material for me. Thank you.

A ~1.2ns improvement - or 1.15x speed-up after removing as much infrastructural overhead as possible. I think this looks in line with the observable change in the generated assembly (while we get rid of more than one instruction, they are likely pretty nicely pipelined on my Intel CPU). I think this looks like a reasonable improvement for such a straightforward change to C2.

@cl4es
Member

cl4es commented Dec 18, 2021

Thank you @cl4es so much for taking time to review my pull request and deeply investigate the microbenchmark! I learnt a lot on writing better JMH microbenchmarks by reading your comment.

My pleasure. I was intrigued since you picked up on my suggestion to enable folding of constant shift transforms so fast and wanted to see for myself that it ended up compiling down to something more compact/optimal.

If you get a chance, could you please share the modifications you made to the microbenchmark (if you made changes other than removing -TieredCompilation and testing only a single value), so I can test on my end for more data points? Also, the examples in the JMH repository seem out of date (e.g., regarding the Blackhole auto-detect feature), so I think your example will be good study material for me. Thank you.

No other code changes, just added in those two simplified benchmarks and commented out the existing ones. I did tweak the command line to print using ns (you can change the annotation to get the same result) and picked the smallest number of iterations that still gave stable results on my machine while trying things out: -i 15 -wi 15 -tu ns.

@CptGit
Contributor Author

CptGit commented Dec 18, 2021

No other code changes, just added in those two simplified benchmarks and commented out the existing ones. I did tweak the command line to print using ns (you can change the annotation to get the same result) and picked the smallest number of iterations that still gave stable results on my machine while trying things out: -i 15 -wi 15 -tu ns.

Is there any special configuration you were using? I was not able to reproduce similar results.

This is when I used 20 iterations.
Baseline:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt   60  3.846 ? 0.164  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt   60  3.332 ? 0.008  ns/op

Patch:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt   60  3.743 ? 0.245  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt   60  3.664 ? 0.224  ns/op

Then I tried 100 iterations to reduce noise.
Baseline:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt  300  3.566 ? 0.068  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt  300  3.310 ? 0.004  ns/op

Patch:

Benchmark                         Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt   avgt  300  3.724 ? 0.103  ns/op
AddIdeal_XPlusX_LShiftC.testLong  avgt  300  3.605 ? 0.086  ns/op

I was not able to print assembly via -prof perfasm on my end because of some errors from perf, which I do not have permission to configure, but I checked native coverage and it showed that this transformation did happen while the microbenchmark was running.

@merykitty
Member

Alternatively, you could use -XX:CompileCommand=print,*AddIdeal_XPlusX_LShiftC.testInt which will print every detail about the compilation, including opto assembly. In this case, you could use -XX:-TieredCompilation to avoid details about the C1-compiled method. Note that without hsdis the JVM will only output machine code, which is very hard to analyse.

@CptGit
Contributor Author

CptGit commented Dec 18, 2021

Alternatively, you could use -XX:CompileCommand=print,*AddIdeal_XPlusX_LShiftC.testInt which will print every detail about the compilation, including opto assembly. In this case, you could use -XX:-TieredCompilation to avoid details about the C1-compiled method. Note that without hsdis the JVM will only output machine code, which is very hard to analyse.

Thank you for providing this alternative.

It seems make install-hsdis does not install hsdis into the correct place. In my case, I had to manually copy it from build/cov/jdk/lib to the right place, which is build/cov/images/jdk/lib/server, to make it work. I don't know if this is a bug in the makefile.

This is the output I got, given @Fork(value = 1, jvmArgsAppend = {"-XX:-TieredCompilation", "-XX:CompileCommand=print,*AddIdeal_XPlusX_LShiftC.helper"}). I have two questions:

  • The assembly looks weird to me because I did not see any shift instruction.
    (Update: never mind; this is the patched version, so the shift instructions were optimized out. I checked the baseline version and shl and shr were both there.)
  • Why did it print at, and only at, the first warmup iteration? I did not expect C2 compilation to happen so early. Does the JMH framework force a compilation for every benchmark even before the first iteration?

Thanks in advance.

Running test 'micro:AddIdeal_XPlusX_LShiftC'
# JMH version: 1.33
# VM version: JDK 19-internal, OpenJDK 64-Bit Server VM, 19-internal+0-adhoc..my-jdk
# VM invoker: /home/zzq/jog/testing/my-jdk/build/cov/images/jdk/bin/java
# VM options: -Djava.library.path=/home/zzq/jog/testing/my-jdk/build/cov/images/test/micro/native -XX:-TieredCompilation -XX:CompileCommand=print,*AddIdeal_XPlusX_LShiftC.helper
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC.testInt

# Run progress: 0.00% complete, ETA 00:00:40
# Fork: 1 of 1
CompileCommand: print *AddIdeal_XPlusX_LShiftC.helper bool print = true
# Warmup Iteration   1: 
============================= C2-compiled nmethod ==============================
----------------------- MetaData before Compile_id = 188 ------------------------
{method}
 - this oop:          0x00007f977c476548
 - method holder:     'org/openjdk/bench/vm/compiler/AddIdeal_XPlusX_LShiftC'
 - constants:         0x00007f977c475f18 constant pool [76] {0x00007f977c475f18} for 'org/openjdk/bench/vm/compiler/AddIdeal_XPlusX_LShiftC' cache=0x00007f977c4ab6e0
 - access:            0x8100000a  private static 
 - name:              'helper'
 - signature:         '(I)I'
 - max stack:         3
 - max locals:        1
 - size of params:    1
 - method size:       13
 - vtable index:      -2
 - i2i entry:         0x00007f97b92f9c00
 - adapters:          AHE@0x00007f97c41125b0: 0xa i2c: 0x00007f97b9406660 c2i: 0x00007f97b9406719 c2iUV: 0x00007f97b94066e3 c2iNCI: 0x00007f97b9406756
 - compiled entry     0x00007f97b9406719
 - code size:         10
 - code start:        0x00007f977c476520
 - code end (excl):   0x00007f977c47652a
 - method data:       0x00007f977c4b4418
 - checked ex length: 0
 - linenumber start:  0x00007f977c47652a
 - localvar length:   1
 - localvar start:    0x00007f977c476532

------------------------ OptoAssembly for Compile_id = 188 -----------------------
#
#  int ( int )
#
#r018 rsi   : parm 0: int
# -- Old rsp -- Framesize: 32 --
#r591 rsp+28: in_preserve
#r590 rsp+24: return address
#r589 rsp+20: in_preserve
#r588 rsp+16: saved fp register
#r587 rsp+12: pad2, stack alignment
#r586 rsp+ 8: pad2, stack alignment
#r585 rsp+ 4: Fixed slot 1
#r584 rsp+ 0: Fixed slot 0
#
000     N1: #	out( B1 ) <- in( B1 )  Freq: 1

000     B1: #	out( N1 ) <- BLOCK HEAD IS JUNK  Freq: 1
000     # stack bang (96 bytes)
	pushq   rbp	# Save rbp
	subq    rsp, #16	# Create frame

00c     movl    RAX, RSI	# spill
00e     andl    RAX, #2097151	# int
014     addq    rsp, 16	# Destroy frame
	popq    rbp
	cmpq     rsp, poll_offset[r15_thread] 
	ja       #safepoint_stub	# Safepoint: poll for GC

026     ret

--------------------------------------------------------------------------------
----------------------------------- Assembly -----------------------------------

Compiled method (c2)    1974  188             org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper (10 bytes)
 total in heap  [0x00007f97b9418c10,0x00007f97b9418e60] = 592
 relocation     [0x00007f97b9418d80,0x00007f97b9418d90] = 16
 main code      [0x00007f97b9418da0,0x00007f97b9418de0] = 64
 stub code      [0x00007f97b9418de0,0x00007f97b9418df8] = 24
 oops           [0x00007f97b9418df8,0x00007f97b9418e00] = 8
 metadata       [0x00007f97b9418e00,0x00007f97b9418e08] = 8
 scopes data    [0x00007f97b9418e08,0x00007f97b9418e18] = 16
 scopes pcs     [0x00007f97b9418e18,0x00007f97b9418e58] = 64
 dependencies   [0x00007f97b9418e58,0x00007f97b9418e60] = 8

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Verified Entry Point]
  # {method} {0x00007f977c476548} 'helper' '(I)I' in 'org/openjdk/bench/vm/compiler/AddIdeal_XPlusX_LShiftC'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
 ;; N1: #	out( B1 ) <- in( B1 )  Freq: 1
 ;; B1: #	out( N1 ) <- BLOCK HEAD IS JUNK  Freq: 1
  0x00007f97b9418da0:   mov    %eax,-0x16000(%rsp)
  0x00007f97b9418da7:   push   %rbp
  0x00007f97b9418da8:   sub    $0x10,%rsp                   ;*synchronization entry
                                                            ; - org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@-1 (line 98)
  0x00007f97b9418dac:   mov    %esi,%eax
  0x00007f97b9418dae:   and    $0x1fffff,%eax               ;*iushr {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@8 (line 98)
  0x00007f97b9418db4:   add    $0x10,%rsp
  0x00007f97b9418db8:   pop    %rbp
  0x00007f97b9418db9:   cmp    0x388(%r15),%rsp             ;   {poll_return}
  0x00007f97b9418dc0:   ja     0x00007f97b9418dc7
  0x00007f97b9418dc6:   ret    
  0x00007f97b9418dc7:   movabs $0x7f97b9418db9,%r10         ;   {internal_word}
  0x00007f97b9418dd1:   mov    %r10,0x3a0(%r15)
  0x00007f97b9418dd8:   jmp    0x00007f97b9408120           ;   {runtime_call SafepointBlob}
  0x00007f97b9418ddd:   hlt    
  0x00007f97b9418dde:   hlt    
  0x00007f97b9418ddf:   hlt    
[Exception Handler]
  0x00007f97b9418de0:   jmp    0x00007f97b936a6a0           ;   {no_reloc}
[Deopt Handler Code]
  0x00007f97b9418de5:   call   0x00007f97b9418dea
  0x00007f97b9418dea:   subq   $0x5,(%rsp)
  0x00007f97b9418def:   jmp    0x00007f97b9407240           ;   {runtime_call DeoptimizationBlob}
  0x00007f97b9418df4:   hlt    
  0x00007f97b9418df5:   hlt    
  0x00007f97b9418df6:   hlt    
  0x00007f97b9418df7:   hlt    
--------------------------------------------------------------------------------
[/Disassembly]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Oops:
  0x00007f97b9418df8:   0x00000007ff755d38 a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007ff755d38}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Metadata:
  0x00007f97b9418e00:   0x00007f977c476548 {method} {0x00007f977c476548} 'helper' '(I)I' in 'org/openjdk/bench/vm/compiler/AddIdeal_XPlusX_LShiftC'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
pc-bytecode offsets:
PcDesc(pc=0x00007f97b9418d9f offset=ffffffff bits=0):
PcDesc(pc=0x00007f97b9418dac offset=c bits=0):
   org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@-1 (line 98)
PcDesc(pc=0x00007f97b9418db4 offset=14 bits=0):
   org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@8 (line 98)
PcDesc(pc=0x00007f97b9418df9 offset=59 bits=0):
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
oop maps:ImmutableOopMapSet contains 0 OopMaps

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
scopes:
ScopeDesc(pc=0x00007f97b9418dac offset=c):
   org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@-1 (line 98)
ScopeDesc(pc=0x00007f97b9418db4 offset=14):
   org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper@8 (line 98)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
relocations:
         @0x00007f97b9418d80: b019
relocInfo@0x00007f97b9418d80 [type=11(poll_return) addr=0x00007f97b9418db9 offset=25]
         @0x00007f97b9418d82: f00e800e
relocInfo@0x00007f97b9418d84 [type=8(internal_word) addr=0x00007f97b9418dc7 offset=14 data=14] | [target=0x00007f97b9418db9]
         @0x00007f97b9418d86: 6411
relocInfo@0x00007f97b9418d86 [type=6(runtime_call) addr=0x00007f97b9418dd8 offset=17 format=1] | [destination=0x00007f97b9408120]
         @0x00007f97b9418d88: 0008
relocInfo@0x00007f97b9418d88 [type=0(none) addr=0x00007f97b9418de0 offset=8]
         @0x00007f97b9418d8a: 6400
relocInfo@0x00007f97b9418d8a [type=6(runtime_call) addr=0x00007f97b9418de0 offset=0 format=1] | [destination=0x00007f97b936a6a0]
         @0x00007f97b9418d8c: 640f
relocInfo@0x00007f97b9418d8c [type=6(runtime_call) addr=0x00007f97b9418def offset=15 format=1] | [destination=0x00007f97b9407240]
         @0x00007f97b9418d8e: 0000
relocInfo@0x00007f97b9418d8e [type=0(none) addr=0x00007f97b9418def offset=0]
         @0x00007f97b9418d90: 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Dependencies:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
ExceptionHandlerTable (size = 0 bytes)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
ImplicitExceptionTable is empty
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Recorded oops:
#0: 0x0000000000000000 NULL-oop
#1: 0x00000007ff755d38 a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000007ff755d38}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Recorded metadata:
#0: 0x0000000000000000 NULL-oop
#1: 0x00007f977c476548 {method} {0x00007f977c476548} 'helper' '(I)I' in 'org/openjdk/bench/vm/compiler/AddIdeal_XPlusX_LShiftC'
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
3.711 ns/op
# Warmup Iteration   2: 3.515 ns/op
# Warmup Iteration   3: 3.313 ns/op
# Warmup Iteration   4: 3.433 ns/op
# Warmup Iteration   5: 3.526 ns/op
# Warmup Iteration   6: 3.458 ns/op
# Warmup Iteration   7: 3.446 ns/op
# Warmup Iteration   8: 3.497 ns/op
# Warmup Iteration   9: 3.471 ns/op
# Warmup Iteration  10: 3.461 ns/op
# Warmup Iteration  11: 3.429 ns/op
# Warmup Iteration  12: 3.533 ns/op
# Warmup Iteration  13: 3.545 ns/op
# Warmup Iteration  14: 3.613 ns/op
# Warmup Iteration  15: 3.534 ns/op
# Warmup Iteration  16: 3.495 ns/op
# Warmup Iteration  17: 3.567 ns/op
# Warmup Iteration  18: 3.466 ns/op
# Warmup Iteration  19: 3.310 ns/op
# Warmup Iteration  20: 3.305 ns/op
Iteration   1: 3.285 ns/op
Iteration   2: 3.368 ns/op
Iteration   3: 3.315 ns/op
Iteration   4: 3.304 ns/op
Iteration   5: 3.359 ns/op
Iteration   6: 3.340 ns/op
Iteration   7: 3.330 ns/op
Iteration   8: 3.331 ns/op
Iteration   9: 3.447 ns/op
Iteration  10: 3.370 ns/op
Iteration  11: 3.389 ns/op
Iteration  12: 3.342 ns/op
Iteration  13: 3.332 ns/op
Iteration  14: 3.349 ns/op
Iteration  15: 3.344 ns/op
Iteration  16: 3.330 ns/op
Iteration  17: 3.327 ns/op
Iteration  18: 3.419 ns/op
Iteration  19: 3.341 ns/op
Iteration  20: 3.346 ns/op
------------------------------------------------------------------------
static org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC::helper(I)I
  interpreter_invocation_count:       93898
  invocation_counter:                 93898
  backedge_counter:                       0
  decompile_count:                        0
  mdo size: 392 bytes

0 iload_0
1 iload_0
2 iadd
3 bipush 10
5 ishl
6 bipush 11
8 iushr
9 ireturn
------------------------------------------------------------------------
Total MDO size: 392 bytes


Result "org.openjdk.bench.vm.compiler.AddIdeal_XPlusX_LShiftC.testInt":
  3.348 ±(99.9%) 0.032 ns/op [Average]
  (min, avg, max) = (3.285, 3.348, 3.447), stdev = 0.037
  CI (99.9%): [3.316, 3.381] (assumes normal distribution)


# Run complete. Total time: 00:00:42

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                        Mode  Cnt  Score   Error  Units
AddIdeal_XPlusX_LShiftC.testInt  avgt   20  3.348 ± 0.032  ns/op
Finished running test 'micro:AddIdeal_XPlusX_LShiftC'

@merykitty
Member

merykitty commented Dec 19, 2021

The reason you don't see any shift instruction is that x << 11 >>> 11 is transformed into x & 0x1FFFFF, which is the instruction you see at 0x00007f97b9418dae: and $0x1fffff,%eax.
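
(If it helps to convince yourself of that identity, here is a quick throwaway check; hypothetical, not part of the PR:)

    // Samples random ints and checks ((x << 11) >>> 11) == (x & 0x1FFFFF).
    public class ShiftMaskIdentity {
        public static void main(String[] args) {
            java.util.Random r = new java.util.Random(0);
            for (int i = 0; i < 1_000_000; i++) {
                int x = r.nextInt();
                if (((x << 11) >>> 11) != (x & 0x1FFFFF)) {
                    throw new AssertionError("mismatch for x = " + x);
                }
            }
            System.out.println("identity holds on all sampled values");
        }
    }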

As you can see, for these kinds of microbenchmarks the cost of calling functions dominates the cost of the actual transformation. You can mitigate this by pulling the helper method out to be the benchmark itself (I don't see the reason to have a separate helper method, either) and putting the operation in a loop whose results are sunk into a compiler blackhole in each iteration (a full blackhole or a non-inlined sink method won't work as effectively in this case, simply because it is also a function call). A minimal sketch of that shape follows below.
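
A minimal sketch of that shape (hypothetical and illustrative only; assumes compiler-assisted blackholes are available, e.g. JMH 1.34+ or -Djmh.blackhole.autoDetect=true):

    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class XPlusXShiftLoop {
        private int[] ints;

        @Setup
        public void setup() {
            ints = new int[10_000];
            Random r = new Random(42);
            for (int i = 0; i < ints.length; i++) {
                ints[i] = r.nextInt();
            }
        }

        @Benchmark
        public void shiftLoop(Blackhole bh) {
            // The operation lives directly in the benchmark body; each result is
            // consumed by the blackhole so C2 cannot dead-code-eliminate the loop.
            for (int x : ints) {
                bh.consume(((x + x) << 10) >>> 11);
            }
        }
    }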

Contributor

@vnkozlov vnkozlov left a comment


Looks good to me.

@openjdk

openjdk bot commented Jan 4, 2022

@CptGit This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8278114: New addnode ideal optimization: converting "x + x" into "x << 1"

Reviewed-by: kvn, redestad

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 117 new commits pushed to the master branch:

  • 191f730: Merge
  • 95a3010: 8275830: C2: Receiver downcast is missing when inlining through method handle linkers
  • 58b5fb3: 8265317: [vector] assert(payload->is_object()) failed: expected 'object' value for scalar-replaced boxed vector but got: NULL
  • 06f3713: 8279379: GHA: Print tests that are in error
  • e7244c1: 8278966: two microbenchmarks tests fail "assert(!jvms->method()->has_exception_handlers()) failed: no exception handler expected" after JDK-8275638
  • b4b0328: 8278824: Uneven work distribution when scanning heap roots in G1
  • 99a8351: 8279386: Remove duplicate RefProcPhaseTimeTracker
  • d1e6f26: 8279351: [TESTBUG] SADebugDTest.java does not handle "Address already in use" error
  • 93c7d90: 8278282: G1: Log basic statistics for evacuation failure
  • 1ffdc52: 8279412: [JVMCI] failed speculations list must outlive any nmethod that refers to it
  • ... and 107 more: https://git.openjdk.java.net/jdk/compare/937126b1406ff0f6ac0828310e5e09003692dcd3...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @cl4es) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 4, 2022
@vnkozlov
Contributor

vnkozlov commented Jan 4, 2022

You still need approval from @cl4es

@CptGit
Contributor Author

CptGit commented Jan 4, 2022

/reviewers 2

@openjdk

openjdk bot commented Jan 4, 2022

@CptGit
The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jan 4, 2022
Member

@cl4es cl4es left a comment


Looks good. The further microbenchmark improvements suggested by @merykitty make a lot of sense given the context.

The comment about the compiler blackhole is becoming a bit redundant with the upgrade to JMH 1.34 (#6955), which enables compiler-assisted blackholes by default.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 4, 2022
Contributor

@vnkozlov vnkozlov left a comment


I forgot to request updating the copyright year to 2022 in the headers of the changed and new files.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jan 4, 2022
@CptGit
Contributor Author

CptGit commented Jan 4, 2022

I forgot to request updating the copyright year to 2022 in the headers of the changed files.

Good catch! Also I addressed the redundant comment about compiler blackhole in the microbenchmark.

Contributor

@vnkozlov vnkozlov left a comment


Good. And my testing passed.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 4, 2022
@CptGit
Contributor Author

CptGit commented Jan 4, 2022

Good. And my testing passed.

Cool, thanks for checking. I think I can integrate now?

@vnkozlov
Contributor

vnkozlov commented Jan 5, 2022

Cool, thanks for checking. I think I can integrate now?

Yes. And I will sponsor it.

@CptGit
Contributor Author

CptGit commented Jan 5, 2022

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jan 5, 2022
@openjdk

openjdk bot commented Jan 5, 2022

@CptGit
Your change (at version 49f9572) is now ready to be sponsored by a Committer.

@vnkozlov
Contributor

vnkozlov commented Jan 5, 2022

/sponsor

@openjdk

openjdk bot commented Jan 5, 2022

Going to push as commit f326305.
Since your change was applied there have been 117 commits pushed to the master branch:

  • 191f730: Merge
  • 95a3010: 8275830: C2: Receiver downcast is missing when inlining through method handle linkers
  • 58b5fb3: 8265317: [vector] assert(payload->is_object()) failed: expected 'object' value for scalar-replaced boxed vector but got: NULL
  • 06f3713: 8279379: GHA: Print tests that are in error
  • e7244c1: 8278966: two microbenchmarks tests fail "assert(!jvms->method()->has_exception_handlers()) failed: no exception handler expected" after JDK-8275638
  • b4b0328: 8278824: Uneven work distribution when scanning heap roots in G1
  • 99a8351: 8279386: Remove duplicate RefProcPhaseTimeTracker
  • d1e6f26: 8279351: [TESTBUG] SADebugDTest.java does not handle "Address already in use" error
  • 93c7d90: 8278282: G1: Log basic statistics for evacuation failure
  • 1ffdc52: 8279412: [JVMCI] failed speculations list must outlive any nmethod that refers to it
  • ... and 107 more: https://git.openjdk.java.net/jdk/compare/937126b1406ff0f6ac0828310e5e09003692dcd3...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 5, 2022
@openjdk openjdk bot closed this Jan 5, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jan 5, 2022
@openjdk

openjdk bot commented Jan 5, 2022

@vnkozlov @CptGit Pushed as commit f326305.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated