8278114: New addnode ideal optimization: converting "x + x" into "x << 1" #6675
Conversation
… "x << 1", and associated tests.
👋 Welcome back CptGit! A progress list of the required criteria for merging this PR into the master branch will be added to the body of your pull request.
Webrevs
@merykitty If you get a chance, could you please take a quick look at this PR? I was wondering whether this conversion makes sense. I am also concerned that this obvious optimization may already have been done somewhere else that I was not able to find. Thank you very much.
Tbh I don't find this transformation necessary, …
I agree this looks to be of dubious value on the face of it. A microbenchmark to prove it's beneficial in some scenario feels like a requirement here. A targeted microbenchmark could explore whether we already do, or could better allow, constant folding of expressions that do subsequent shifts, e.g. by turning …
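For illustration, here is a Java-level sketch of the kind of expression such a targeted microbenchmark could exercise (method and class names are made up for this sketch, not code from the PR): once x + x is canonicalized to x << 1, the existing constant-shift folding can collapse a chained shift into one.

```java
// Illustrative sketch only, not part of the patch.
class AddToShiftExamples {
    static int doubled(int x) {
        return x + x;            // can be emitted as x << 1
    }

    static int addThenShift(int x) {
        // (x + x) << 3  ->  (x << 1) << 3  ->  x << 4 via existing shift folding
        return (x + x) << 3;
    }
}
```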
Thank you both for the comments and suggestions! @merykitty @cl4es I moved the transformation into LShiftNode, i.e., to convert …
```java
@State(Scope.Thread)
@Warmup(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3, jvmArgsAppend = {"-XX:-TieredCompilation"})
```
Is -TieredCompilation necessary to demonstrate an effect? Settings that are not strictly necessary - such as tuning - should generally be avoided (someone might want to set up separate runs with TieredCompilation enabled and disabled on a higher level, for example).
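One way to act on this suggestion (a sketch using the standard JMH runner API; the include pattern below is a placeholder, not the actual benchmark class name) is to drop the flag from @Fork and toggle it per run instead:

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunWithoutTieredCompilation {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include("ShiftBenchmark")                 // placeholder include pattern
                .jvmArgsAppend("-XX:-TieredCompilation")   // supplied here instead of in @Fork
                .build();
        new Runner(opt).run();
    }
}
```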
For such a small win you might be looking at noise, and should inspect the generated assembly to see whether there's any real difference in the emitted code. Running your benchmark and inspecting the emitted code, the Baseline build generates this:
Which is a pretty direct transformation of the Java code into assembly. So it seems your patch does enable more optimal transformations! Only a few percent of the benchmark time seems to be spent in the relevant code, though. As written, I can't establish a significant result between a baseline VM and a patched one, as the variance from run to run is more than a few percent. A narrowed-down benchmark that only tests a single value seems to better demonstrate the low-level effect:
Not using the hand-rolled Baseline:
Patch:
A ~1.2ns improvement - or a 1.15x speed-up - after removing as much infrastructural overhead as possible. I think this is in line with the observable change in the generated assembly (while we get rid of more than one instruction, they are likely pretty nicely pipelined on my Intel CPU). I think this looks like a reasonable improvement for such a straightforward change to C2.
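For readers trying to follow along, a narrowed-down, single-value benchmark in the spirit described above might look roughly like this (a sketch with assumed names, not the exact code behind the numbers quoted):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class SingleValueShiftBench {
    int x = 42;   // a single non-constant input, so C2 still sees the add/shift pattern

    @Benchmark
    public int addThenShift() {
        return (x + x) << 3;   // with the patch this should fold down to a single shift
    }
}
```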
Thank you @cl4es so much for taking the time to review my pull request and for investigating the microbenchmark so deeply! I learned a lot about writing better JMH microbenchmarks from reading your comment.
Great to know we can print assembly! Thanks.
Thanks for pointing out this new feature. I did use …
If you get a chance, could you please share the modifications you made to the microbenchmark, if you made changes other than removing …
My pleasure. I was intrigued, since you picked up so quickly on my suggestion to enable folding of constant shift transforms, and I wanted to see for myself that it ended up compiling down to something more compact/optimal.
No other code changes; I just added in those two simplified benchmarks and commented out the existing ones. I did tweak the command line to print the generated assembly using …
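For anyone reproducing this, one common way to look at the JIT-compiled code from a JMH run (not necessarily the exact command line used above) is to attach the perfasm profiler, which needs Linux perf and an hsdis disassembler library to show the generated assembly:

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunWithPerfAsm {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include("ShiftBenchmark")     // placeholder include pattern
                .addProfiler("perfasm")        // annotates hot regions of generated assembly
                .build();
        new Runner(opt).run();
    }
}
```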
Is there a special configuration you were using? I was not able to reproduce similar results. This is what I got when I used 20 iterations:
Patch:
Then I tried 100 iterations to reduce noise.
Patch:
I was not able to print assembly by …
Alternatively, you could use …
Thank you for providing this alternative. This is the output that I got, given …
Thanks in advance.
The reason you don't see any shift instruction is that … As you can see, for these kinds of microbenchmarks the cost of calling functions dominates the cost of the actual transformation. You can mitigate this by pulling the helper method out to be the benchmark method itself (I don't see the reason to have a separate helper method, either), and by putting the operation in a loop whose results are sunk into a compiler blackhole on each iteration (a full blackhole or a non-inlined sink method won't work as effectively here, simply because it is also a function call).
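A sketch of that structure, with assumed names and an arbitrary loop count (so reported times are per loop, not per operation):

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class LoopedShiftBench {
    int x = 42;

    @Benchmark
    public void addThenShiftLooped(Blackhole bh) {
        for (int i = 0; i < 1_000; i++) {
            int y = x + i;              // varying operand so the loop body is not hoisted
            bh.consume((y + y) << 3);   // each result sunk into the blackhole
        }
    }
}
```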
…ling from dominating time.
Looks good to me.
@CptGit This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 117 new commits pushed to the master branch:
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project, an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @cl4es), but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment.
You still need approval from @cl4es
/reviewers 2
@CptGit
Looks good. The further microbenchmark improvements suggested by @merykitty make a lot of sense given the context.
The comment about the compiler blackhole is becoming a bit redundant with the upgrade to JMH 1.34 (#6955), which enables compiler-assisted blackholes by default.
I forgot to request that the copyright year be updated to 2022 in the headers of the changed and new files.
Good catch! I also addressed the redundant comment about the compiler blackhole in the microbenchmark.
Good. And my testing passed.
Cool, thanks for checking. I think I can integrate now?
Yes. And I will sponsor it.
/integrate
/sponsor
Going to push as commit f326305.
Your commit was automatically rebased without conflicts.
A new ideal optimization can be introduced for addnode: converting "x + x" into "x << 1".
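In Java terms (an illustrative sketch, not the C2 change itself): because int addition wraps on overflow exactly the way a left shift by one does, the rewrite is valid for every input value, so the compiler is free to pick the cheaper form.

```java
// Illustrative check, not part of the patch: x + x and x << 1 agree for all
// int values, including overflowing ones.
class AddEqualsShift {
    static int viaAdd(int x)   { return x + x; }
    static int viaShift(int x) { return x << 1; }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 1 << 30, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (viaAdd(x) != viaShift(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("x + x == x << 1 for all sampled values");
    }
}
```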
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6675/head:pull/6675
$ git checkout pull/6675
Update a local copy of the PR:
$ git checkout pull/6675
$ git pull https://git.openjdk.java.net/jdk pull/6675/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6675
View PR using the GUI difftool:
$ git pr show -t 6675
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6675.diff