8281518: New optimization: convert "(x|y)-(x^y)" into "x&y" #7395
Conversation
👋 Welcome back CptGit! A progress list of the required criteria for merging this PR into
A bug has been filed: https://bugs.openjdk.java.net/browse/JDK-8281518
Webrevs
There are a large number of transformations in this bitwise operation family, such as
I am not clear whether there is a justification for pushing this change. We are in danger of heading down the garden path looking for optimization fairies. The above transformation adds extra case-handling overhead to the AD matcher (correction: ideal code) when processing a Subtract node, which slows down compilation to a small degree for a relatively common case (most apps use subtraction). On the credit side, it may generate a small speed-up in generated code when the pattern is matched, the saving depending not just on seeing this pattern but also on how often the resulting generated code gets executed. So, we have a trade-off.

For any app there are probably going to be a lot of times where the compiler matches subtract nodes. There are probably going to be very few cases where this pattern will turn up -- even if you include cases where it happens through recursive reduction -- and even fewer where the resulting generated code gets executed many times. At some point we need to trade off the compiler overhead for all applications against the potential gains for some applications. The micro-benchmark only addresses one side of that trade-off. I'd really like to see a better justification for including this patch and the related transformations suggested by @merykitty before proceeding.

n.b. the fact that gcc and clang do this is not really a good argument. In Java the trade-off is one runtime cost against another, which is not the case for those compilers.
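[Editorial note] To make the "extra case handling" above concrete: the patch would add one more shape check to SubINode::Ideal. The sketch below is a toy model of such a check over a hypothetical miniature expression tree; the Node/Op names are illustrative assumptions, not HotSpot's real C2 classes.

```java
import java.util.Objects;

// Toy model of an Ideal-style rewrite. Node and Op are hypothetical
// stand-ins for HotSpot's C2 node classes, used only to illustrate
// the shape check this patch would add.
public class ToySubIdeal {
    enum Op { VAR, OR, XOR, AND, SUB }

    record Node(Op op, String name, Node left, Node right) {
        static Node var(String n) { return new Node(Op.VAR, n, null, null); }
        static Node bin(Op op, Node l, Node r) { return new Node(op, null, l, r); }
    }

    // The extra pattern check a Sub "Ideal" method would gain:
    //   Sub(Or(a, b), Xor(a, b))  -->  And(a, b)
    static Node idealSub(Node sub) {
        if (sub.op() == Op.SUB
                && sub.left().op() == Op.OR
                && sub.right().op() == Op.XOR
                && Objects.equals(sub.left().left(), sub.right().left())
                && Objects.equals(sub.left().right(), sub.right().right())) {
            return Node.bin(Op.AND, sub.left().left(), sub.left().right());
        }
        return sub; // no transformation applies
    }

    public static void main(String[] args) {
        Node x = Node.var("x"), y = Node.var("y");
        Node expr = Node.bin(Op.SUB, Node.bin(Op.OR, x, y), Node.bin(Op.XOR, x, y));
        System.out.println(idealSub(expr).op()); // prints AND
    }
}
```

Note that a real implementation would also have to consider commuted operands (e.g. Or(a, b) against Xor(b, a)), which is part of the per-transformation cost the comment above describes.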
Hi, for clarification: my idea is to look at GCC's and Clang's codebases to see whether there is a more general way to achieve every transformation elegantly, instead of naively matching every combination, which might mitigate the cost of each additional transformation. Thanks.
I agree. If we're doing this kind of optimization it makes little sense to do it piecemeal. Maybe, just maybe, there's some opportunity for some more general boolean simplification, but even then it's not clear how much of it is worth doing.
Thanks for your input. I totally agree that a JIT cares about compilation overhead far more than those static compilers do, but I was wondering if there is a good way to benchmark the general case, where this pattern is rarely seen. I know there are some benchmark suites for Java, such as SPECjvm or Renaissance, but I don't think they are a good fit here. What I wanted to ask is: what is an objective metric in the community for deciding whether to adopt a new optimization, if there is one?
It's very difficult to find a way to assess the positive and negative aspects of a change like this. Micro-benchmarks only really provide a ballpark guide to the potential benefit because they test the effect of the change in isolation. Even then they only tell part of the story because they ignore the degree to which that benefit will be realized. The potential costs are even harder to estimate. They will vary from app to app according to what gets compiled and which paths the compilation takes. They will even vary from run to run of the same app, because the JVM does not guarantee precise repeatability across restarts even if you keep all inputs the same.

For quite a few ideal transformations it is clear that they will be applicable very frequently and hence that they are worth implementing. That's often clear because we know that frequently used Java language constructs translate to graphs whose shape matches the input checked for by the ideal code. In other cases, we can know that related ideal transforms will recursively combine to generate the target shape. For many other possible transforms we are in a grey area where we cannot know whether the cost of checking for them will be repaid in saved execution time.
Mailing list message from John Rose on hotspot-compiler-dev: On 9 Feb 2022, at 8:38, Quan Anh Mai wrote:
Yes, that thought occurred to me as well. It seems like we are on the … What *would* get us benefit in a cost-effective way would be to take … ?

— John
@CptGit This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@CptGit This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the
Convert (x|y)-(x^y) into x&y, in SubINode::Ideal and SubLNode::Ideal. The results of the microbenchmark are as follows:
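[Editorial note] The identity behind the transformation is easy to confirm: x ^ y and x & y occupy disjoint bit positions (a bit set in x & y means both inputs have it set, so it cannot appear in x ^ y), and they sum carry-free to x | y; subtracting x ^ y from x | y therefore leaves exactly x & y. A quick standalone check, illustrative only and not part of the patch:

```java
import java.util.Random;

// Sanity check for the identity (x|y) - (x^y) == x&y,
// for both int (SubINode) and long (SubLNode) widths.
public class SubIdentityCheck {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int x = rnd.nextInt(), y = rnd.nextInt();
            // x|y splits into the disjoint parts x^y and x&y, so the
            // subtraction borrows nothing and leaves exactly x&y.
            if (((x | y) - (x ^ y)) != (x & y))
                throw new AssertionError("int mismatch: x=" + x + ", y=" + y);
            long lx = rnd.nextLong(), ly = rnd.nextLong();
            if (((lx | ly) - (lx ^ ly)) != (lx & ly))
                throw new AssertionError("long mismatch: lx=" + lx + ", ly=" + ly);
        }
        System.out.println("identity holds on all sampled values");
    }
}
```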
Progress
Issue
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7395/head:pull/7395
$ git checkout pull/7395
Update a local copy of the PR:
$ git checkout pull/7395
$ git pull https://git.openjdk.java.net/jdk pull/7395/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7395
View PR using the GUI difftool:
$ git pr show -t 7395
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7395.diff