Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whither shrink wrapping? #2107

Closed
0xdaryl opened this issue Dec 5, 2017 · 21 comments
Closed

Whither shrink wrapping? #2107

0xdaryl opened this issue Dec 5, 2017 · 21 comments

Comments

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 5, 2017

PR #2066 led to a discussion on the fate of the shrink wrapping optimization in the Eclipse OMR compiler. This is an optimization intended to defer register preservation required by a method's linkage as late as possible in the method (rather than always doing it in the prologue). The observation is that if preservation is occurring because of register usage on colder paths of the method then deferring the preservations until those paths (if possible) would be more beneficial to performance.

Shrink wrapping was written for Java and originally motivated by the Power architecture but has seen little (if any) performance benefit on any architecture to date (X, P, and Z have implementations). And in fact there are known functional issues on P (and possibly Z) with the code as-is.

We decided to contribute the code to OMR not as a working optimization but as an example for how one could do dataflow over a stream of TR::Instruction's. There are some analyses and modifications to the instruction class that need to occur to make this possible, and this optimization provided a working example of how to do that reasonably effectively. It is certainly possible that parts of the shrink wrapping optimization are specialized to Java semantics and this will need to be discovered and fixed for it to become a general optimization (for example, it works with J9 JIT private linkage and will have to be adapted for various system ABIs).

The code will certainly be cleaner without it.

The argument was raised in #2066 that this optimization may be more beneficial to non-Java runtimes where the inliner is not as aggressive as it is in Java. I don't think we'll know that until we try, and to do so means it should be dusted off and generalized.

Is it worth leaving this vestigal code in OMR where it is simply built but not tested, or should it be cleaned up? Its already part of git history so it won't really be going anywhere if it's removed. If it's removed, we can ensure it's removed under a single PR so that all changes needed to resurrect are clear.

Any opinions?

@fjeremic
Copy link
Contributor

fjeremic commented Dec 8, 2017

Is it worth leaving this vestigal code in OMR where it is simply built but not tested, or should it be cleaned up? Its already part of git history so it won't really be going anywhere if it's removed. If it's removed, we can ensure it's removed under a single PR so that all changes needed to resurrect are clear.

From an OpenJ9 committer's point of view this sounds good to me. In a dynamic runtime instruction level dataflow analysis on warm methods is relatively expensive for what we get in return (which is currently nothing). Either we need to make the dataflow faster or improve the implementation of shrink wrapping. On Z at least the store and restore of preserved registers is a single store/load multiple instruction which has n/4 + 1 cycle count where n is the number of registers being stored. The benefit in theory would only be realized if we can eliminate on average 2 registers from being preserved.

@mstoodle
Copy link
Contributor

Maybe we should ask JitBuilder clients if they think shrink wrapping the preserved register saves/restores is worthwhile in their scenarios. I have seen it being a problem, but it never reached the "top of the pile" in importance for me in the JITs I've personally worked on.

Any feedback on this particular issue, @jduimovich @charliegracie @Leonardo2718 @ymanton ?

@Leonardo2718
Copy link
Contributor

I have no evidence regarding what the performance benefits of shrink wrapping would be in Lua Vermelha. I am also not familiar enough with the characteristics of shrink wrapping to make an educated guess. However, I suspect that the typically large number of branches in the generated code would make the data flow analysis very expensive to run, even on relatively small Lua functions.

Having never looked at the shrink wrapping code myself, I have no opinion about whether it should be cleaned up or reimplemented from scratch. However, I think shrink wrapping in Lua Vermelha is worth future investigation. So, I would like to see it brought back to life eventually, if only for the purpose of doing performance measurements to evaluate its usefulness. It would also be interesting to see how this optimization performs on ARM.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented Sep 12, 2018

This discussion has petered out, but I think it is important to reach a conclusion on this as new development often entangles with this code and can affect the design of new features (in a negative way).

My personal opinion (that I'll now turn into a proposal) is that the shrink wrapping code in the optimizer, code generators, and metadata be scrubbed from the code base. The code is there in the repo history if someone ever does feel the need to resurrect in some way or use it as an implementation example.

@vijaysun-omr @andrewcraik @gita-omr @ymanton @fjeremic @mstoodle : any final opinions on pardoning this code in OMR? I'd like to reach a conclusion by Wed Sept 19.

@fjeremic
Copy link
Contributor

fjeremic commented Sep 12, 2018

I'm also of the opinion we should purge the code. It's only consumer is OpenJ9 which has had the feature disabled as far in the history as I can go, and even looking at the history in the closed source repository before OpenJ9 open sourced I can see it has been disabled for at least half a decade due to functional problems.

With that in mind the state and benefit of the optimization is very questionable as is its presence in the codebase.

[1] https://github.com/eclipse/omr/blob/4c5bc8f56e8771fc26314c6e1a45c3caf384b08e/compiler/control/OMROptions.cpp#L2443-L2452

@vijaysun-omr
Copy link
Contributor

I agree that we should delete shrink wrapping from the codebase if it's causing non trivial complications in other designs. Keeping it in the codebase always had it's pluses and minuses and it seems like the minuses are more significant now and so it is not worth the cost of maintaining the code.

I'll document my thoughts here so that it's recorded as part of the process by which this decision is being reached:

  1. Perceived advantage: The opt may be useful in some other language in the future even though Java no longer needs it. The code probably won't be easy to resurrect if there is significant refactoring of the codegen in the future.
    Reality: No clear evidence has presented itself that it would in fact help any other language and so this possibility alone may not be enough to keep this extra complexity around in the codebase.

  2. Perceived advantage: The opt was one of the first (and maybe till date, only ?) place in the codegen where we used dataflow analysis and could have served as an example in case there was a need for doing another dataflow analysis.
    Reality: No such need has manifested itself in the last several years and maybe this is an indication that the chances of this being important are low going forward too. Plus the code can always be found in the history if someone really needed to borrow ideas from it in the future (modulo any refactoring that changes the picture significantly).

@fjeremic
Copy link
Contributor

fjeremic commented Sep 12, 2018

  1. Perceived advantage: The opt was one of the first (and maybe till date, only ?) place in the codegen where we used dataflow analysis and could have served as an example in case there was a need for doing another dataflow analysis.

I think if we are careful to keep the deprecation within a minimal set of PRs (say two or three) it would probably be even a better example of how it is done than to have someone without much experience in the codebase trying to find all the bits and pieces of where optimization touches.

That is to say the inverse of the set of commits which will remove ShrinkWrapping from the codebase is a clear example of how one can add a codegen dataflow optimization.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented Sep 12, 2018

Agreed. Removing it from downstream projects (e.g., OpenJ9) might have to happen first.

@andrewcraik
Copy link
Contributor

I'm fine with the code being removed based on the discussion above. The only thing I would suggest is that if there are any APIs on the data flow engine etc needed for the use in the codegens we may not want to purge those since they aren't part of the current complexity. We would need to mark/document these APIs (if any) to ensure they aren't purged as unused later if we want to keep the avenue of adding dataflow to codegen possible for other languages in the future.

@mstoodle
Copy link
Contributor

+1 from me, with a minor modification to Andrew's point: we should review those APIs that have been added to support the code generators to make sure that API still makes sense in the grander scheme of things. If they aren't the perceived right way to do it, then we should get rid of them and do it properly the next time around without being saddled by the constrains of an earlier ill conceived API.

@gita-omr
Copy link
Contributor

I need to investigate a bit more. Prologues on Power are indeed very expensive. I would like to understand better why this optimization did not work.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented Sep 13, 2018

The last performance study on shrink wrapping on Java was conducted maybe 5 years ago and saw only marginal gains in performance (certainly on x86, but I believe that was the case on the other architectures as well) running enterprise workloads (lots of methods, flat profile). One of the strongly suspected reasons behind this was because the aggressiveness and effectiveness of the Java inlining optimization limited the number of shrink wrapping opportunities (in part by reducing the number of prologues executed). And, improving the inlining strategy realizes many more benefits and presents far more opportunities for optimization than does shrink wrapping alone.

What kind of investigation are you thinking of: an actual performance run with investigation or a thought experiment? I'm doubtful that if this feature was switched on that it would still work out of the box.

These days, when more performance is needed somewhere, inlining usually plays a big part in the answer. I suspect you'll get far more out of investigating better inlining strategies (which works against shrink wrapping) than resurrecting and tuning shrink wrapping.

I'd still like to keep a Sept. 19 deadline to decide on this because I don't want development progress in OMR to be held up by a long-dormant feature.

@gita-omr
Copy link
Contributor

We spent a lot of time on improving inlining strategies and it proved to be very non-trivial. Essentially, it involves inter-procedural analysis. Inlining one method causes another not to be inlined, sometimes profiler does not/can't predict correct target etc. Finding the right balance is extremely difficult.
On another hand, reducing the number of saved registers in the prologue is a per method optimization and therefore should be much more straightforward. I wanted to understand how it works and why it causes all these troubles for OMR and needs to be removed. Unfortunately, I've been busy last couple weeks and did not follow the discussion.

@ymanton
Copy link
Contributor

ymanton commented Sep 13, 2018

I recall part of the reason this was abandoned was because relative to the performance it gave it was a very difficult opt to debug. When it went wrong it caused bugs to appear long after the method has been returned from and the evidence on the stack obliterated.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented Sep 13, 2018

I had a talk with @gita-omr and I'm willing to hold off on any decision for 3 weeks to give her time to analyze whether it will provide any substantial benefit to Power. I'd appreciate any insight on the potential impact of shrink wrapping to performance issues on Power.

@andrewcraik
Copy link
Contributor

@mstoodle I'm fine with your modification to my approach. If we are going to purge APIs that were done the 'wrong way' I'd like those to be done in a separate commit from the general removal of shrink wrapping (should it go ahead) so that people trying to find the old API to know what not to do with the new API can find it without wading through shrink wrapping details.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented Oct 3, 2018

Have you had the opportunity to reach any conclusions on this @gita-omr?

@gita-omr
Copy link
Contributor

gita-omr commented Oct 5, 2018

Have you had the opportunity to reach any conclusions on this @gita-omr?

I looked at the code and it's quite non-trivial. I suspect it comes from the fact that it's trying to deal with the control flow not expressed in CFG (e.g. snippets). I can also imagine that it may introduce a lot of bugs that are hard to debug.

On the other hand, I don't know of any better solution and can't see yet how we would do it better in case such optimization is needed again. I think it would be a pity to discard all this experience. At the very least we should remove it in such a way that it's very easy to recover.

fjeremic added a commit to fjeremic/omr that referenced this issue Oct 5, 2018
This is the first PR which effectively removes all the ShrinkWrapping code from OMR. The code is removed in such a way that it is easy to see all the coupled parts across the optimizer, codegens, and control.

The optimization currently depends on some cleanup in the downstream OpenJ9 project which has dependencies on this optimizaiton. There will be a subsequent "Part II" PR which will remove some of the shell files so as to not cause build breaks.

Issue: eclipse-omr#2107

Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
fjeremic added a commit to fjeremic/omr that referenced this issue Oct 5, 2018
This PR performs additional cleanup now that all dependencies have been
sorted out.

Issue: eclipse-omr#2107

Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
@fjeremic
Copy link
Contributor

fjeremic commented Oct 5, 2018

@0xdaryl I've contained the entire removal into 4 pieces so as to avoid build breaks. The PRs linked to this Issue describe in which order they are to be merged. None of the PRs have dependencies and the can be merged one after another without breaking builds. I have marked ones which must wait with WIP status to prevent accidentally merging them.

If both OMR and both OpenJ9 PRs are squashed together and the changes reversed it can serve as a reference if someone in the future wants to see how something like ShrinkWrapping would be implemented on the JIT side.

fjeremic added a commit to fjeremic/omr that referenced this issue Oct 5, 2018
This is the first PR which effectively removes all the ShrinkWrapping code from OMR. The code is removed in such a way that it is easy to see all the coupled parts across the optimizer, codegens, and control.

The optimization currently depends on some cleanup in the downstream OpenJ9 project which has dependencies on this optimizaiton. There will be a subsequent "Part II" PR which will remove some of the shell files so as to not cause build breaks.

Issue: eclipse-omr#2107

Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
pdbain-ibm pushed a commit to pdbain-ibm/omr that referenced this issue Oct 12, 2018
This is the first PR which effectively removes all the ShrinkWrapping code from OMR. The code is removed in such a way that it is easy to see all the coupled parts across the optimizer, codegens, and control.

The optimization currently depends on some cleanup in the downstream OpenJ9 project which has dependencies on this optimizaiton. There will be a subsequent "Part II" PR which will remove some of the shell files so as to not cause build breaks.

Issue: eclipse-omr#2107

Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
fjeremic added a commit to fjeremic/omr that referenced this issue Oct 12, 2018
This PR performs additional cleanup now that all dependencies have been
sorted out.

Issue: eclipse-omr#2107

Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
fjeremic added a commit to fjeremic/omr that referenced this issue Oct 15, 2018
Remove final mention of Shrink Wrapping from OMR.

Issue: eclipse-omr#2107

Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
@fjeremic
Copy link
Contributor

@0xdaryl I beleive we have the last two PRs. I can't find any reference to Shrink Wrapping in the codebase following these changes. We can close this issue once the above two PRs are merged.

@0xdaryl
Copy link
Contributor Author

0xdaryl commented Oct 18, 2018

I believe the vast majority of shrink wrapping was removed across a small number of commits in two repos in the following order:

  1. Deprecate ShrinkWrapping - Part I eclipse-openj9/openj9#3166
  2. Deprecate ShrinkWrapping - Part I #3036
  3. Deprecate ShrinkWrapping - Part II #3037
  4. Deprecate ShrinkWrapping - Part II eclipse-openj9/openj9#3167
  5. Deprecate ShrinkWrapping - Part III #3068
  6. Deprecate ShrinkWrapping - Part III eclipse-openj9/openj9#3291

Those wishing to experiment with this feature in the future will need to apply the PRs in reverse order.

Thanks @fjeremic for contributing the work. Closing this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants