-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Whither shrink wrapping? #2107
Comments
From an OpenJ9 committer's point of view this sounds good to me. In a dynamic runtime instruction level dataflow analysis on warm methods is relatively expensive for what we get in return (which is currently nothing). Either we need to make the dataflow faster or improve the implementation of shrink wrapping. On Z at least the store and restore of preserved registers is a single store/load multiple instruction which has |
Maybe we should ask JitBuilder clients if they think shrink wrapping the preserved register saves/restores is worthwhile in their scenarios. I have seen it being a problem, but it never reached the "top of the pile" in importance for me in the JITs I've personally worked on. Any feedback on this particular issue, @jduimovich @charliegracie @Leonardo2718 @ymanton ? |
I have no evidence regarding what the performance benefits of shrink wrapping would be in Lua Vermelha. I am also not familiar enough with the characteristics of shrink wrapping to make an educated guess. However, I suspect that the typically large number of branches in the generated code would make the data flow analysis very expensive to run, even on relatively small Lua functions. Having never looked at the shrink wrapping code myself, I have no opinion about whether it should be cleaned up or reimplemented from scratch. However, I think shrink wrapping in Lua Vermelha is worth future investigation. So, I would like to see it brought back to life eventually, if only for the purpose of doing performance measurements to evaluate its usefulness. It would also be interesting to see how this optimization performs on ARM. |
This discussion has petered out, but I think it is important to reach a conclusion on this as new development often entangles with this code and can affect the design of new features (in a negative way). My personal opinion (that I'll now turn into a proposal) is that the shrink wrapping code in the optimizer, code generators, and metadata be scrubbed from the code base. The code is there in the repo history if someone ever does feel the need to resurrect in some way or use it as an implementation example. @vijaysun-omr @andrewcraik @gita-omr @ymanton @fjeremic @mstoodle : any final opinions on pardoning this code in OMR? I'd like to reach a conclusion by Wed Sept 19. |
I'm also of the opinion we should purge the code. It's only consumer is OpenJ9 which has had the feature disabled as far in the history as I can go, and even looking at the history in the closed source repository before OpenJ9 open sourced I can see it has been disabled for at least half a decade due to functional problems. With that in mind the state and benefit of the optimization is very questionable as is its presence in the codebase. |
I agree that we should delete shrink wrapping from the codebase if it's causing non trivial complications in other designs. Keeping it in the codebase always had it's pluses and minuses and it seems like the minuses are more significant now and so it is not worth the cost of maintaining the code. I'll document my thoughts here so that it's recorded as part of the process by which this decision is being reached:
|
I think if we are careful to keep the deprecation within a minimal set of PRs (say two or three) it would probably be even a better example of how it is done than to have someone without much experience in the codebase trying to find all the bits and pieces of where optimization touches. That is to say the inverse of the set of commits which will remove ShrinkWrapping from the codebase is a clear example of how one can add a codegen dataflow optimization. |
Agreed. Removing it from downstream projects (e.g., OpenJ9) might have to happen first. |
I'm fine with the code being removed based on the discussion above. The only thing I would suggest is that if there are any APIs on the data flow engine etc needed for the use in the codegens we may not want to purge those since they aren't part of the current complexity. We would need to mark/document these APIs (if any) to ensure they aren't purged as unused later if we want to keep the avenue of adding dataflow to codegen possible for other languages in the future. |
+1 from me, with a minor modification to Andrew's point: we should review those APIs that have been added to support the code generators to make sure that API still makes sense in the grander scheme of things. If they aren't the perceived right way to do it, then we should get rid of them and do it properly the next time around without being saddled by the constrains of an earlier ill conceived API. |
I need to investigate a bit more. Prologues on Power are indeed very expensive. I would like to understand better why this optimization did not work. |
The last performance study on shrink wrapping on Java was conducted maybe 5 years ago and saw only marginal gains in performance (certainly on x86, but I believe that was the case on the other architectures as well) running enterprise workloads (lots of methods, flat profile). One of the strongly suspected reasons behind this was because the aggressiveness and effectiveness of the Java inlining optimization limited the number of shrink wrapping opportunities (in part by reducing the number of prologues executed). And, improving the inlining strategy realizes many more benefits and presents far more opportunities for optimization than does shrink wrapping alone. What kind of investigation are you thinking of: an actual performance run with investigation or a thought experiment? I'm doubtful that if this feature was switched on that it would still work out of the box. These days, when more performance is needed somewhere, inlining usually plays a big part in the answer. I suspect you'll get far more out of investigating better inlining strategies (which works against shrink wrapping) than resurrecting and tuning shrink wrapping. I'd still like to keep a Sept. 19 deadline to decide on this because I don't want development progress in OMR to be held up by a long-dormant feature. |
We spent a lot of time on improving inlining strategies and it proved to be very non-trivial. Essentially, it involves inter-procedural analysis. Inlining one method causes another not to be inlined, sometimes profiler does not/can't predict correct target etc. Finding the right balance is extremely difficult. |
I recall part of the reason this was abandoned was because relative to the performance it gave it was a very difficult opt to debug. When it went wrong it caused bugs to appear long after the method has been returned from and the evidence on the stack obliterated. |
I had a talk with @gita-omr and I'm willing to hold off on any decision for 3 weeks to give her time to analyze whether it will provide any substantial benefit to Power. I'd appreciate any insight on the potential impact of shrink wrapping to performance issues on Power. |
@mstoodle I'm fine with your modification to my approach. If we are going to purge APIs that were done the 'wrong way' I'd like those to be done in a separate commit from the general removal of shrink wrapping (should it go ahead) so that people trying to find the old API to know what not to do with the new API can find it without wading through shrink wrapping details. |
Have you had the opportunity to reach any conclusions on this @gita-omr? |
I looked at the code and it's quite non-trivial. I suspect it comes from the fact that it's trying to deal with the control flow not expressed in CFG (e.g. snippets). I can also imagine that it may introduce a lot of bugs that are hard to debug. On the other hand, I don't know of any better solution and can't see yet how we would do it better in case such optimization is needed again. I think it would be a pity to discard all this experience. At the very least we should remove it in such a way that it's very easy to recover. |
This is the first PR which effectively removes all the ShrinkWrapping code from OMR. The code is removed in such a way that it is easy to see all the coupled parts across the optimizer, codegens, and control. The optimization currently depends on some cleanup in the downstream OpenJ9 project which has dependencies on this optimizaiton. There will be a subsequent "Part II" PR which will remove some of the shell files so as to not cause build breaks. Issue: eclipse-omr#2107 Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
This PR performs additional cleanup now that all dependencies have been sorted out. Issue: eclipse-omr#2107 Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
@0xdaryl I've contained the entire removal into 4 pieces so as to avoid build breaks. The PRs linked to this Issue describe in which order they are to be merged. None of the PRs have dependencies and the can be merged one after another without breaking builds. I have marked ones which must wait with WIP status to prevent accidentally merging them. If both OMR and both OpenJ9 PRs are squashed together and the changes reversed it can serve as a reference if someone in the future wants to see how something like ShrinkWrapping would be implemented on the JIT side. |
This is the first PR which effectively removes all the ShrinkWrapping code from OMR. The code is removed in such a way that it is easy to see all the coupled parts across the optimizer, codegens, and control. The optimization currently depends on some cleanup in the downstream OpenJ9 project which has dependencies on this optimizaiton. There will be a subsequent "Part II" PR which will remove some of the shell files so as to not cause build breaks. Issue: eclipse-omr#2107 Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
This is the first PR which effectively removes all the ShrinkWrapping code from OMR. The code is removed in such a way that it is easy to see all the coupled parts across the optimizer, codegens, and control. The optimization currently depends on some cleanup in the downstream OpenJ9 project which has dependencies on this optimizaiton. There will be a subsequent "Part II" PR which will remove some of the shell files so as to not cause build breaks. Issue: eclipse-omr#2107 Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
This PR performs additional cleanup now that all dependencies have been sorted out. Issue: eclipse-omr#2107 Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
Remove final mention of Shrink Wrapping from OMR. Issue: eclipse-omr#2107 Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
@0xdaryl I beleive we have the last two PRs. I can't find any reference to Shrink Wrapping in the codebase following these changes. We can close this issue once the above two PRs are merged. |
I believe the vast majority of shrink wrapping was removed across a small number of commits in two repos in the following order:
Those wishing to experiment with this feature in the future will need to apply the PRs in reverse order. Thanks @fjeremic for contributing the work. Closing this now. |
PR #2066 led to a discussion on the fate of the shrink wrapping optimization in the Eclipse OMR compiler. This is an optimization intended to defer register preservation required by a method's linkage as late as possible in the method (rather than always doing it in the prologue). The observation is that if preservation is occurring because of register usage on colder paths of the method then deferring the preservations until those paths (if possible) would be more beneficial to performance.
Shrink wrapping was written for Java and originally motivated by the Power architecture but has seen little (if any) performance benefit on any architecture to date (X, P, and Z have implementations). And in fact there are known functional issues on P (and possibly Z) with the code as-is.
We decided to contribute the code to OMR not as a working optimization but as an example for how one could do dataflow over a stream of
TR::Instruction
's. There are some analyses and modifications to the instruction class that need to occur to make this possible, and this optimization provided a working example of how to do that reasonably effectively. It is certainly possible that parts of the shrink wrapping optimization are specialized to Java semantics and this will need to be discovered and fixed for it to become a general optimization (for example, it works with J9 JIT private linkage and will have to be adapted for various system ABIs).The code will certainly be cleaner without it.
The argument was raised in #2066 that this optimization may be more beneficial to non-Java runtimes where the inliner is not as aggressive as it is in Java. I don't think we'll know that until we try, and to do so means it should be dusted off and generalized.
Is it worth leaving this vestigal code in OMR where it is simply built but not tested, or should it be cleaned up? Its already part of git history so it won't really be going anywhere if it's removed. If it's removed, we can ensure it's removed under a single PR so that all changes needed to resurrect are clear.
Any opinions?
The text was updated successfully, but these errors were encountered: