Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Escape Analysis under OSR #5737

Merged
merged 2 commits into from
Sep 4, 2019

Conversation

hzongaro
Copy link
Member

This change defines new optimization passes, preEscapeAnalysis and postEscapeAnalysis, that help Escape Analysis (EA) do better under voluntary OSR.

The new preEscapeAnalysis optimization looks for calls to OSRInductionHelper and adds a fake prepareForOSR call that references all live auto symrefs and pending pushes. Any object that ends up as a candidate for stack allocation that appears to be used by such a fake prepareForOSR call can be heapified at that point.

During EA itself, those references on prepareForOSR calls are marked as ignoreable for the purposes of determining whether an object can be stack allocated.

After EA, the postEscapeAnalysis pass will remove the fake calls to prepareForOSR.

Submitted on behalf of Andrew Craik ajcraik@ca.ibm.com

Signed-off-by: Henry Zongaro zongaro@ca.ibm.com

@hzongaro
Copy link
Member Author

hzongaro commented May 11, 2019

This change is dependent upon OMR pull request 3841

Vijay @vijaysun-omr, may I ask you to review this change? Andrew @andrewcraik, may I ask you to verify my descriptions of your improvements?

2019-05-13: Removed WIP from title, as OMR pull request 3841 has been merged.

@hzongaro
Copy link
Member Author

Force pushed to c31e473 in order to rectify copyright problem

@hzongaro hzongaro changed the title WIP: Improve Escape Analysis under OSR Improve Escape Analysis under OSR May 13, 2019
@hzongaro
Copy link
Member Author

CI builds were failing because I had failed to include PreEscapeAnalysis.cpp and PostEscapeAnalysis.cpp in the list of files to be compiled with cmake. Fixed in 1e74777

@andrewcraik
Copy link
Contributor

description is correct - review from @vijaysun-omr is required since I was involved with the development of this change.

Copy link
Contributor

@andrewcraik andrewcraik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note - we need @vijaysun-omr to review since I was involved with the implementation. I verify the docs are correct and the implementation looks reasonable.

@vijaysun-omr
Copy link
Contributor

Looks reasonable to me. What kind of testing has been done on this change ?

@andrewcraik
Copy link
Contributor

@hzongaro can you comment on the testing?

@hzongaro
Copy link
Member Author

@vijaysun-omr Vijay asked,

What kind of testing has been done on this change?

Thanks for reviewing, Vijay. I ran internal "gold" testing (level.sanity, level.regression, level.promotion) on all platforms.

I also tested this change with OMR pull request #3842 against the benchmark referenced in issue #2072. Unfortunately, even after this change, there still seems to be lingering heap allocation of Double objects in the loop. I'm tracking down the reason for that, but its resolution might have to be the subject of a subsequent pull request.

@hzongaro
Copy link
Member Author

Squashed commits down to 09d6692 and 5c9af1b

@hzongaro
Copy link
Member Author

I wrote,

Unfortunately, even after this change, there still seems to be lingering heap allocation of Double objects in the loop. I'm tracking down the reason for that, but its resolution might have to be the subject of a subsequent pull request.

I think the lingering object allocation that I mentioned was due to a "fix" I made for an infinite recursion. I will mark this pull request as WIP while I fix the problem, so the intended improvement to the Escape Analysis optimization isn't hobbled.

@hzongaro hzongaro changed the title Improve Escape Analysis under OSR WIP: Improve Escape Analysis under OSR May 22, 2019
@charliegracie
Copy link
Contributor

Is this PR functionally correct? Does it improve the EA and the amount of objets that get stack allocated? If the answer is yes to those questions then I do not see why we do not merge this PR and the result of the investigation into further refinements can not happen in another PR.

@vijaysun-omr
Copy link
Contributor

@hzongaro could you please answer ?

@hzongaro
Copy link
Member Author

@charliegracie Charlie, I believe the PR is functionally correct, improves EA and reduces the amount of objects that get stack allocated. My concern is that Andrew's primary motivation for these changes was to improve on the number of objects stack allocated for issue #2072.

In testing the change with the benchmark snippet that appeared in the description from issue #2072 -- which I've reproduced below -- I verified that the objects in the loop were stack allocated with Andrew's changes.

public void run() {
      Double result = 1d;
      Double bound = Main.until;
      for (Double f = 0d; f < bound; f++) {
        result += f / (result + 1);
      }
      if (log) System.out.println(result);
    }

However, last week I went back to the original benchmark link (which now seems to be broken) in the issue. There, the relevant code in the benchmark was slightly different:

    public void run() {
      final Long startTime = System.currentTimeMillis();
            
      Double result = 1d;
      Double bound = Main.until;

      for (Double f = 0d; f < bound; f++) {
        result += f / (result + 1);
      }
      
      final Long endTime = System.currentTimeMillis();
      if (log) System.out.println(result);
      if (log) System.out.println(Thread.currentThread().getName() + " finished in " + (endTime - startTime) + "ms");    
    }

With the addition of those boxed Long values, the object allocations within the loop remain there. One reason is a fix that I made in commit eb5edef, which turns out to be too conservative, and ends up interfering with performing the stack allocations in this seemingly very similar method.

So, although the stack allocations now happen with the version of the code that appeared in issue #2072, the fact that there was no improvement in stack allocation using the original version of the benchmark led me to move this pull request back to WIP.

If you feel it's still worthwhile merging this change while the work on ensuring the objects in the original version of the benchmark can be stack allocated continues, I would be happy to remove the WIP from the title.

Please accept my apologies for failing to notice this difference earlier.

@andrewcraik
Copy link
Contributor

@hzongaro I think it is worth putting this in. Getting the Double to stack allocate requires loosening a conservatively correct check to handle some additional situations and is not likely to refactor all of the other changes made here. There may be an additional enhancement to get this other version working, but what we have so far is an improvement and won't be a regression so we should get it and work on the further enhancement.

@hzongaro
Copy link
Member Author

OK. Removing WIP from the title.

@hzongaro hzongaro changed the title WIP: Improve Escape Analysis under OSR Improve Escape Analysis under OSR May 27, 2019
@andrewcraik
Copy link
Contributor

Jenkins test sanity xlinux,plinux,win jdk8,jdk11

@hzongaro
Copy link
Member Author

Hi, Vijay @vijaysun-omr - I'm still working on tracking down the source of the intermittent crash from TR_PreEscapeAnalysis::perform. It's difficult to estimate how quickly I can find the source of that problem, but it may well be a preexisting problem that this pull request happens to trip over.

@hzongaro hzongaro changed the title Improve Escape Analysis under OSR WIP: Improve Escape Analysis under OSR Jul 30, 2019
@hzongaro
Copy link
Member Author

hzongaro commented Jul 30, 2019

Sorry for the long delay on updating this. I've marked this change as a work in progress. As was mentioned in #5737 (comment), there is intermittent failure that occurs with this change. The problem is in unrelated code, but happens to exposed by this change.

OpenJ9 pull request #6626, OMR pull request #4177 and OMR pull request #4182 address that problem. Once they are delivered, I will remove the WIP prefix from this pull request.

@hzongaro
Copy link
Member Author

hzongaro commented Aug 1, 2019

Just force pushed rebasing the branch on the latest master. No changes otherwise.

@andrewcraik
Copy link
Contributor

Jenkins test sanity xlinux,win,plinux jdk8,jdk11

@andrewcraik
Copy link
Contributor

have started sanity since the sequence of changes needed to fix the OSR metadata bug exposed by this work has now landed.

@hzongaro hzongaro changed the title WIP: Improve Escape Analysis under OSR Improve Escape Analysis under OSR Aug 1, 2019
@hzongaro
Copy link
Member Author

hzongaro commented Aug 1, 2019

Removed WIP prefix, because as Andrew mentioned, the bug this change exposes has been fixed.

@andrewcraik
Copy link
Contributor

@hzongaro I don't think we should merge this until the new Pre/Post are added to the opt strategy in this commit - the danger is that we would keep running regular EA and ignore the 'real' prepares thinking the fake prepares will save us, but we won't have added them

@andrewcraik andrewcraik changed the title Improve Escape Analysis under OSR WIP: Improve Escape Analysis under OSR Aug 1, 2019
@hzongaro
Copy link
Member Author

hzongaro commented Aug 1, 2019

@hzongaro I don't think we should merge this until the new Pre/Post are added to the opt strategy in this commit - the danger is that we would keep running regular EA and ignore the 'real' prepares thinking the fake prepares will save us, but we won't have added them

Thanks, Andrew @andrewcraik. So I think that means that I need to separate out the part of this change that adds PreEscapeAnalysis and PostEscapeAnalysis. Then after that's merged, OMR pull request #3842, which adds them to the optimization strategy can be merged, and finally, the remainder of this change, with the actual EA changes, can be delivered. Do I have that right?

@andrewcraik
Copy link
Contributor

Yes I think so - we can't change main EA until we have pre/post running, but it is safe to run pre/post with existing EA - you just see more escapes (escapes that would have happened anyway)

@hzongaro
Copy link
Member Author

hzongaro commented Aug 1, 2019

Reorganized changes so that PreEscapeAnaysis and PostEscapeAnalysis are available and running when the Escape Analysis changes are delivered, as Andrew recommends above. This makes this change now dependent upon pull request #6654 and OMR pull request #3842.

@hzongaro hzongaro force-pushed the ea-staging-branch-step2 branch 2 times, most recently from 394c9ea to f3f7383 Compare August 9, 2019 15:14
@hzongaro
Copy link
Member Author

hzongaro commented Aug 9, 2019

Just force pushed to bring the ea-staging-branch-step2 up to date with changes from pull request #6654 - no substantive changes here.

@andrewcraik
Copy link
Contributor

@hzongaro now that #6654 has merged can you rebase so we can sanity this one?

The preEscapeAnalysis optimization, which was introduced in a previous
change, looks for calls to OSRInductionHelper and adds a fake
prepareForOSR call that references all live auto symrefs and pending
pushes.  Any object that ends up as a candidate for stack allocation
that appears to be used by such a fake prepareForOSR call can be
heapified at that point by Escape Analysis.

During EA itself, those references on prepareForOSR calls are marked
as ignoreable for the purposes of determining whether an object can be
stack allocated.

Submitted on behalf of Andrew Craik <ajcraik@ca.ibm.com>

Signed-off-by:  Henry Zongaro <zongaro@ca.ibm.com>
The code in findIgnoreableUses that populates the TR_BitVector,
_ignoreableUses, uses the global indices of nodes to index the
TR_BitVector.  Code in checkDefsAndUses was using the value numbers of
nodes to check the entries in _ignoreableUses.

Fixed checkDefsAndUses to use nodes' global indices instead.  Also,
reorganized the code checking _ignoreableUses to pull the invariant
check out of a loop.

Signed-off-by:  Henry Zongaro <zongaro@ca.ibm.com>
@hzongaro
Copy link
Member Author

Andrew @andrewcraik, I've brought this branch up to date with the latest changes in OpenJ9. Before this change is sanity tested, we'll need OMR pull request 3842 to make it through the OpenJ9 OMR acceptance testing. I will leave this marked as a work in progress until then.

@andrewcraik
Copy link
Contributor

@hzongaro I believe 3842 has merged and been accepted. If you could confirm and un-WIP I'll restart testing on this change.

@hzongaro hzongaro changed the title WIP: Improve Escape Analysis under OSR Improve Escape Analysis under OSR Aug 27, 2019
@hzongaro
Copy link
Member Author

Removed WIP prefix now that OMR pull request 3842 has been merged and passed OpenJ9 OMR acceptance testing.

@andrewcraik
Copy link
Contributor

Jenkins test sanity all jdk8,jdk11

@andrewcraik andrewcraik merged commit 31fcc88 into eclipse-openj9:master Sep 4, 2019
@hzongaro hzongaro deleted the ea-staging-branch-step2 branch September 4, 2019 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants