Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use getLiveRangeInfo to find pending push symRefs that are dead #14074

Conversation

hzongaro
Copy link
Member

@hzongaro hzongaro commented Dec 6, 2021

The calls to TR_EscapeAnalysisTools::processSymbolReferences load references to symRefs that are live in order to create fake escapes for those symRefs at that point. Determining whether a symRef is live relies on a TR_BitVector that lists the symRefs that are dead at that point in the IL. For autos, that TR_BitVector was being retrieved by calling TR_OSRMethodData::getLiveRangeInfo(), but for pending push symRefs, TR_OSRMethodData::getPendingPushLivenessInfo() was being called. When OSRDefAnalysis builds the vector of dead symRefs for both autos and pending pushes, it stores them in the LiveRangeInfo of the TR_OSRMethodData; the PendingPushLivenessInfo actually contains information about which pending push symRefs are live at particular points, not those that are dead.

This change corrects the calls to processSymbolReferences by using the result of TR_OSRMethodData::getLiveRangeInfo() for both autos and for pending pushes.

Also added some additional logging that's enabled under either traceOSR or traceEscapeAnalysis.

@hzongaro
Copy link
Member Author

hzongaro commented Dec 6, 2021

I'm leaving this as a draft pull request for now while I investigate some personal build failures.

@hzongaro hzongaro marked this pull request as ready for review December 6, 2021 21:17
@hzongaro hzongaro force-pushed the pending-push-liveness-for-eaEscapeHelper branch from ea3ee90 to 189fea9 Compare May 20, 2022 21:24
@hzongaro
Copy link
Member Author

Just force pushed a change to update the copyright year - no other changes.

Vijay @vijaysun-omr, I had forgotten about this fix. Did you have any other concerns about it?

@vijaysun-omr
Copy link
Contributor

No objection but my impression was that this was discovered by some test failure (in which case I'm unclear on how you did not receive pressure to merge this sooner).

@vijaysun-omr
Copy link
Contributor

Jenkins test sanity all jdk17

@hzongaro
Copy link
Member Author

my impression was that this was discovered by some test failure

Right - this fixes what appears to be the root of the problem reported in issue #13162, which was worked around in OMR pull request #6255. The presence of that work around reduced the need for this fix.

That naturally raises the question whether that work around should remain in place.

@vijaysun-omr
Copy link
Contributor

Okay that workaround answers the question that I had, thanks!

I feel that workaround was probably a fair change anyway since we probably did not want to risk doing an OSR transition if a profiled guard failed (or at the very least there ought to be a motivating case for why that is needed). My tendency would therefore be to keep that workaround in place even with your fix but I would prefer @0xdaryl to weigh in as well.

One suggestion for testing out your change more (especially given the workaround reduces the likelihood of OSR transitions happening by default) is to do some internal/private testing with the -Xjit option enableOSROnGuardFailure to stress the OSR code paths.

The calls to processSymbolReferences load references to symRefs that are
live in order to create fake escapes for those symRefs at that point.
Determining whether a symRef is live relies on a TR_BitVector that lists
the symRefs that are dead at that point in the IL.  For autos, that
TR_BitVector was being retrieved by calling
TR_OSRMethodData::getLiveRangeInfo(), but for pending push symRefs,
TR_OSRMethodData::getPendingPushLivenessInfo() was being called.  When
OSRDefAnalysis builds the vector of dead symRefs for both autos and
pending pushes, it stores them in the LiveRangeInfo of the
TR_OSRMethodData; the PendingPushLivenessInfo actually contains
information about which pending push symRefs are live at particular
points, not those that are dead.

This change corrects the calls to processSymbolReferences by using the
result of TR_OSRMethodData::getLiveRangeInfo() for both autos and for
pending pushes.

Also added some additional logging that's enabled under either traceOSR
or traceEscapeAnalysis.

Signed-off-by:  Henry Zongaro <zongaro@ca.ibm.com>
@hzongaro hzongaro force-pushed the pending-push-liveness-for-eaEscapeHelper branch from 189fea9 to 3a635d3 Compare October 3, 2023 16:00
@hzongaro
Copy link
Member Author

hzongaro commented Oct 3, 2023

@vijaysun-omr, after leaving it sit dormant for the past year, I've just come back to this pull request again as @jdmpapin and @JamesKingdon saw some strong evidence that the problem with live pending push symrefs missing from calls to eaEscapeHelper is the cause of issue #8972.

The recent change that I force pushed simply rebased the fix onto a more recent version of OpenJ9 - the changes are otherwise the same as those you had previously reviewed.

One suggestion for testing out your change more (especially given the workaround reduces the likelihood of OSR transitions happening by default) is to do some internal/private testing with the -Xjit option enableOSROnGuardFailure to stress the OSR code paths

I attempted some test runs with -Xjit:enableOSROnGuardFailure as you suggested. The only failures I saw were cases that also failed without this change.

May I trouble you to review this change again?

@vijaysun-omr
Copy link
Contributor

Jenkins test sanity all jdk17

@hzongaro
Copy link
Member Author

hzongaro commented Oct 4, 2023

The sanity.functional x86-64 mac failure is known issue #18076.

The sanity.functional x86-64 linux failure appears to be an infrastructure problem. I see the following in one of the console logs. I will restart it.

23:05:18  Cannot contact cent7-x64-6: java.lang.InterruptedException

The sanity.functional ppc64le failure failures look like known issue #17457. I will restart it as well.

  • cmdLineTester_criu_nonPortableRestore_2
  • cmdLineTester_criu_nonPortableRestore_Xtrace_outputFile_0
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Tue Oct 03 21:11:15 EDT 2023, System.currentTimeMillis(): 1696381875513, System.nanoTime(): 13342951632095843
 [OUT] Exception in thread "main" org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
 [OUT] 	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
 [OUT] 	at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(CRIUSupport.java:811)
 [OUT] 	at org.openj9.criu.CRIUTestUtils.checkPointJVM(CRIUTestUtils.java:77)
 [OUT] 	at org.openj9.criu.OptionsFileTest.dumpOptionsTestRequireDynamic(OptionsFileTest.java:295)
 [OUT] 	at org.openj9.criu.OptionsFileTest.main(OptionsFileTest.java:69)
 [OUT] Error (criu/protobuf.c:72): Unexpected EOF on (empty-image)
 [OUT] Removed test output files
 [OUT] finished script

Jenkins test sanity xlinux,plinux jdk17

@hzongaro
Copy link
Member Author

hzongaro commented Oct 5, 2023

Jenkins test sanity osx jdk17

@vijaysun-omr
Copy link
Contributor

I (re)reviewed this and the changes are mostly better tracing except for the core change described in the PR abstract. Tests have passed. So I am merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants