-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8290892: C2: Intrinsify Reference.reachabilityFence #25315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back vlivanov! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@iwanowww |
/contributor add tholenstein |
@iwanowww |
/contributor add vlivanov |
@iwanowww |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an impressive piece of work. It still makes me yearn for simpler solution. (I am impressed how much it snowballed from my original @DontInline
implementation!)
Representing ReachabilityFence as memory barrier (e.g., MemBarCPUOrder) would solve the issue, but performance costs are prohibitively high.
How bad is it? MemBarCPUOrder
pinches all memory, so I assume this breaks a lot of optimizations when RF
is sitting in the hot loop? I remember we went through a similar exercise with Blackholes
: JDK-8296545 -- and decided to pinch only the control. I guessing this is not enough to fix RF, or is it?
Some other comments:
@@ -659,12 +659,10 @@ protected Object clone() throws CloneNotSupportedException { | |||
* {@code null}, this method has no effect. | |||
* @since 9 | |||
*/ | |||
@ForceInline | |||
@IntrinsicCandidate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we also want to restore @DontInline
to cover the case when intrinsic is not available / disabled for some compiler. I vaguely remember some intrinsic handling code checks whether method is prohibited from inlining (maybe affects only global -XX:-Inline
, not sure), so it might be as straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to use -XX:DisableIntrinsic=_Reference_reachabilityFence
to switch to current behavior (no fence).
Also, @DontInline
would require special handling in C1 to unconditionally inline it.
@ForceInline
was there primarily to communicate the interaction with JVM. (Existing inlining heuristics should just unconditionally inline empty methods.) Once @IntrinsicCandidate
is there, I don't see much value in any other annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the whole point of this PR is that "current behavior" is incorrect, isn't it?
I think disabling _Reference_reachabilityFence
intrinsic (or, failing to inline the intrinsic for some other reason) should fail-safe to non-inlined method, not fail-deadly to a broken RF. In other words, let's not rely on intrinsic to work for correctness; non-intrinsified version should be correct as well.
I agree @DontInline
would require a bit of extra fiddling in C1, but I suspect it should be as easy as copy-pasting a few hunks around LIRGenerator::do_blackhole
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the whole point of this PR is that "current behavior" is incorrect, isn't it?
Strictly speaking, current implementation has a defect and it requires a complete rewrite on C2 side to properly fix it.
Current implementation is part of JDK for a long time (since 11). It's highly unlikely it'll be backported all the way to JDK 11 and it's an open question whether it should be backported at all. So, for diagnostic purposes it makes sense to provide a way to compare old and new implementations irrespective of whether old implementation still has the bug.
In other words, let's not rely on intrinsic to work for correctness; non-intrinsified version should be correct as well.
A question for you: do you think we should test non-intrinsified case?
Personally, I consider such requirement as way too strong. In this particular case, the method is unconditionally intrinsified in C2. If no intrinsification takes place, it's a bug. (I'm fine with adding an assert & abort compilation if C2 ever observes Reference.reachabilityFence()
to be inlined.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think two general principles apply here: a) intrinsics are performance optimizations, not correctness building blocks; b) when forced to choose, we prefer correctness over performance.
This is not about the backports, but about having the correct fallback when we need it. Imagine, for a second, this fix has a major bug discovered later. We instruct users to disable the intrinsic to avoid that bug. Users then ask: "Is it safe to disable the intrinsic? Would my application crash/misbehave without it? Would it run slower?". Right now we have a choice what we can answer. With @DontInline
, we say "Yes, it would run slower, but RF would still work against misbehaviors". Without @DontInline
, we say "Yes, there is a possibility of misbehavior, but at least it would run as fast". One of these answers goes against the general principle (b).
I also cannot remember any existing intrinsic that is a counter-example for (a). Without @DontInline
, RF would be a first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a) intrinsics are performance optimizations, not correctness building blocks;
The fact that many intrinsics are performance optimizations doesn't mean all intrinsics should be.
I also cannot remember any existing intrinsic that is a counter-example for (a)
Reference::get
.
This is not about the backports, but about having the correct fallback when we need it.
I provided it as a justification why we may need access to current implementation even if we know it's not 100% correct.
Users then ask: "Is it safe to disable the intrinsic? Would my application crash/misbehave without it? Would it run slower?"
There's no single answer possible here and it depending on what users expect. If there's a severe regression after the fix, as the stop-the-gap solution, it's common to restore original behavior users had before the change arrived (and that's -XX:DisableIntrinsic=_Reference_reachabilityFence
). If somebody asks for a workaround which is 100% correct, it can be suggested to also disable method inlining (-XX:CompileCommand=dontinline,java.lang.ref.Reference::reachabilityFence
), but it won't be the mode they run before and it'll introduce additional risks for them (mostly, performance-wise).
So, there's a way to achieve both. Also, -XX:DisableIntrinsic=_Reference_reachabilityFence
covers the most common scenario we may need and it's the simplest one from implementation POV.
// Skip next transformation if compressed oops are not used. | ||
if ((UseCompressedOops && !Matcher::gen_narrow_oop_implicit_null_checks()) || | ||
(!UseCompressedOops && !UseCompressedClassPointers)) | ||
return; | ||
|
||
// Go over ReachabilityFence nodes to skip DecodeN nodes for referents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a cute optimization. Does it happen in our code anywhere? I would have expected DecodeN
to be near the heap loads, and suppose RF
is mostly called on locals, which are already uncompressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I read the next hunk, should is_DecodeN
be is_DecodeNarrowPtr
to capture class loads (however unlikely that one is)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReachabilityFence
accepts only OOPs as a referent and DecodeNKlass
produces Klass
pointer.
I suspect it may be the case for safepoints as well (and is_DecodeNarrowPtr()
is a a leftover from PermGen world), but I didn't check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, nevermind about DecodeNKlass
then. My question on heap loads still stands: do we actually get reachabilityFence(someField)
from anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking specifically about ReachabilityFence -> DecodeN -> LoadN
shape? Yes, it's common, especially after inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iwanowww Can you add a code comment why this is safe to look through the ReachabilityFence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
redundant_rfs.push(rf); | ||
} | ||
} | ||
return found; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found
is always false
here. I don't think this function even needs a return value, judging by the uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was to signal when not-yet-known redundant fences are found.
I prefer to keep it even if it is not used right now.
Fixed by updating found
along with pushing newly discovered redundant node on the list.
} | ||
} | ||
} | ||
redundant_rfs.push(rf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see PhaseIdealLoop::optimize_reachability_fences
asks for redundant_rfs.member(rf)
before going into this analysis. Is it because we can have duplicate RFs in the C->reachability_fence
list? Can we have duplicate here? Should we check .member(rf)
here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No duplicated entries are allowed in Compile::_reachability_fences
.
eliminate_reachability_fences()
is called at the end of loop opts after optimize_reachability_fences()
is done (possibly, multiple times).
eliminate_reachability_fences()
migrates all referents from RFs to safepoints. So, every RF node is placed on redundant_rfs
list and there are no RF nodes left after eliminate_reachability_fences()
is over (there's C->reachability_fences_count() == 0
assert at the end).
After safepoints are directly linked to the referent, the corresponding RF node becomes redundant. So, I kept redundant_rfs
as a name. I can choose a different one if you find it confusing.
Another important aspect when it comes to determining RF redundancy is that eliminate_reachability_fences()
takes all users into account while optimize_reachability_fences()
trusts only ReachabilityFence
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, Aleksey, Tobias, and Christian.
redundant_rfs.push(rf); | ||
} | ||
} | ||
return found; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was to signal when not-yet-known redundant fences are found.
I prefer to keep it even if it is not used right now.
Fixed by updating found
along with pushing newly discovered redundant node on the list.
} | ||
} | ||
} | ||
redundant_rfs.push(rf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No duplicated entries are allowed in Compile::_reachability_fences
.
eliminate_reachability_fences()
is called at the end of loop opts after optimize_reachability_fences()
is done (possibly, multiple times).
eliminate_reachability_fences()
migrates all referents from RFs to safepoints. So, every RF node is placed on redundant_rfs
list and there are no RF nodes left after eliminate_reachability_fences()
is over (there's C->reachability_fences_count() == 0
assert at the end).
After safepoints are directly linked to the referent, the corresponding RF node becomes redundant. So, I kept redundant_rfs
as a name. I can choose a different one if you find it confusing.
Another important aspect when it comes to determining RF redundancy is that eliminate_reachability_fences()
takes all users into account while optimize_reachability_fences()
trusts only ReachabilityFence
s.
@@ -659,12 +659,10 @@ protected Object clone() throws CloneNotSupportedException { | |||
* {@code null}, this method has no effect. | |||
* @since 9 | |||
*/ | |||
@ForceInline | |||
@IntrinsicCandidate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to use -XX:DisableIntrinsic=_Reference_reachabilityFence
to switch to current behavior (no fence).
Also, @DontInline
would require special handling in C1 to unconditionally inline it.
@ForceInline
was there primarily to communicate the interaction with JVM. (Existing inlining heuristics should just unconditionally inline empty methods.) Once @IntrinsicCandidate
is there, I don't see much value in any other annotations.
// Skip next transformation if compressed oops are not used. | ||
if ((UseCompressedOops && !Matcher::gen_narrow_oop_implicit_null_checks()) || | ||
(!UseCompressedOops && !UseCompressedClassPointers)) | ||
return; | ||
|
||
// Go over ReachabilityFence nodes to skip DecodeN nodes for referents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReachabilityFence
accepts only OOPs as a referent and DecodeNKlass
produces Klass
pointer.
I suspect it may be the case for safepoints as well (and is_DecodeNarrowPtr()
is a a leftover from PermGen world), but I didn't check.
src/hotspot/share/opto/compile.cpp
Outdated
@@ -3897,6 +3908,20 @@ void Compile::final_graph_reshaping_main_switch(Node* n, Final_Reshape_Counts& f | |||
//------------------------------final_graph_reshaping_walk--------------------- | |||
// Replacing Opaque nodes with their input in final_graph_reshaping_impl(), | |||
// requires that the walk visits a node's inputs before visiting the node. | |||
|
|||
static bool has_non_debug_uses(Node* n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to Node::has_non_debug_uses()
.
Yes, if a barrier stays inside loop body, it breaks a lot of important optimizations. It may end up almost as bad as a full-blown call (except a barrier can be moved around while a call can't). And moving a node when it depends both on control and memory is more complicated than just a CFG node. Moreover, as you can see in the proposed solution, even CFG-only representation is problematic for loop opts, so additional care is needed to ensure RFs are moved out of loops. As an alternative approach, I thought about reifying RF as a data node (think of |
product(bool, OptimizeReachabilityFences, true, DIAGNOSTIC, \ | ||
"Optimize reachability fences") \ | ||
\ | ||
product(bool, PreserveReachabilityFencesOnConstants, false, DIAGNOSTIC, \ | ||
"Preserve reachability fences on constants") \ | ||
\ | ||
product(bool, StressReachabilityFences, false, DIAGNOSTIC, \ | ||
"Randomly insert ReachabilityFence nodes") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by sniping: what about a hello-world test where you test out these flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Added one.
* @bug 8290892 | ||
* @summary Tests to ensure that reachabilityFence() correctly keeps objects from being collected prematurely. | ||
* @modules java.base/jdk.internal.misc | ||
* @run main/othervm -Xbatch compiler.c2.TestReachabilityFence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about some extra runs where you use your new flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular test is carefully crafted to provoke a failure when reachability fence effects aren't properly modeled. Stressing RF implementation doesn't help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on my first pass through the code, just leaving a few first observations / comments :)
|
||
uint endoff = call->jvms()->endoff(); | ||
if (C->inlining_incrementally()) { | ||
assert(endoff == call->req(), ""); // assert in SafePointNode::grow_stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly are you asserting here? And what is the comment for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert ensures there are no reachability edges present when incremental inlining takes place. Inlining logic doesn't expect any extra edges past debug info and the comment refers to the assert which fires the first.
assert(endoff == call->req(), ""); // assert in SafePointNode::grow_stack | ||
} else { | ||
if (call->req() > endoff) { | ||
assert(OptimizeReachabilityFences, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a helpful assert message?
src/hotspot/share/opto/callnode.cpp
Outdated
case CatchProjNode::fall_through_index: projs->fallthrough_catchproj = cpn; break; | ||
case CatchProjNode::catch_all_index: projs->catchall_catchproj = cpn; break; | ||
default: { | ||
assert(cpn->_con > 1, ""); // exception table; rethrow case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please turn this into a helpful assert message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you quickly comment why you changed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some call nodes inspected during expand_reachability_fences
demonstrate this IR shape where some exception table projections are directly attached to the call node.
Looks like a missed case in CallNode::extract_projections
we simply never hit before.
// Are we guaranteed that this node is a safepoint? Not true for leaf calls and | ||
// for some macro nodes whose expansion does not have a safepoint on the fast path. | ||
virtual bool guaranteed_safepoint() { return true; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you only copied it. It makes me a little nervous when we call the "default" case safe. Because when you add more cases, you just assume it is safe... and if it is not we first have to discover that through a bug. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's a SafePointNode class after all. I lifted it from CallNode
subclass to avoid elaborate check on SafePoint nodes (!is_Call() || as_Call() && guaranteed_safepoint()`)).
If some node extends SafePointNode, but doesn't keep JVM state, it has to communicate it to users one way or another. And changing the default doesn't improve the situation IMO: reporting a safepoint node as a non-safepoint is still a bug.
src/hotspot/share/opto/compile.cpp
Outdated
{ // No more loop opts. It is safe to eliminate reachability fence nodes. | ||
TracePhase tp(_t_idealLoop); | ||
PhaseIdealLoop::optimize(igvn, LoopOptsEliminateRFs); | ||
print_method(PHASE_ELIMINATE_REACHABILITY_FENCES, 2); | ||
if (failing()) return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will have to check the impact on compile time here. Running an extra round of loop opts might be significant.
Do you really need to use a loop opts phase for this? Or would something like process_for_post_loop_opts_igvn
work for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be helpful if you write in a comment if this eliminates all or just some of the reachability fences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we limit it to cases where we actually have reachability fences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
src/hotspot/share/opto/compile.cpp
Outdated
Node* rf = C->reachability_fence(i); | ||
Node* in = rf->in(1); | ||
if (in->is_DecodeN()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not:
Node* rf = C->reachability_fence(i); | |
Node* in = rf->in(1); | |
if (in->is_DecodeN()) { | |
ReachabilityFence* rf = C->reachability_fence(i); | |
DecodeNNode* dn = rf->in(1)->isa_DecodeN(); | |
if (dn != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, reshaped as you suggested.
// Skip next transformation if compressed oops are not used. | ||
if ((UseCompressedOops && !Matcher::gen_narrow_oop_implicit_null_checks()) || | ||
(!UseCompressedOops && !UseCompressedClassPointers)) | ||
return; | ||
|
||
// Go over ReachabilityFence nodes to skip DecodeN nodes for referents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iwanowww Can you add a code comment why this is safe to look through the ReachabilityFence?
src/hotspot/share/opto/compile.hpp
Outdated
@@ -377,6 +378,7 @@ class Compile : public Phase { | |||
// of Template Assertion Predicates themselves. | |||
GrowableArray<OpaqueTemplateAssertionPredicateNode*> _template_assertion_predicate_opaques; | |||
GrowableArray<Node*> _expensive_nodes; // List of nodes that are expensive to compute and that we'd better not let the GVN freely common | |||
GrowableArray<Node*> _reachability_fences; // List of reachability fences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not:
GrowableArray<Node*> _reachability_fences; // List of reachability fences | |
GrowableArray<ReachabilityFenceNode*> _reachability_fences; // List of all reachability fences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
src/hotspot/share/opto/compile.hpp
Outdated
void add_reachability_fence(Node *n) { | ||
_reachability_fences.append(n); | ||
} | ||
|
||
void remove_reachability_fence(Node* n) { | ||
_reachability_fences.remove_if_existing(n); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also add the type ReachabilityFenceNode*
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
//------------------------------unique_loop_exit_or_null---------------------- | ||
// Return the loop-exit projection if it is unique. | ||
Node* IdealLoopTree::unique_loop_exit_or_null() { | ||
Node* unique_loop_exit = nullptr; | ||
if (is_loop()) { | ||
if (head()->is_BaseCountedLoop()) { | ||
unique_loop_exit = head()->as_BaseCountedLoop()->loopexit()->proj_out_or_null(0 /* false */); | ||
} else if (head()->is_OuterStripMinedLoop()) { | ||
unique_loop_exit = head()->as_OuterStripMinedLoop()->outer_loop_exit(); | ||
} else { | ||
// For now, conservatively report multiple loop exits exist. | ||
} | ||
} | ||
return unique_loop_exit; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proj_out_or_null
returns a ProjNode
(it is probably a IfTrue
or IfFalse
, right?) and outer_loop_exit
returns a IfFalseNode
. So we should be able to return a IfProjNode
from this method. What do you think?
What is the benefit of the unique_loop_exit
variable here? Why not return immediately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was easier to inspect it in the debugger. Reshaped as you suggested.
src/hotspot/share/opto/loopnode.hpp
Outdated
bool optimize_reachability_fences(); | ||
bool eliminate_reachability_fences(); | ||
|
||
bool is_redundant_rf(Node* rf, bool rf_only); | ||
bool find_redundant_rfs(Unique_Node_List& redundant_rfs); | ||
void insert_rf(Node* ctrl, Node* referent); | ||
void replace_rf(Node* old_node, Node* new_node); | ||
void remove_rf(Node* rf); | ||
#ifdef ASSERT | ||
bool has_redundant_rfs(Unique_Node_List& ignored_rfs, bool rf_only); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if all the method names spelled out reachability_fences
instead of rf / rfs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments are less important for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 types of methods here: internal ones (used solely in reachability.cpp
) and those which are called from loop optimization code (optimize_reachability_fences
and eliminate_reachability_fences
).
IMO it's counter-productive to repeatedly spell out what "RF" means inside reachability.cpp
, so I kept the names intact. I split the declarations into public and private ones to stress the distinction.
src/hotspot/share/opto/macro.cpp
Outdated
@@ -977,6 +979,8 @@ void PhaseMacroExpand::process_users_of_allocation(CallNode *alloc) { | |||
} | |||
} | |||
_igvn._worklist.push(ac); | |||
} else if (use->is_ReachabilityFence() && OptimizeReachabilityFences) { | |||
_igvn.replace_input_of(use, 1, _igvn.makecon(TypePtr::NULL_PTR)); // reset; redundant fence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you quickly explain in a code comment how this does a "reset"? What happens with it next?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turned it into ReachabilityFenceNode::clear_referent()
. Hope it makes it clearer.
src/hotspot/share/opto/node.hpp
Outdated
DEFINE_CLASS_ID(Initialize, MemBar, 0) | ||
DEFINE_CLASS_ID(MemBarStoreStore, MemBar, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFINE_CLASS_ID(Initialize, MemBar, 0) | |
DEFINE_CLASS_ID(MemBarStoreStore, MemBar, 1) | |
DEFINE_CLASS_ID(Initialize, MemBar, 0) | |
DEFINE_CLASS_ID(MemBarStoreStore, MemBar, 1) |
I don't think you needed to touch the lines above, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -358,6 +358,7 @@ class Parse : public GraphKit { | |||
bool _wrote_stable; // Did we write a @Stable field? | |||
bool _wrote_fields; // Did we write any field? | |||
Node* _alloc_with_final_or_stable; // An allocation node with final or @Stable field | |||
Node* _stress_rf_hook; // StressReachabilityFences support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could write out the rf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to avoid that. _stress_reachability_fence_hook
is way too verbose IMO. The declaration and all the accesses are accompanied by StressReachabilityFences
which should make it clear what rf
refers to.
src/hotspot/share/opto/parse1.cpp
Outdated
if (StressReachabilityFences && type->isa_oopptr() != nullptr) { | ||
Node* loc = local(index); | ||
if (loc->bottom_type() != TypePtr::NULL_PTR) { | ||
assert(loc->bottom_type()->isa_oopptr() != nullptr, "%s", Type::str(loc->bottom_type())); | ||
_stress_rf_hook->add_req(loc); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a short code comment describing what you are doing here, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/hotspot/share/opto/parse1.cpp
Outdated
if (StressReachabilityFences && type->isa_oopptr() != nullptr) { | ||
Node* stk = stack(index); | ||
if (stk->bottom_type() != TypePtr::NULL_PTR) { | ||
assert(stk->bottom_type()->isa_oopptr() != nullptr, "%s", Type::str(stk->bottom_type())); | ||
_stress_rf_hook->add_req(stk); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A short code comment would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/hotspot/share/opto/parse1.cpp
Outdated
@@ -2187,6 +2218,15 @@ void Parse::return_current(Node* value) { | |||
call_register_finalizer(); | |||
} | |||
|
|||
if (StressReachabilityFences) { | |||
// Keep all oop arguments alive until method return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Can you extend the comment a little?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Does it look better now?
* (1) optimization pass during loop opts which eliminates redundant nodes and | ||
* moves loop-invariant ones outside loops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* (1) optimization pass during loop opts which eliminates redundant nodes and | |
* moves loop-invariant ones outside loops; | |
* (1) optimization pass during loop opts which eliminates redundant nodes and | |
* moves loop-invariant ones outside loops; |
I'd prever consistent indentation, but optional/question of taste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
* (2) reachability information is transferred to safepoint nodes (appended as edges after debug info); | ||
* (3) reachability information from safepoints materialized as RF nodes attached to the safepoint node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand the explanation a little, please? I don't really understand. Why do you do this? What does it achieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be helpful if you wrote a paragraph (maybe at the top), about the interaction of SafePoint and ReachabilityFence. And you should also define "reachability information", I don't yet understand what that entails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I elaborated the description a bit and added more details. Let me know how it reads now.
* It looks attractive to get rid of RF nodes early and transfer to safepoint-attached representation, | ||
* but it is not correct until loop opts are done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it not correct? What could go wrong? Why is it safe to do it after loop opts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Live ranges of values are routinely extended during loop opts. And it can break the invariant that all interfering safepoints contain the referent in their oop map. (If an interfering safepoint doesn't keep the referent alive, then it becomes possible for the referent to be prematurely GCed.)
After loop opts are over, it becomes possible to reliably enumerate all interfering safe points and ensure the referent present in their oop maps.
* RF nodes may interfere with RA, so stand-alone RF nodes are eliminated and reachability information is | ||
* transferred to corresponding safepoints. When safepoints are pruned during macro expansion, corresponding | ||
* reachability info also goes away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok that this info goes away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what could currently go wrong with your implementation, or what would go wrong if we only used Safepoints?
@iwanowww I reviewed half of the code. I left lots of minor comments, about code style and little code comments. They are not super important now. I think most important to work on now is the description on the top of |
@iwanowww This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
/keepalive |
@iwanowww The pull request is being re-evaluated and the inactivity timeout has been reset. |
@iwanowww this pull request can not be integrated into git checkout 8290892.rf
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
@iwanowww Let me know whenever this is ready to review again 😊 |
@eme64 please, take another look. Thanks! |
This PR introduces C2 support for
Reference.reachabilityFence()
.After JDK-8199462 went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for
Reference.reachabilityFence()
in C2. C1 is left intact for now, because there are no signs yet it is affected.Reference.reachabilityFence()
can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since
ReachabilityFence
is CFG-only node, nothing explicitly forbids memory operations to float pastReference.reachabilityFence()
and potentially reaching some other safepoints current analysis treats as non-interfering. RepresentingReachabilityFence
as memory barrier (e.g.,MemBarCPUOrder
) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyondReachabilityFence
nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix.Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag
PreserveReachabilityFencesOnConstants
to keep the fences. I plan to address it separately.[1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667
"HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints."
Testing:
Progress
Issue
Contributors
<tholenstein@openjdk.org>
<vlivanov@openjdk.org>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315
$ git checkout pull/25315
Update a local copy of the PR:
$ git checkout pull/25315
$ git pull https://git.openjdk.org/jdk.git pull/25315/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25315
View PR using the GUI difftool:
$ git pr show -t 25315
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25315.diff
Using Webrev
Link to Webrev Comment