Commit 96646f7

Faye Gao authored and Fei Gao committed
8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
The macro-assembler on AArch64 can merge adjacent loads or stores into ldp/stp [1]. For example, it can merge:

```
str w20, [sp, #16]
str w10, [sp, #20]
```

into

```
stp w20, w10, [sp, #16]
```

But C2 may generate a sequence like:

```
str x21, [sp, #8]
str w20, [sp, #16]
str x19, [sp, #24] <---
str w10, [sp, #20] <--- Before sorting
str x11, [sp, #40]
str w13, [sp, #48]
str x16, [sp, #56]
```

No merging is possible for non-adjacent loads or stores. This patch sorts the spilling or unspilling sequence by stack offset during the instruction scheduling and bundling phase. After that, we get a new sequence:

```
str x21, [sp, #8]
str w20, [sp, #16]
str w10, [sp, #20] <---
str x19, [sp, #24] <--- After sorting
str x11, [sp, #40]
str w13, [sp, #48]
str x16, [sp, #56]
```

Then the macro-assembler can do ld/st merging:

```
str x21, [sp, #8]
stp w20, w10, [sp, #16] <--- Merged
str x19, [sp, #24]
str x11, [sp, #40]
str w13, [sp, #48]
str x16, [sp, #56]
```

To justify the patch, we run `HelloWorld.java`

```
public class HelloWorld {
    public static void main(String [] args) {
        System.out.println("Hello World!");
    }
}
```

with `java -Xcomp -XX:-TieredCompilation HelloWorld`.

Before the patch, the macro-assembler performs ld/st merging 3688 times. After the patch, that number increases to 3871, a gain of ~5%.

Tested tier1~3 on x86 and AArch64.

[1] https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079
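To make the effect of the sort concrete, here is a minimal standalone sketch, not HotSpot code. It assumes a deliberately simplified merge rule (two neighbouring stores of the same access size whose offsets are exactly one access apart); the real macro-assembler applies further checks. `SpillStore`, `count_mergeable_pairs`, `sorted_by_offset` and `example_sequence` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical model of one spill store: access size in bytes
// (4 for a w register, 8 for an x register) and sp-relative offset.
struct SpillStore {
  int size;
  int offset;
};

// Count neighbouring pairs that the simplified rule would fuse.
// Each store takes part in at most one pair.
int count_mergeable_pairs(const std::vector<SpillStore>& seq) {
  int merged = 0;
  for (std::size_t i = 0; i + 1 < seq.size();) {
    const SpillStore& a = seq[i];
    const SpillStore& b = seq[i + 1];
    if (a.size == b.size && std::abs(b.offset - a.offset) == a.size) {
      merged++;
      i += 2;  // both stores are consumed by the pair
    } else {
      i += 1;
    }
  }
  return merged;
}

// Return a copy of the sequence ordered by ascending stack offset.
std::vector<SpillStore> sorted_by_offset(std::vector<SpillStore> seq) {
  std::stable_sort(seq.begin(), seq.end(),
                   [](const SpillStore& a, const SpillStore& b) {
                     return a.offset < b.offset;
                   });
  return seq;
}

// The unsorted store sequence from the commit message, as {size, offset}.
std::vector<SpillStore> example_sequence() {
  return {{8, 8}, {4, 16}, {8, 24}, {4, 20}, {8, 40}, {4, 48}, {8, 56}};
}
```

Under this simplified rule the unsorted sequence yields no mergeable pair, while the sorted one yields exactly one (the two 4-byte stores at offsets 16 and 20), matching the single `stp` shown in the commit message.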
1 parent 179f505 commit 96646f7

1 file changed, +41 -3 lines changed
src/hotspot/share/opto/output.cpp

Lines changed: 41 additions & 3 deletions
@@ -169,6 +169,12 @@ class Scheduling {
   // Add a node to the current bundle
   void AddNodeToBundle(Node *n, const Block *bb);
 
+  // Return true only when the stack offset of the first spill node is
+  // greater than the stack offset of the second one. Otherwise, return false.
+  // When compare_two_spill_nodes(first, second) returns true, we think that
+  // "second" should be scheduled before "first" in the final basic block.
+  bool compare_two_spill_nodes(Node* first, Node* second);
+
   // Add a node to the list of available nodes
   void AddNodeToAvailableList(Node *n);
 
@@ -2271,6 +2277,29 @@ Node * Scheduling::ChooseNodeToBundle() {
   return _available[0];
 }
 
+bool Scheduling::compare_two_spill_nodes(Node* first, Node* second) {
+  assert(first->is_MachSpillCopy() && second->is_MachSpillCopy(), "");
+
+  OptoReg::Name first_src_lo = _regalloc->get_reg_first(first->in(1));
+  OptoReg::Name first_dst_lo = _regalloc->get_reg_first(first);
+  OptoReg::Name second_src_lo = _regalloc->get_reg_first(second->in(1));
+  OptoReg::Name second_dst_lo = _regalloc->get_reg_first(second);
+
+  // Comparison between stack -> reg and stack -> reg
+  if (OptoReg::is_stack(first_src_lo) && OptoReg::is_stack(second_src_lo) &&
+      OptoReg::is_reg(first_dst_lo) && OptoReg::is_reg(second_dst_lo)) {
+    return _regalloc->reg2offset(first_src_lo) > _regalloc->reg2offset(second_src_lo);
+  }
+
+  // Comparison between reg -> stack and reg -> stack
+  if (OptoReg::is_stack(first_dst_lo) && OptoReg::is_stack(second_dst_lo) &&
+      OptoReg::is_reg(first_src_lo) && OptoReg::is_reg(second_src_lo)) {
+    return _regalloc->reg2offset(first_dst_lo) > _regalloc->reg2offset(second_dst_lo);
+  }
+
+  return false;
+}
+
 void Scheduling::AddNodeToAvailableList(Node *n) {
   assert( !n->is_Proj(), "projections never directly made available" );
 #ifndef PRODUCT
@@ -2282,11 +2311,20 @@ void Scheduling::AddNodeToAvailableList(Node *n) {
 
   int latency = _current_latency[n->_idx];
 
-  // Insert in latency order (insertion sort)
+  // Insert in latency order (insertion sort). If two MachSpillCopyNodes
+  // for stack spilling or unspilling have the same latency, we sort
+  // them in the order of stack offset. Some backends (aarch64) may also
+  // have more opportunities to do ld/st merging
   uint i;
-  for ( i=0; i < _available.size(); i++ )
-    if (_current_latency[_available[i]->_idx] > latency)
+  for (i = 0; i < _available.size(); i++) {
+    if (_current_latency[_available[i]->_idx] > latency) {
       break;
+    } else if (_current_latency[_available[i]->_idx] == latency &&
+               n->is_MachSpillCopy() && _available[i]->is_MachSpillCopy() &&
+               compare_two_spill_nodes(n, _available[i])) {
+      break;
+    }
+  }
 
   // Special Check for compares following branches
   if( n->is_Mach() && _scheduled.size() > 0 ) {
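
The patched insertion loop can be modelled outside HotSpot as follows. This is a rough sketch under stated assumptions, not the C2 implementation: `Candidate` and `add_to_available` are hypothetical names, a plain `offset` field stands in for what `reg2offset` would return, and all spill candidates are assumed to move in the same direction (all reg -> stack or all stack -> reg).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a node on the available list.
struct Candidate {
  int latency;    // stand-in for _current_latency[n->_idx]
  bool is_spill;  // stand-in for n->is_MachSpillCopy()
  int offset;     // stack offset; only meaningful when is_spill is true
};

// Insert n so the list stays in ascending latency order. Among spill
// candidates with equal latency, n goes in front of the first entry
// whose stack offset is smaller than n's.
void add_to_available(std::vector<Candidate>& available, const Candidate& n) {
  std::size_t i;
  for (i = 0; i < available.size(); i++) {
    if (available[i].latency > n.latency) {
      break;
    } else if (available[i].latency == n.latency &&
               n.is_spill && available[i].is_spill &&
               n.offset > available[i].offset) {
      break;
    }
  }
  available.insert(available.begin() + static_cast<std::ptrdiff_t>(i), n);
}
```

In this model, equal-latency spill candidates end up in descending offset order at the front-to-back of the list. That is consistent with the patch comment that "second" should come before "first" in the final basic block, provided the list is consumed in the reverse of final program order, which is an assumption of this sketch rather than something the diff itself shows.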
