8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64
The macro-assembler on AArch64 can merge adjacent loads or stores
into ldp/stp [1]. For example, it can merge:
```
str w20, [sp, #16]
str w10, [sp, #20]
```
into
```
stp w20, w10, [sp, #16]
```
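The adjacency condition behind this example can be sketched as a small predicate. This is a simplified model, not HotSpot's actual check: the class name `StorePairCheck` and the method `canMerge` are hypothetical, and the sketch assumes both accesses use the same base register and that alignment and immediate-range constraints are already satisfied.

```java
public class StorePairCheck {
    // Hypothetical, simplified model: two accesses off the same base register
    // are treated as mergeable when they have the same access size and their
    // offsets differ by exactly that size. The real macro-assembler applies
    // further checks that are not modelled here.
    static boolean canMerge(int size1, int offset1, int size2, int offset2) {
        return size1 == size2 && Math.abs(offset2 - offset1) == size1;
    }

    public static void main(String[] args) {
        // str w20, [sp, #16] followed by str w10, [sp, #20]: two 4-byte stores, 4 bytes apart.
        System.out.println(canMerge(4, 16, 4, 20));
        // str w20, [sp, #16] followed by str x19, [sp, #24]: sizes differ (4 vs 8 bytes).
        System.out.println(canMerge(4, 16, 8, 24));
    }
}
```

Under these assumptions the first pair qualifies for `stp` and the second does not.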
But C2 may generate a sequence like:
```
str x21, [sp, #8]
str w20, [sp, #16]
str x19, [sp, #24] <---
str w10, [sp, #20] <--- Before sorting
str x11, [sp, #40]
str w13, [sp, #48]
str x16, [sp, #56]
```
Non-adjacent loads or stores cannot be merged.
This patch sorts the spilling or unspilling sequence by
offset during the instruction scheduling and bundling
phase. After that, we get a new sequence:
```
str x21, [sp, #8]
str w20, [sp, #16]
str w10, [sp, #20] <---
str x19, [sp, #24] <--- After sorting
str x11, [sp, #40]
str w13, [sp, #48]
str x16, [sp, #56]
```
Then the macro-assembler can do ld/st merging:
```
str x21, [sp, #8]
stp w20, w10, [sp, #16] <--- Merged
str x19, [sp, #24]
str x11, [sp, #40]
str w13, [sp, #48]
str x16, [sp, #56]
```
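The effect of sorting on the example sequence can be reproduced with a small stand-alone model. This is only an illustrative sketch, not the C2 implementation: the class `SpillSortDemo` and its methods are hypothetical, each store is reduced to an `{offset, size}` pair off `sp`, and the sketch assumes that only immediately consecutive instructions of equal size and adjacent offsets are merged.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SpillSortDemo {
    // The "before sorting" sequence from the example above.
    // Each entry is {offset from sp, access size in bytes} (x registers: 8, w registers: 4).
    static final int[][] BEFORE = {
        {8, 8}, {16, 4}, {24, 8}, {20, 4}, {40, 8}, {48, 4}, {56, 8}
    };

    // Returns a copy of the sequence ordered by ascending offset.
    static int[][] sortByOffset(int[][] stores) {
        int[][] sorted = stores.clone();
        Arrays.sort(sorted, Comparator.comparingInt((int[] s) -> s[0]));
        return sorted;
    }

    // Counts pairs of consecutive stores that this simplified model would merge:
    // same size, offsets exactly one access size apart. A merged pair is skipped
    // as a unit so no store is counted twice.
    static int countMerges(int[][] stores) {
        int merges = 0;
        int i = 0;
        while (i + 1 < stores.length) {
            boolean sameSize = stores[i][1] == stores[i + 1][1];
            boolean adjacent = Math.abs(stores[i + 1][0] - stores[i][0]) == stores[i][1];
            if (sameSize && adjacent) {
                merges++;
                i += 2;
            } else {
                i++;
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        System.out.println("merges before sorting: " + countMerges(BEFORE));               // prints 0
        System.out.println("merges after sorting:  " + countMerges(sortByOffset(BEFORE))); // prints 1
    }
}
```

In this model the unsorted sequence yields no merge, while the sorted one yields the single `stp w20, w10` pair shown above.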
To justify the patch, we run `HelloWorld.java`
```
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
```
with `java -Xcomp -XX:-TieredCompilation HelloWorld`.
Before the patch, the macro-assembler performs ld/st merging
3688 times. After the patch, the number of ld/st merges
increases to 3871, by ~5%.
Tested tier1~3 on x86 and AArch64.
[1] https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079