[RISCV] Rematerialize vid.v #97520
Conversation
This adds initial support for rematerializing vector instructions, starting with vid.v since it's simple and has the fewest operands. It has one passthru operand, which we need to check is undefined. RISCVInsertVSETVLI can still run before vector register allocation if -riscv-vsetvl-after-rvv-regalloc is false, so this makes sure that we only rematerialize after regalloc by checking for the implicit uses that the pass adds.
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Full diff: https://github.com/llvm/llvm-project/pull/97520.diff

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 67e2f3f5d6373..3e3292ccc148a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -166,6 +166,18 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
return 0;
}
+bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
+ const MachineInstr &MI) const {
+ if (RISCV::getRVVMCOpcode(MI.getOpcode()) == RISCV::VID_V &&
+ MI.getOperand(1).isUndef() &&
+ /* After RISCVInsertVSETVLI most pseudos will have implicit uses on vl and
+ vtype. Make sure we only rematerialize before RISCVInsertVSETVLI
+ i.e. -riscv-vsetvl-after-rvv-regalloc=true */
+ !MI.hasRegisterImplicitUseOperand(RISCV::VTYPE))
+ return true;
+ return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
+}
+
static bool forwardCopyWillClobberTuple(unsigned DstReg, unsigned SrcReg,
unsigned NumRegs) {
return DstReg > SrcReg && (DstReg - SrcReg) < NumRegs;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index e069717aaef23..f0c0953a3e56a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -76,6 +76,8 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
Register isStoreToStackSlot(const MachineInstr &MI, int &FrameIndex,
unsigned &MemBytes) const override;
+ bool isReallyTriviallyReMaterializable(const MachineInstr &MI) const override;
+
void copyPhysRegVector(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
MCRegister DstReg, MCRegister SrcReg, bool KillSrc,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index 45a57d1170814..42d6b03968d74 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -6629,6 +6629,7 @@ defm PseudoVIOTA_M: VPseudoVIOTA_M;
//===----------------------------------------------------------------------===//
// 15.9. Vector Element Index Instruction
//===----------------------------------------------------------------------===//
+let isReMaterializable = 1 in
defm PseudoVID : VPseudoVID_V;
} // Predicates = [HasVInstructions]
diff --git a/llvm/test/CodeGen/RISCV/rvv/remat.ll b/llvm/test/CodeGen/RISCV/rvv/remat.ll
new file mode 100644
index 0000000000000..d7a8a13dd3664
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/remat.ll
@@ -0,0 +1,111 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,POSTRA
+; RUN: llc < %s -mtriple=riscv64 -mattr=+v -riscv-vsetvl-after-rvv-regalloc=false -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,PRERA
+
+define void @vid(ptr %p) {
+; POSTRA-LABEL: vid:
+; POSTRA: # %bb.0:
+; POSTRA-NEXT: vsetvli a1, zero, e64, m8, ta, ma
+; POSTRA-NEXT: vid.v v8
+; POSTRA-NEXT: vs8r.v v8, (a0)
+; POSTRA-NEXT: vl8re64.v v16, (a0)
+; POSTRA-NEXT: vl8re64.v v24, (a0)
+; POSTRA-NEXT: vl8re64.v v0, (a0)
+; POSTRA-NEXT: vl8re64.v v8, (a0)
+; POSTRA-NEXT: vs8r.v v8, (a0)
+; POSTRA-NEXT: vs8r.v v0, (a0)
+; POSTRA-NEXT: vs8r.v v24, (a0)
+; POSTRA-NEXT: vs8r.v v16, (a0)
+; POSTRA-NEXT: vid.v v8
+; POSTRA-NEXT: vs8r.v v8, (a0)
+; POSTRA-NEXT: ret
+;
+; PRERA-LABEL: vid:
+; PRERA: # %bb.0:
+; PRERA-NEXT: addi sp, sp, -16
+; PRERA-NEXT: .cfi_def_cfa_offset 16
+; PRERA-NEXT: csrr a1, vlenb
+; PRERA-NEXT: slli a1, a1, 3
+; PRERA-NEXT: sub sp, sp, a1
+; PRERA-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; PRERA-NEXT: vsetvli a1, zero, e64, m8, ta, ma
+; PRERA-NEXT: vid.v v8
+; PRERA-NEXT: vs8r.v v8, (a0)
+; PRERA-NEXT: vl8re64.v v16, (a0)
+; PRERA-NEXT: addi a1, sp, 16
+; PRERA-NEXT: vs8r.v v16, (a1) # Unknown-size Folded Spill
+; PRERA-NEXT: vl8re64.v v24, (a0)
+; PRERA-NEXT: vl8re64.v v0, (a0)
+; PRERA-NEXT: vl8re64.v v16, (a0)
+; PRERA-NEXT: vs8r.v v16, (a0)
+; PRERA-NEXT: vs8r.v v0, (a0)
+; PRERA-NEXT: vs8r.v v24, (a0)
+; PRERA-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
+; PRERA-NEXT: vs8r.v v16, (a0)
+; PRERA-NEXT: vs8r.v v8, (a0)
+; PRERA-NEXT: csrr a0, vlenb
+; PRERA-NEXT: slli a0, a0, 3
+; PRERA-NEXT: add sp, sp, a0
+; PRERA-NEXT: addi sp, sp, 16
+; PRERA-NEXT: ret
+ %vid = call <vscale x 8 x i64> @llvm.riscv.vid.nxv8i64(<vscale x 8 x i64> poison, i64 -1)
+ store volatile <vscale x 8 x i64> %vid, ptr %p
+
+ %a = load volatile <vscale x 8 x i64>, ptr %p
+ %b = load volatile <vscale x 8 x i64>, ptr %p
+ %c = load volatile <vscale x 8 x i64>, ptr %p
+ %d = load volatile <vscale x 8 x i64>, ptr %p
+ store volatile <vscale x 8 x i64> %d, ptr %p
+ store volatile <vscale x 8 x i64> %c, ptr %p
+ store volatile <vscale x 8 x i64> %b, ptr %p
+ store volatile <vscale x 8 x i64> %a, ptr %p
+
+ store volatile <vscale x 8 x i64> %vid, ptr %p
+ ret void
+}
+
+
+define void @vid_passthru(ptr %p, <vscale x 8 x i64> %v) {
+; CHECK-LABEL: vid_passthru:
+; CHECK: # %bb.0:
+; CHECK-NEXT: addi sp, sp, -16
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: csrr a1, vlenb
+; CHECK-NEXT: slli a1, a1, 3
+; CHECK-NEXT: sub sp, sp, a1
+; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; CHECK-NEXT: vsetivli zero, 1, e64, m8, tu, ma
+; CHECK-NEXT: vid.v v8
+; CHECK-NEXT: vs8r.v v8, (a0)
+; CHECK-NEXT: vl8re64.v v16, (a0)
+; CHECK-NEXT: addi a1, sp, 16
+; CHECK-NEXT: vs8r.v v16, (a1) # Unknown-size Folded Spill
+; CHECK-NEXT: vl8re64.v v24, (a0)
+; CHECK-NEXT: vl8re64.v v0, (a0)
+; CHECK-NEXT: vl8re64.v v16, (a0)
+; CHECK-NEXT: vs8r.v v16, (a0)
+; CHECK-NEXT: vs8r.v v0, (a0)
+; CHECK-NEXT: vs8r.v v24, (a0)
+; CHECK-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
+; CHECK-NEXT: vs8r.v v16, (a0)
+; CHECK-NEXT: vs8r.v v8, (a0)
+; CHECK-NEXT: csrr a0, vlenb
+; CHECK-NEXT: slli a0, a0, 3
+; CHECK-NEXT: add sp, sp, a0
+; CHECK-NEXT: addi sp, sp, 16
+; CHECK-NEXT: ret
+ %vid = call <vscale x 8 x i64> @llvm.riscv.vid.nxv8i64(<vscale x 8 x i64> %v, i64 1)
+ store volatile <vscale x 8 x i64> %vid, ptr %p
+
+ %a = load volatile <vscale x 8 x i64>, ptr %p
+ %b = load volatile <vscale x 8 x i64>, ptr %p
+ %c = load volatile <vscale x 8 x i64>, ptr %p
+ %d = load volatile <vscale x 8 x i64>, ptr %p
+ store volatile <vscale x 8 x i64> %d, ptr %p
+ store volatile <vscale x 8 x i64> %c, ptr %p
+ store volatile <vscale x 8 x i64> %b, ptr %p
+ store volatile <vscale x 8 x i64> %a, ptr %p
+
+ store volatile <vscale x 8 x i64> %vid, ptr %p
+ ret void
+}
LGTM
LGTM.
Rematerializable instructions can be hoisted by MachineLICM; should we add some loop tests here?
LICM should be able to hoist vid.v even if it weren't rematerializable. Right?
Yeah, you're right!
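For reference, here is a minimal, hypothetical sketch (not part of this patch) of the kind of loop test discussed above; the function name and loop structure are illustrative, and the intrinsic call mirrors the tests in remat.ll. Since the vid.v result does not depend on anything defined inside the loop, MachineLICM can hoist it out whether or not it is rematerializable, which is the point made above.

; Hypothetical loop: the vid.v result is loop-invariant, so MachineLICM can
; hoist it regardless of rematerializability.
define void @vid_in_loop(ptr %p, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %vid = call <vscale x 8 x i64> @llvm.riscv.vid.nxv8i64(<vscale x 8 x i64> poison, i64 -1)
  store volatile <vscale x 8 x i64> %vid, ptr %p
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}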
This adds initial support for rematerializing vector instructions, starting with vid.v since it's simple and has the fewest operands. It has one passthru operand, which we need to check is undefined. It also has an AVL operand, but it's fine to rematerialize with it because it's scalar and register allocation is split between vector and scalar.

RISCVInsertVSETVLI can still run before vector register allocation if -riscv-vsetvl-after-rvv-regalloc is false, so this makes sure that we only rematerialize after regalloc by checking for the implicit uses that the pass adds.
This continues the line of work started in llvm#97520, and gives a 2.5% reduction in the number of spills on SPEC CPU 2017.

Program              regalloc.NumSpills          regalloc.NumReloads          regalloc.NumReMaterialization
                     lhs       rhs       diff    lhs       rhs       diff     lhs        rhs        diff
605.mcf_s            141.00    141.00    0.0%    372.00    372.00    0.0%     123.00     123.00     0.0%
505.mcf_r            141.00    141.00    0.0%    372.00    372.00    0.0%     123.00     123.00     0.0%
519.lbm_r            73.00     73.00     0.0%    75.00     75.00     0.0%     18.00      18.00      0.0%
619.lbm_s            68.00     68.00     0.0%    70.00     70.00     0.0%     20.00      20.00      0.0%
631.deepsjeng_s      354.00    353.00   -0.3%    683.00    682.00   -0.1%     529.00     530.00     0.2%
531.deepsjeng_r      354.00    353.00   -0.3%    683.00    682.00   -0.1%     529.00     530.00     0.2%
625.x264_s           1896.00   1886.00  -0.5%    4583.00   4561.00  -0.5%     2086.00    2108.00    1.1%
525.x264_r           1896.00   1886.00  -0.5%    4583.00   4561.00  -0.5%     2086.00    2108.00    1.1%
508.namd_r           6665.00   6598.00  -1.0%    15649.00  15509.00 -0.9%     3014.00    3164.00    5.0%
644.nab_s            761.00    753.00   -1.1%    1199.00   1183.00  -1.3%     1542.00    1559.00    1.1%
544.nab_r            761.00    753.00   -1.1%    1199.00   1183.00  -1.3%     1542.00    1559.00    1.1%
638.imagick_s        4287.00   4181.00  -2.5%    11624.00  11342.00 -2.4%     10551.00   10884.00   3.2%
538.imagick_r        4287.00   4181.00  -2.5%    11624.00  11342.00 -2.4%     10551.00   10884.00   3.2%
602.gcc_s            12771.00  12450.00 -2.5%    28117.00  27328.00 -2.8%     49757.00   50526.00   1.5%
502.gcc_r            12771.00  12450.00 -2.5%    28117.00  27328.00 -2.8%     49757.00   50526.00   1.5%
Geomean difference                      -2.5%                       -2.6%                           1.8%

I initially held off submitting this patch because it surprisingly introduced a lot of spills in the test diffs, but after llvm#107290 the vmv.v.is that caused them are now gone. The gist is that marking vmv.v.i as spillable decreased its spill weight, which actually resulted in more m8 registers getting evicted and spilled during register allocation. The SPEC results show this isn't an issue in practice though, and I plan on posting a separate patch to explain this in more detail.
This continues the line of work started in #97520, and gives a 2.5% reduction in the number of spills on SPEC CPU 2017.

Program              regalloc.NumSpills
                     lhs        rhs       diff
605.mcf_s            141.00     141.00    0.0%
505.mcf_r            141.00     141.00    0.0%
519.lbm_r            73.00      73.00     0.0%
619.lbm_s            68.00      68.00     0.0%
631.deepsjeng_s      354.00     353.00   -0.3%
531.deepsjeng_r      354.00     353.00   -0.3%
625.x264_s           1896.00    1886.00  -0.5%
525.x264_r           1896.00    1886.00  -0.5%
508.namd_r           6665.00    6598.00  -1.0%
644.nab_s            761.00     753.00   -1.1%
544.nab_r            761.00     753.00   -1.1%
638.imagick_s        4287.00    4181.00  -2.5%
538.imagick_r        4287.00    4181.00  -2.5%
602.gcc_s            12771.00   12450.00 -2.5%
502.gcc_r            12771.00   12450.00 -2.5%
510.parest_r         43876.00   42740.00 -2.6%
500.perlbench_r      4297.00    4179.00  -2.7%
600.perlbench_s      4297.00    4179.00  -2.7%
526.blender_r        13503.00   13103.00 -3.0%
511.povray_r         2006.00    1937.00  -3.4%
620.omnetpp_s        984.00     946.00   -3.9%
520.omnetpp_r        984.00     946.00   -3.9%
657.xz_s             302.00     289.00   -4.3%
557.xz_r             302.00     289.00   -4.3%
541.leela_r          378.00     356.00   -5.8%
641.leela_s          378.00     356.00   -5.8%
623.xalancbmk_s      1646.00    1548.00  -6.0%
523.xalancbmk_r      1646.00    1548.00  -6.0%
Geomean difference                       -2.5%

I initially held off submitting this patch because it surprisingly introduced a lot of spills in the test diffs, but after #107290 the vmv.v.is that caused them are now gone. The gist is that marking vmv.v.i as spillable decreased its spill weight, which actually resulted in more m8 registers getting evicted and spilled during register allocation. The SPEC results show this isn't an issue in practice though, and I plan on posting a separate patch to explain this in more detail.