[RISCV] Rematerialize vmv.v.i #107550

Merged: lukel97 merged 2 commits into llvm:main on Sep 9, 2024

Conversation

lukel97 (Contributor) commented Sep 6, 2024

This continues the line of work started in #97520, and gives a 2.5% reduction in the number of spills on SPEC CPU 2017.

(lhs = baseline, rhs = with this patch)
Program            regalloc.NumSpills                regalloc.NumReloads                regalloc.NumReMaterialization               
                   lhs                rhs      diff  lhs                 rhs      diff  lhs                           rhs      diff 
         605.mcf_s   141.00             141.00  0.0%   372.00              372.00  0.0%   123.00                        123.00  0.0%
         505.mcf_r   141.00             141.00  0.0%   372.00              372.00  0.0%   123.00                        123.00  0.0%
         519.lbm_r    73.00              73.00  0.0%    75.00               75.00  0.0%    18.00                         18.00  0.0%
         619.lbm_s    68.00              68.00  0.0%    70.00               70.00  0.0%    20.00                         20.00  0.0%
   631.deepsjeng_s   354.00             353.00 -0.3%   683.00              682.00 -0.1%   529.00                        530.00  0.2%
   531.deepsjeng_r   354.00             353.00 -0.3%   683.00              682.00 -0.1%   529.00                        530.00  0.2%
        625.x264_s  1896.00            1886.00 -0.5%  4583.00             4561.00 -0.5%  2086.00                       2108.00  1.1%
        525.x264_r  1896.00            1886.00 -0.5%  4583.00             4561.00 -0.5%  2086.00                       2108.00  1.1%
        508.namd_r  6665.00            6598.00 -1.0% 15649.00            15509.00 -0.9%  3014.00                       3164.00  5.0%
         644.nab_s   761.00             753.00 -1.1%  1199.00             1183.00 -1.3%  1542.00                       1559.00  1.1%
         544.nab_r   761.00             753.00 -1.1%  1199.00             1183.00 -1.3%  1542.00                       1559.00  1.1%
     638.imagick_s  4287.00            4181.00 -2.5% 11624.00            11342.00 -2.4% 10551.00                      10884.00  3.2%
     538.imagick_r  4287.00            4181.00 -2.5% 11624.00            11342.00 -2.4% 10551.00                      10884.00  3.2%
         602.gcc_s 12771.00           12450.00 -2.5% 28117.00            27328.00 -2.8% 49757.00                      50526.00  1.5%
         502.gcc_r 12771.00           12450.00 -2.5% 28117.00            27328.00 -2.8% 49757.00                      50526.00  1.5%
      510.parest_r 43876.00           42740.00 -2.6% 85108.00            82400.00 -3.2% 62459.00                      65165.00  4.3%
   500.perlbench_r  4297.00            4179.00 -2.7%  9472.00             9165.00 -3.2%  9962.00                      10222.00  2.6%
   600.perlbench_s  4297.00            4179.00 -2.7%  9472.00             9165.00 -3.2%  9962.00                      10222.00  2.6%
     526.blender_r 13503.00           13103.00 -3.0% 27249.00            26459.00 -2.9% 64366.00                      65188.00  1.3%
      511.povray_r  2006.00            1937.00 -3.4%  3760.00             3629.00 -3.5%  4780.00                       4914.00  2.8%
     620.omnetpp_s   984.00             946.00 -3.9%  1556.00             1485.00 -4.6%  8342.00                       8413.00  0.9%
     520.omnetpp_r   984.00             946.00 -3.9%  1556.00             1485.00 -4.6%  8342.00                       8413.00  0.9%
          657.xz_s   302.00             289.00 -4.3%   521.00              505.00 -3.1%   597.00                        613.00  2.7%
          557.xz_r   302.00             289.00 -4.3%   521.00              505.00 -3.1%   597.00                        613.00  2.7%
       541.leela_r   378.00             356.00 -5.8%   561.00              525.00 -6.4%   765.00                        801.00  4.7%
       641.leela_s   378.00             356.00 -5.8%   561.00              525.00 -6.4%   765.00                        801.00  4.7%
   623.xalancbmk_s  1646.00            1548.00 -6.0%  2640.00             2466.00 -6.6% 13827.00                      13983.00  1.1%
   523.xalancbmk_r  1646.00            1548.00 -6.0%  2640.00             2466.00 -6.6% 13827.00                      13983.00  1.1%
Geomean difference                             -2.5%                              -2.6%                                         1.8%

I initially held off on submitting this patch because it surprisingly introduced a lot of spills in the test diffs, but after #107290 the vmv.v.i instructions that caused them are now gone.

The gist is that marking vmv.v.i as rematerializable decreased its spill weight, which in turn resulted in more m8 registers getting evicted and spilled during register allocation.

The SPEC results show this isn't an issue in practice, though, and I plan on posting a separate patch that explains this in more detail.
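
For context on the mechanism: the generic spill-weight calculation discounts intervals it considers rematerializable (LLVM halves their weight in CalcSpillWeights), which makes them preferred candidates for eviction. The standalone sketch below models that effect with made-up frequencies; it is illustrative only and is not LLVM's actual code:

// Standalone model of the rematerialization discount in spill weights.
// The 0.5 factor mirrors LLVM's VirtRegAuxInfo (CalcSpillWeights); the
// rest of this program is simplified and hypothetical.
#include <cstdio>

struct Interval {
  const char *Name;
  float UseFreqSum;      // summed block frequencies of defs and uses
  bool Rematerializable; // can the value be recomputed instead of reloaded?
};

// Lower weight means the interval is cheaper to sacrifice, so the greedy
// allocator prefers evicting or spilling low-weight intervals first.
static float spillWeight(const Interval &LI) {
  float Weight = LI.UseFreqSum;
  if (LI.Rematerializable)
    Weight *= 0.5f; // the remat discount
  return Weight;
}

int main() {
  Interval VmvVI = {"vmv.v.i result", 8.0f, true}; // rematerializable after this patch
  Interval Other = {"other m8 value", 6.0f, false};
  // The vmv.v.i interval now weighs 4.0 < 6.0, so under register pressure
  // it is the one evicted -- which is what showed up as extra m8 spills in
  // the test diffs before #107290.
  std::printf("%s: weight %.1f\n", VmvVI.Name, spillWeight(VmvVI));
  std::printf("%s: weight %.1f\n", Other.Name, spillWeight(Other));
  return 0;
}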

llvmbot (Member) commented Sep 6, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/107550.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+13-7)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td (+1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/remat.ll (+62)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll (+1)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 0a64a8e1440084..325a50c9f48a1c 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -168,13 +168,19 @@ Register RISCVInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
 
 bool RISCVInstrInfo::isReallyTriviallyReMaterializable(
     const MachineInstr &MI) const {
-  if (RISCV::getRVVMCOpcode(MI.getOpcode()) == RISCV::VID_V &&
-      MI.getOperand(1).isUndef() &&
-      /* After RISCVInsertVSETVLI most pseudos will have implicit uses on vl and
-         vtype.  Make sure we only rematerialize before RISCVInsertVSETVLI
-         i.e. -riscv-vsetvl-after-rvv-regalloc=true */
-      !MI.hasRegisterImplicitUseOperand(RISCV::VTYPE))
-    return true;
+  switch (RISCV::getRVVMCOpcode(MI.getOpcode())) {
+  case RISCV::VMV_V_I:
+  case RISCV::VID_V:
+    if (MI.getOperand(1).isUndef() &&
+        /* After RISCVInsertVSETVLI most pseudos will have implicit uses on vl
+           and vtype.  Make sure we only rematerialize before RISCVInsertVSETVLI
+           i.e. -riscv-vsetvl-after-rvv-regalloc=true */
+        !MI.hasRegisterImplicitUseOperand(RISCV::VTYPE))
+      return true;
+    break;
+  default:
+    break;
+  }
   return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
 }
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index c91c9c3614a34c..e11f176bfe6041 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -2478,6 +2478,7 @@ multiclass VPseudoUnaryVMV_V_X_I {
         def "_X_" # mx : VPseudoUnaryNoMask<m.vrclass, GPR>,
                          SchedUnary<"WriteVIMovX", "ReadVIMovX", mx,
                                     forcePassthruRead=true>;
+        let isReMaterializable = 1 in
         def "_I_" # mx : VPseudoUnaryNoMask<m.vrclass, simm5>,
                          SchedNullary<"WriteVIMovI", mx,
                                       forcePassthruRead=true>;
diff --git a/llvm/test/CodeGen/RISCV/rvv/remat.ll b/llvm/test/CodeGen/RISCV/rvv/remat.ll
index d7a8a13dd36643..2b12249378eb1f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/remat.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/remat.ll
@@ -109,3 +109,65 @@ define void @vid_passthru(ptr %p, <vscale x 8 x i64> %v) {
   store volatile <vscale x 8 x i64> %vid, ptr %p
   ret void
 }
+
+define void @vmv.v.i(ptr %p) {
+; POSTRA-LABEL: vmv.v.i:
+; POSTRA:       # %bb.0:
+; POSTRA-NEXT:    vsetvli a1, zero, e64, m8, ta, ma
+; POSTRA-NEXT:    vmv.v.i v8, 1
+; POSTRA-NEXT:    vs8r.v v8, (a0)
+; POSTRA-NEXT:    vl8re64.v v16, (a0)
+; POSTRA-NEXT:    vl8re64.v v24, (a0)
+; POSTRA-NEXT:    vl8re64.v v0, (a0)
+; POSTRA-NEXT:    vl8re64.v v8, (a0)
+; POSTRA-NEXT:    vs8r.v v8, (a0)
+; POSTRA-NEXT:    vs8r.v v0, (a0)
+; POSTRA-NEXT:    vs8r.v v24, (a0)
+; POSTRA-NEXT:    vs8r.v v16, (a0)
+; POSTRA-NEXT:    vmv.v.i v8, 1
+; POSTRA-NEXT:    vs8r.v v8, (a0)
+; POSTRA-NEXT:    ret
+;
+; PRERA-LABEL: vmv.v.i:
+; PRERA:       # %bb.0:
+; PRERA-NEXT:    addi sp, sp, -16
+; PRERA-NEXT:    .cfi_def_cfa_offset 16
+; PRERA-NEXT:    csrr a1, vlenb
+; PRERA-NEXT:    slli a1, a1, 3
+; PRERA-NEXT:    sub sp, sp, a1
+; PRERA-NEXT:    .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; PRERA-NEXT:    vsetvli a1, zero, e64, m8, ta, ma
+; PRERA-NEXT:    vmv.v.i v8, 1
+; PRERA-NEXT:    vs8r.v v8, (a0)
+; PRERA-NEXT:    vl8re64.v v16, (a0)
+; PRERA-NEXT:    addi a1, sp, 16
+; PRERA-NEXT:    vs8r.v v16, (a1) # Unknown-size Folded Spill
+; PRERA-NEXT:    vl8re64.v v24, (a0)
+; PRERA-NEXT:    vl8re64.v v0, (a0)
+; PRERA-NEXT:    vl8re64.v v16, (a0)
+; PRERA-NEXT:    vs8r.v v16, (a0)
+; PRERA-NEXT:    vs8r.v v0, (a0)
+; PRERA-NEXT:    vs8r.v v24, (a0)
+; PRERA-NEXT:    vl8r.v v16, (a1) # Unknown-size Folded Reload
+; PRERA-NEXT:    vs8r.v v16, (a0)
+; PRERA-NEXT:    vs8r.v v8, (a0)
+; PRERA-NEXT:    csrr a0, vlenb
+; PRERA-NEXT:    slli a0, a0, 3
+; PRERA-NEXT:    add sp, sp, a0
+; PRERA-NEXT:    addi sp, sp, 16
+; PRERA-NEXT:    ret
+  %vmv.v.i = call <vscale x 8 x i64> @llvm.riscv.vmv.v.x.nxv8i64(<vscale x 8 x i64> poison, i64 1, i64 -1)
+  store volatile <vscale x 8 x i64> %vmv.v.i, ptr %p
+
+  %a = load volatile <vscale x 8 x i64>, ptr %p
+  %b = load volatile <vscale x 8 x i64>, ptr %p
+  %c = load volatile <vscale x 8 x i64>, ptr %p
+  %d = load volatile <vscale x 8 x i64>, ptr %p
+  store volatile <vscale x 8 x i64> %d, ptr %p
+  store volatile <vscale x 8 x i64> %c, ptr %p
+  store volatile <vscale x 8 x i64> %b, ptr %p
+  store volatile <vscale x 8 x i64> %a, ptr %p
+
+  store volatile <vscale x 8 x i64> %vmv.v.i, ptr %p
+  ret void
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll b/llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll
index b1980fcf420a82..2a3a3a3daae4c4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll
@@ -519,6 +519,7 @@ define void @vselect_legalize_regression(<vscale x 16 x double> %a, <vscale x 16
 ; CHECK-NEXT:    vmv.v.i v24, 0
 ; CHECK-NEXT:    vmerge.vvm v16, v24, v16, v0
 ; CHECK-NEXT:    vmv1r.v v0, v7
+; CHECK-NEXT:    vmv.v.i v24, 0
 ; CHECK-NEXT:    vmerge.vvm v8, v24, v8, v0
 ; CHECK-NEXT:    vs8r.v v8, (a1)
 ; CHECK-NEXT:    slli a0, a0, 3

@@ -519,6 +519,7 @@ define void @vselect_legalize_regression(<vscale x 16 x double> %a, <vscale x 16
; CHECK-NEXT: vmv.v.i v24, 0
; CHECK-NEXT: vmerge.vvm v16, v24, v16, v0
; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vmv.v.i v24, 0
lukel97 (Contributor, Author):

This is an example of an m8 register now being evicted due to the change in spill weights, even when nothing here actually needs to be spilled or rematerialized.

BeMg (Contributor) left a comment:

LGTM. Impressive results on SPEC 2017!

preames (Collaborator) left a comment:

LGTM

@@ -519,6 +519,7 @@ define void @vselect_legalize_regression(<vscale x 16 x double> %a, <vscale x 16
; CHECK-NEXT: vmv.v.i v24, 0
; CHECK-NEXT: vmerge.vvm v16, v24, v16, v0
A collaborator commented:

Any idea why we're not turning this into a vmerge.vim with a negated mask? That would radically reduce register pressure and work, and be net equal in terms of instruction count.
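
For illustration, a negated-mask rewrite could look like the sketch below. This is a hypothetical sequence, not output from any existing combine; it reuses the registers from the quoted hunk and relies on vmnot.m, the standard alias for vmnand.mm vd, vs, vs:

# Current sequence (from the hunk above):
#   vmv.v.i     v24, 0
#   vmerge.vvm  v8, v24, v8, v0   # v8[i] = v0.mask[i] ? v8[i] : 0
# Hypothetical negated-mask form, folding the zero into vmerge.vim:
    vmnot.m     v0, v0            # invert the mask (v7 still holds the original)
    vmerge.vim  v8, v8, 0, v0     # v8[i] = v0.mask[i] ? 0 : v8[i]
# The result is identical, the instruction count is unchanged, and the
# m8 zero-splat temporary (v24) never needs a register.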

topperc (Collaborator) left a comment:

LGTM

wangpc-pp (Contributor) left a comment:

LGTM.

lukel97 merged commit 65dc53b into llvm:main on Sep 9, 2024
10 checks passed