Reland "[RISCV] Add scheduling model for mips p8700 CPU" #120550

Merged: 2 commits, Dec 19, 2024

djtodoro (Collaborator) commented Dec 19, 2024

This patch introduces a scheduling model for the MIPS p8700, an out-of-order
RISC-V processor. The model includes pipelines for the following units:

  • 2 Integer Arithmetic/Logical Units (ALU and AL2)
  • Multiply/Divide Unit (MDU)
  • Branch Unit (CTI)
  • Load/Store Unit (LSU)
  • Short Floating-Point Pipe (FPUS)
  • Long Floating-Point Pipe (FPUL)

For additional details, refer to the official product page:
https://mips.com/products/hardware/p8700/.

The reland also adds UnsupportedSchedZfhmin to the model so that Zfhmin writes
such as WriteFCvtF16ToF32, which previously caused build failures, are covered.
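
As a rough aid for reading the model, the sketch below maps the units listed above to the ProcResource names used in RISCVSchedMIPSP8700.td and estimates per-instruction reciprocal throughput as "cycles a resource is held, divided by the number of units that can take the operation". The mapping and the helper `reciprocal_throughput` are my own reading of the diff, not something the patch states, and the formula is a simplification of what llvm-mca actually computes.

```python
# Units from the PR description mapped to ProcResource names in the diff.
# This mapping is an assumption based on reading RISCVSchedMIPSP8700.td.
UNIT_TO_RESOURCES = {
    "ALU": ["p8700ALQ"],
    "AL2": ["p8700IssueAL2"],
    "MDU": ["p8700GpMul", "p8700GpDiv"],
    "CTI": ["p8700IssueCTI"],
    "LSU": ["p8700IssueLSU"],
    "FPUS": ["p8700IssueFPUS"],
    "FPUL": ["p8700IssueFPUL"],
}


def reciprocal_throughput(cycles_held: int, num_units: int) -> float:
    """Hypothetical helper: cycles a resource is held per op, spread over the units."""
    return cycles_held / num_units


# (cycles the resource is held, number of units able to execute the op)
EXAMPLES = {
    "add": (1, 2),  # WriteIALU -> p8700WriteEitherALU = {p8700ALQ, p8700IssueAL2}
    "mul": (1, 1),  # WriteIMul -> p8700GpMul, held for the default single cycle
    "div": (7, 1),  # WriteIDiv -> p8700GpDiv, ReleaseAtCycles = [7]
}

for name, (cycles, units) in EXAMPLES.items():
    print(f"{name}: {reciprocal_throughput(cycles, units):.2f}")
# add: 0.50
# mul: 1.00
# div: 7.00
```

These three values agree with the RThroughput column that the llvm-mca test in this patch records for add, mul, and div.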

llvmbot (Member) commented Dec 19, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Djordje Todorovic (djtodoro)

Changes

Add UnsupportedSchedZfhmin.


Full diff: https://github.com/llvm/llvm-project/pull/120550.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCV.td (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVProcessors.td (+1-2)
  • (added) llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td (+281)
  • (added) llvm/test/tools/llvm-mca/RISCV/MIPS/p8700.s (+143)
diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td
index 00c3d702e12a22..1df6f9ae1944c8 100644
--- a/llvm/lib/Target/RISCV/RISCV.td
+++ b/llvm/lib/Target/RISCV/RISCV.td
@@ -46,6 +46,7 @@ include "RISCVMacroFusion.td"
 // RISC-V Scheduling Models
 //===----------------------------------------------------------------------===//
 
+include "RISCVSchedMIPSP8700.td"
 include "RISCVSchedRocket.td"
 include "RISCVSchedSiFive7.td"
 include "RISCVSchedSiFiveP400.td"
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 445e084d07686b..053a3b11f39bc5 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -105,7 +105,7 @@ def GENERIC_RV64 : RISCVProcessorModel<"generic-rv64",
 def GENERIC : RISCVTuneProcessorModel<"generic", NoSchedModel>, GenericTuneInfo;
 
 def MIPS_P8700 : RISCVProcessorModel<"mips-p8700",
-                                     NoSchedModel,
+                                     MIPSP8700Model,
                                      [Feature64Bit,
                                       FeatureStdExtI,
                                       FeatureStdExtM,
@@ -321,7 +321,6 @@ def SIFIVE_P470 : RISCVProcessorModel<"sifive-p470", SiFiveP400Model,
                                                   [TuneNoSinkSplatOperands,
                                                    TuneVXRMPipelineFlush])>;
 
-
 def SIFIVE_P670 : RISCVProcessorModel<"sifive-p670", SiFiveP600Model,
                                       !listconcat(RVA22U64Features,
                                       [FeatureStdExtV,
diff --git a/llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td b/llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td
new file mode 100644
index 00000000000000..550f83a59b8b0e
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVSchedMIPSP8700.td
@@ -0,0 +1,281 @@
+//===-- RISCVSchedMIPSP8700.td - MIPS RISC-V Processor -----*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+// P8700 - a RISC-V processor by MIPS.
+// Pipelines:
+//   - 2 Integer Arithmetic and Logical Units (ALU and AL2)
+//   - Multiply / Divide Unit (MDU)
+//   - Branch Unit (CTI)
+//   - Load Store Unit (LSU)
+//   - Short Floating Point Pipe (FPUS)
+//   - Long Floating Point Pipe (FPUL)
+//===----------------------------------------------------------------------===//
+
+def MIPSP8700Model : SchedMachineModel {
+  int IssueWidth = 4;
+  int MicroOpBufferSize = 96;
+  int LoadLatency = 4;
+  int MispredictPenalty = 8;
+  let CompleteModel = 0;
+}
+
+let SchedModel = MIPSP8700Model in {
+// Handle ALQ Pipelines.
+// It contains 1 ALU Unit only.
+def p8700ALQ : ProcResource<1> { let BufferSize = 16; }
+
+// Handle AGQ Pipelines.
+def p8700AGQ : ProcResource<3> { let BufferSize = 16; }
+def p8700IssueAL2 : ProcResource<1> { let Super = p8700AGQ; }
+def p8700IssueCTI : ProcResource<1> { let Super = p8700AGQ; }
+def p8700IssueLSU : ProcResource<1> { let Super = p8700AGQ; }
+def p8700WriteEitherALU : ProcResGroup<[p8700ALQ, p8700IssueAL2]>;
+
+// Handle Multiply Divide Pipe.
+def p8700GpDiv : ProcResource<1>;
+def p8700GpMul : ProcResource<1>;
+
+def : WriteRes<WriteIALU, [p8700WriteEitherALU]>;
+def : WriteRes<WriteIALU32, [p8700WriteEitherALU]>;
+def : WriteRes<WriteShiftImm, [p8700WriteEitherALU]>;
+def : WriteRes<WriteShiftImm32, [p8700WriteEitherALU]>;
+def : WriteRes<WriteShiftReg, [p8700WriteEitherALU]>;
+def : WriteRes<WriteShiftReg32, [p8700WriteEitherALU]>;
+
+// Handle zba.
+def : WriteRes<WriteSHXADD, [p8700WriteEitherALU]>;
+def : WriteRes<WriteSHXADD32, [p8700WriteEitherALU]>;
+
+// Handle zbb.
+let Latency = 2 in {
+def : WriteRes<WriteCLZ, [p8700IssueAL2]>;
+def : WriteRes<WriteCTZ, [p8700IssueAL2]>;
+def : WriteRes<WriteCPOP, [p8700IssueAL2]>;
+def : WriteRes<WriteCLZ32, [p8700IssueAL2]>;
+def : WriteRes<WriteCTZ32, [p8700IssueAL2]>;
+def : WriteRes<WriteCPOP32, [p8700IssueAL2]>;
+}
+def : WriteRes<WriteRotateReg, [p8700WriteEitherALU]>;
+def : WriteRes<WriteRotateImm, [p8700WriteEitherALU]>;
+def : WriteRes<WriteRotateReg32, [p8700WriteEitherALU]>;
+def : WriteRes<WriteRotateImm32, [p8700WriteEitherALU]>;
+def : WriteRes<WriteREV8, [p8700WriteEitherALU]>;
+def : WriteRes<WriteORCB, [p8700WriteEitherALU]>;
+def : WriteRes<WriteIMinMax, [p8700WriteEitherALU]>;
+
+let Latency = 0 in
+def : WriteRes<WriteNop, [p8700WriteEitherALU]>;
+
+let Latency = 4 in {
+def : WriteRes<WriteLDB, [p8700IssueLSU]>;
+def : WriteRes<WriteLDH, [p8700IssueLSU]>;
+def : WriteRes<WriteLDW, [p8700IssueLSU]>;
+def : WriteRes<WriteLDD, [p8700IssueLSU]>;
+
+def : WriteRes<WriteAtomicW, [p8700IssueLSU]>;
+def : WriteRes<WriteAtomicD, [p8700IssueLSU]>;
+def : WriteRes<WriteAtomicLDW, [p8700IssueLSU]>;
+def : WriteRes<WriteAtomicLDD, [p8700IssueLSU]>;
+}
+
+let Latency = 8 in {
+def : WriteRes<WriteFLD32, [p8700IssueLSU]>;
+def : WriteRes<WriteFLD64, [p8700IssueLSU]>;
+}
+
+let Latency = 3 in {
+def : WriteRes<WriteSTB, [p8700IssueLSU]>;
+def : WriteRes<WriteSTH, [p8700IssueLSU]>;
+def : WriteRes<WriteSTW, [p8700IssueLSU]>;
+def : WriteRes<WriteSTD, [p8700IssueLSU]>;
+
+def : WriteRes<WriteAtomicSTW, [p8700IssueLSU]>;
+def : WriteRes<WriteAtomicSTD, [p8700IssueLSU]>;
+}
+
+def : WriteRes<WriteFST32, [p8700IssueLSU]>;
+def : WriteRes<WriteFST64, [p8700IssueLSU]>;
+
+let Latency = 7 in {
+def : WriteRes<WriteFMovI32ToF32, [p8700IssueLSU]>;
+def : WriteRes<WriteFMovF32ToI32, [p8700IssueLSU]>;
+def : WriteRes<WriteFMovI64ToF64, [p8700IssueLSU]>;
+def : WriteRes<WriteFMovF64ToI64, [p8700IssueLSU]>;
+}
+
+let Latency = 4 in {
+def : WriteRes<WriteIMul, [p8700GpMul]>;
+def : WriteRes<WriteIMul32, [p8700GpMul]>;
+}
+
+let Latency = 7, ReleaseAtCycles = [7] in {
+def : WriteRes<WriteIDiv, [p8700GpDiv]>;
+def : WriteRes<WriteIDiv32,  [p8700GpDiv]>;
+def : WriteRes<WriteIRem, [p8700GpDiv]>;
+def : WriteRes<WriteIRem32, [p8700GpDiv]>;
+}
+
+def : WriteRes<WriteCSR, [p8700ALQ]>;
+
+// Handle CTI Pipeline.
+def : WriteRes<WriteJmp, [p8700IssueCTI]>;
+def : WriteRes<WriteJalr, [p8700IssueCTI]>;
+let Latency = 2 in {
+def : WriteRes<WriteJal, [p8700IssueCTI]>;
+def : WriteRes<WriteJalr, [p8700IssueCTI]>;
+}
+
+// Handle FPU Pipelines.
+def p8700FPQ : ProcResource<3> { let BufferSize = 16; }
+def p8700IssueFPUS : ProcResource<1> { let Super = p8700FPQ; }
+def p8700IssueFPUL : ProcResource<1> { let Super = p8700FPQ; }
+def p8700FpuApu    : ProcResource<1>;
+def p8700FpuLong   : ProcResource<1>;
+
+let Latency = 4 in {
+def : WriteRes<WriteFCvtI32ToF32, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtI32ToF64, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtI64ToF32, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtI64ToF64, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtF32ToI32, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtF32ToI64, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtF32ToF64, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtF64ToI32, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtF64ToI64, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFCvtF64ToF32, [p8700IssueFPUL, p8700FpuApu]>;
+
+def : WriteRes<WriteFAdd32, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFAdd64, [p8700IssueFPUL, p8700FpuApu]>;
+}
+
+let Latency = 2 in {
+def : WriteRes<WriteFSGNJ32, [p8700IssueFPUS, p8700FpuApu]>;
+def : WriteRes<WriteFMinMax32, [p8700IssueFPUS, p8700FpuApu]>;
+def : WriteRes<WriteFSGNJ64, [p8700IssueFPUS, p8700FpuApu]>;
+def : WriteRes<WriteFMinMax64, [p8700IssueFPUS, p8700FpuApu]>;
+
+def : WriteRes<WriteFCmp32, [p8700IssueFPUS, p8700FpuApu]>;
+def : WriteRes<WriteFCmp64, [p8700IssueFPUS, p8700FpuApu]>;
+}
+
+def : WriteRes<WriteFClass32, [p8700IssueFPUS, p8700FpuApu]>;
+def : WriteRes<WriteFClass64, [p8700IssueFPUS, p8700FpuApu]>;
+
+let Latency = 8 in {
+def : WriteRes<WriteFMA32, [p8700FpuLong, p8700FpuApu]>;
+def : WriteRes<WriteFMA64, [p8700FpuLong, p8700FpuApu]>;
+}
+
+let Latency = 5 in {
+def : WriteRes<WriteFMul32, [p8700FpuLong, p8700FpuApu]>;
+def : WriteRes<WriteFMul64, [p8700FpuLong, p8700FpuApu]>;
+}
+
+let Latency = 11, ReleaseAtCycles = [1, 11] in {
+def : WriteRes<WriteFDiv32, [p8700FpuLong, p8700FpuApu]>;
+def : WriteRes<WriteFSqrt32, [p8700FpuLong, p8700FpuApu]>;
+}
+
+let Latency = 17, ReleaseAtCycles = [1, 17] in {
+def : WriteRes<WriteFDiv64, [p8700IssueFPUL, p8700FpuApu]>;
+def : WriteRes<WriteFSqrt64, [p8700IssueFPUL, p8700FpuApu]>;
+}
+
+// Bypass and advance.
+def : ReadAdvance<ReadIALU, 0>;
+def : ReadAdvance<ReadIALU32, 0>;
+def : ReadAdvance<ReadShiftImm, 0>;
+def : ReadAdvance<ReadShiftImm32, 0>;
+def : ReadAdvance<ReadShiftReg, 0>;
+def : ReadAdvance<ReadShiftReg32, 0>;
+def : ReadAdvance<ReadSHXADD, 0>;
+def : ReadAdvance<ReadSHXADD32, 0>;
+def : ReadAdvance<ReadRotateReg, 0>;
+def : ReadAdvance<ReadRotateImm, 0>;
+def : ReadAdvance<ReadCLZ, 0>;
+def : ReadAdvance<ReadCTZ, 0>;
+def : ReadAdvance<ReadCPOP, 0>;
+def : ReadAdvance<ReadRotateReg32, 0>;
+def : ReadAdvance<ReadRotateImm32, 0>;
+def : ReadAdvance<ReadCLZ32, 0>;
+def : ReadAdvance<ReadCTZ32, 0>;
+def : ReadAdvance<ReadCPOP32, 0>;
+def : ReadAdvance<ReadREV8, 0>;
+def : ReadAdvance<ReadORCB, 0>;
+def : ReadAdvance<ReadIMul, 0>;
+def : ReadAdvance<ReadIMul32, 0>;
+def : ReadAdvance<ReadIDiv, 0>;
+def : ReadAdvance<ReadIDiv32, 0>;
+def : ReadAdvance<ReadJmp, 0>;
+def : ReadAdvance<ReadJalr, 0>;
+def : ReadAdvance<ReadFMovI32ToF32, 0>;
+def : ReadAdvance<ReadFMovF32ToI32, 0>;
+def : ReadAdvance<ReadFMovI64ToF64, 0>;
+def : ReadAdvance<ReadFMovF64ToI64, 0>;
+def : ReadAdvance<ReadFSGNJ32, 0>;
+def : ReadAdvance<ReadFMinMax32, 0>;
+def : ReadAdvance<ReadFSGNJ64, 0>;
+def : ReadAdvance<ReadFMinMax64, 0>;
+def : ReadAdvance<ReadFCmp32, 0>;
+def : ReadAdvance<ReadFCmp64, 0>;
+def : ReadAdvance<ReadFCvtI32ToF32, 0>;
+def : ReadAdvance<ReadFCvtI32ToF64, 0>;
+def : ReadAdvance<ReadFCvtI64ToF32, 0>;
+def : ReadAdvance<ReadFCvtI64ToF64, 0>;
+def : ReadAdvance<ReadFCvtF32ToI32, 0>;
+def : ReadAdvance<ReadFCvtF32ToI64, 0>;
+def : ReadAdvance<ReadFCvtF32ToF64, 0>;
+def : ReadAdvance<ReadFCvtF64ToI32, 0>;
+def : ReadAdvance<ReadFCvtF64ToI64, 0>;
+def : ReadAdvance<ReadFCvtF64ToF32, 0>;
+def : ReadAdvance<ReadFAdd32, 0>;
+def : ReadAdvance<ReadFAdd64, 0>;
+def : ReadAdvance<ReadFMul32, 0>;
+def : ReadAdvance<ReadFMul64, 0>;
+def : ReadAdvance<ReadFMA32, 0>;
+def : ReadAdvance<ReadFMA32Addend, 0>;
+def : ReadAdvance<ReadFMA64, 0>;
+def : ReadAdvance<ReadFMA64Addend, 0>;
+def : ReadAdvance<ReadFDiv32, 0>;
+def : ReadAdvance<ReadFSqrt32, 0>;
+def : ReadAdvance<ReadFDiv64, 0>;
+def : ReadAdvance<ReadFSqrt64, 0>;
+def : ReadAdvance<ReadAtomicWA, 0>;
+def : ReadAdvance<ReadAtomicWD, 0>;
+def : ReadAdvance<ReadAtomicDA, 0>;
+def : ReadAdvance<ReadAtomicDD, 0>;
+def : ReadAdvance<ReadAtomicLDW, 0>;
+def : ReadAdvance<ReadAtomicLDD, 0>;
+def : ReadAdvance<ReadAtomicSTW, 0>;
+def : ReadAdvance<ReadAtomicSTD, 0>;
+def : ReadAdvance<ReadFStoreData, 0>;
+def : ReadAdvance<ReadCSR, 0>;
+def : ReadAdvance<ReadMemBase, 0>;
+def : ReadAdvance<ReadStoreData, 0>;
+def : ReadAdvance<ReadFMemBase, 0>;
+def : ReadAdvance<ReadFClass32, 0>;
+def : ReadAdvance<ReadFClass64, 0>;
+def : ReadAdvance<ReadIMinMax, 0>;
+def : ReadAdvance<ReadIRem, 0>;
+def : ReadAdvance<ReadIRem32, 0>;
+
+// Unsupported extensions.
+defm : UnsupportedSchedV;
+defm : UnsupportedSchedZbc;
+defm : UnsupportedSchedZbs;
+defm : UnsupportedSchedZbkb;
+defm : UnsupportedSchedZbkx;
+defm : UnsupportedSchedZfa;
+defm : UnsupportedSchedZfhmin;
+defm : UnsupportedSchedSFB;
+defm : UnsupportedSchedZabha;
+defm : UnsupportedSchedXsfvcp;
+defm : UnsupportedSchedZvk;
+defm : UnsupportedSchedZvkned;
+}
diff --git a/llvm/test/tools/llvm-mca/RISCV/MIPS/p8700.s b/llvm/test/tools/llvm-mca/RISCV/MIPS/p8700.s
new file mode 100644
index 00000000000000..ca91f6bb970d8b
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/RISCV/MIPS/p8700.s
@@ -0,0 +1,143 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=riscv64 -mcpu=mips-p8700 -timeline -iterations=1 < %s | FileCheck %s
+
+# A few instructions to test the pipeline:
+# - Integer division (IDiv) exercises the p8700GpDiv resource.
+# - Integer multiplication (IMul) uses p8700GpMul.
+# - Floating-point multiplication uses the FPUL pipeline.
+# - Load/Store instructions use the LSU pipeline.
+# - Simple ALU instructions test the p8700WriteEitherALU and p8700IssueAL2 resources.
+# - A jump instruction to test the CTI pipeline.
+
+  .text
+  .globl _start
+_start:
+
+# Integer division: a0 = a1 / a2
+# Exercises p8700GpDiv resource.
+  div     a0, a1, a2
+
+# Integer multiplication: a4 = a1 * a2
+# Exercises p8700GpMul resource.
+  mul     a4, a1, a2
+
+# Floating-point multiply: f1 = f2 * f3 (single precision)
+# Exercises p8700FpuLong + p8700FpuApu resources.
+  fmul.s  f1, f2, f3
+
+# Load/Store: load word from a0 into a3, then store a3 into a1
+# Exercises p8700IssueLSU resource.
+  lw      a3, 0(a0)
+  sw      a3, 0(a1)
+
+# Simple ALU operations (adding two registers, rotating bits)
+# Exercises p8700WriteEitherALU.
+  add     a5, a1, a2
+  ror     a6, a5, a2
+
+# A jump instruction: a simple forward jump
+# Exercises p8700IssueCTI resource.
+  jal     x0, .Lend
+
+  add     a7, a4, a0  # Instruction after jump (won't execute)
+.Lend:
+  nop
+
+# CHECK:      Iterations:        1
+# CHECK-NEXT: Instructions:      10
+# CHECK-NEXT: Total Cycles:      17
+# CHECK-NEXT: Total uOps:        10
+
+# CHECK:      Dispatch Width:    4
+# CHECK-NEXT: uOps Per Cycle:    0.59
+# CHECK-NEXT: IPC:               0.59
+# CHECK-NEXT: Block RThroughput: 7.0
+
+# CHECK:      Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK:      [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
+# CHECK-NEXT:  1      7     7.00                        div	a0, a1, a2
+# CHECK-NEXT:  1      4     1.00                        mul	a4, a1, a2
+# CHECK-NEXT:  1      5     1.00                        fmul.s	ft1, ft2, ft3
+# CHECK-NEXT:  1      4     1.00    *                   lw	a3, 0(a0)
+# CHECK-NEXT:  1      3     1.00           *            sw	a3, 0(a1)
+# CHECK-NEXT:  1      1     0.50                        add	a5, a1, a2
+# CHECK-NEXT:  1      1     0.50                        ror	a6, a5, a2
+# CHECK-NEXT:  1      1     1.00                        j	.Lend
+# CHECK-NEXT:  1      1     0.50                        add	a7, a4, a0
+# CHECK-NEXT:  1      0     0.50                        nop
+
+# CHECK:      Resources:
+# CHECK-NEXT: [0.0] - p8700AGQ
+# CHECK-NEXT: [0.1] - p8700AGQ
+# CHECK-NEXT: [0.2] - p8700AGQ
+# CHECK-NEXT: [1]   - p8700ALQ
+# CHECK-NEXT: [2.0] - p8700FPQ
+# CHECK-NEXT: [2.1] - p8700FPQ
+# CHECK-NEXT: [2.2] - p8700FPQ
+# CHECK-NEXT: [3]   - p8700FpuApu
+# CHECK-NEXT: [4]   - p8700FpuLong
+# CHECK-NEXT: [5]   - p8700GpDiv
+# CHECK-NEXT: [6]   - p8700GpMul
+# CHECK-NEXT: [7]   - p8700IssueAL2
+# CHECK-NEXT: [8]   - p8700IssueCTI
+# CHECK-NEXT: [9]   - p8700IssueFPUL
+# CHECK-NEXT: [10]  - p8700IssueFPUS
+# CHECK-NEXT: [11]  - p8700IssueLSU
+
+# CHECK:      Resource pressure per iteration:
+# CHECK-NEXT: [0.0]  [0.1]  [0.2]  [1]    [2.0]  [2.1]  [2.2]  [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]
+# CHECK-NEXT: 1.00   1.00   1.00   2.00    -      -      -     1.00   1.00   7.00   1.00   2.00   1.00    -      -     2.00
+
+# CHECK:      Resource pressure by instruction:
+# CHECK-NEXT: [0.0]  [0.1]  [0.2]  [1]    [2.0]  [2.1]  [2.2]  [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   Instructions:
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -     7.00    -      -      -      -      -      -     div	a0, a1, a2
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -     1.00    -      -      -      -      -     mul	a4, a1, a2
+# CHECK-NEXT:  -      -      -      -      -      -      -     1.00   1.00    -      -      -      -      -      -      -     fmul.s	ft1, ft2, ft3
+# CHECK-NEXT:  -     1.00    -      -      -      -      -      -      -      -      -      -      -      -      -     1.00   lw	a3, 0(a0)
+# CHECK-NEXT: 1.00    -      -      -      -      -      -      -      -      -      -      -      -      -      -     1.00   sw	a3, 0(a1)
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -     1.00    -      -      -      -     add	a5, a1, a2
+# CHECK-NEXT:  -      -      -     1.00    -      -      -      -      -      -      -      -      -      -      -      -     ror	a6, a5, a2
+# CHECK-NEXT:  -      -     1.00    -      -      -      -      -      -      -      -      -     1.00    -      -      -     j	.Lend
+# CHECK-NEXT:  -      -      -     1.00    -      -      -      -      -      -      -      -      -      -      -      -     add	a7, a4, a0
+# CHECK-NEXT:  -      -      -      -      -      -      -      -      -      -      -     1.00    -      -      -      -     nop
+
+# CHECK:      Timeline view:
+# CHECK-NEXT:                     0123456
+# CHECK-NEXT: Index     0123456789
+
+# CHECK:      [0,0]     DeeeeeeeER.    ..   div	a0, a1, a2
+# CHECK-NEXT: [0,1]     DeeeeE---R.    ..   mul	a4, a1, a2
+# CHECK-NEXT: [0,2]     DeeeeeE--R.    ..   fmul.s	ft1, ft2, ft3
+# CHECK-NEXT: [0,3]     D=======eeeeER ..   lw	a3, 0(a0)
+# CHECK-NEXT: [0,4]     .D==========eeeER   sw	a3, 0(a1)
+# CHECK-NEXT: [0,5]     .DeE------------R   add	a5, a1, a2
+# CHECK-NEXT: [0,6]     .D=eE-----------R   ror	a6, a5, a2
+# CHECK-NEXT: [0,7]     .DeE------------R   j	.Lend
+# CHECK-NEXT: [0,8]     . D=====eE------R   add	a7, a4, a0
+# CHECK-NEXT: [0,9]     . DE------------R   nop
+
+# CHECK:      Average Wait times (based on the timeline view):
+# CHECK-NEXT: [0]: Executions
+# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
+# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
+# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
+
+# CHECK:            [0]    [1]    [2]    [3]
+# CHECK-NEXT: 0.     1     1.0    1.0    0.0       div	a0, a1, a2
+# CHECK-NEXT: 1.     1     1.0    1.0    3.0       mul	a4, a1, a2
+# CHECK-NEXT: 2.     1     1.0    1.0    2.0       fmul.s	ft1, ft2, ft3
+# CHECK-NEXT: 3.     1     8.0    0.0    0.0       lw	a3, 0(a0)
+# CHECK-NEXT: 4.     1     11.0   0.0    0.0       sw	a3, 0(a1)
+# CHECK-NEXT: 5.     1     1.0    1.0    12.0      add	a5, a1, a2
+# CHECK-NEXT: 6.     1     2.0    0.0    11.0      ror	a6, a5, a2
+# CHECK-NEXT: 7.     1     1.0    1.0    12.0      j	.Lend
+# CHECK-NEXT: 8.     1     6.0    0.0    6.0       add	a7, a4, a0
+# CHECK-NEXT: 9.     1     1.0    1.0    12.0      nop
+# CHECK-NEXT:        1     3.3    0.6    5.8       <total>
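
The summary figures in the test's CHECK lines can be cross-checked by hand. The sketch below recomputes IPC, uOps per cycle, and Block RThroughput from the numbers in p8700.s, then walks the div -> lw -> sw dependency chain to see why the run takes 17 cycles. The cycle accounting (execution starts the cycle after dispatch, retire follows one cycle after completion) is an assumption inferred from the timeline view, not a description of llvm-mca internals.

```python
# Summary numbers copied from the CHECK lines of p8700.s.
instructions = 10
total_uops = 10
total_cycles = 17
dispatch_width = 4

ipc = instructions / total_cycles
uops_per_cycle = total_uops / total_cycles

# Non-empty columns of "Resource pressure per iteration".
pressure = {
    "p8700AGQ[0]": 1.0, "p8700AGQ[1]": 1.0, "p8700AGQ[2]": 1.0,
    "p8700ALQ": 2.0, "p8700FpuApu": 1.0, "p8700FpuLong": 1.0,
    "p8700GpDiv": 7.0, "p8700GpMul": 1.0, "p8700IssueAL2": 2.0,
    "p8700IssueCTI": 1.0, "p8700IssueLSU": 2.0,
}

# Block reciprocal throughput: the larger of the dispatch bound and the
# busiest resource (simplified; each column is treated as a single unit).
dispatch_bound = total_uops / dispatch_width
resource_bound = max(pressure.values())
block_rthroughput = max(dispatch_bound, resource_bound)

# Dependency chain: div writes a0, lw reads a0 and writes a3, sw reads a3.
chain = [("div", 7), ("lw", 4), ("sw", 3)]  # (instruction, Latency from the model)
cycle = 1  # div is dispatched in cycle 0 and starts executing in cycle 1
for _, latency in chain:
    cycle += latency  # the next instruction in the chain issues when this one completes
retire_cycle = cycle + 1            # sw retires one cycle after it completes
cycles_from_chain = retire_cycle + 1  # cycles are numbered from 0

print(f"IPC: {ipc:.2f}")                              # IPC: 0.59
print(f"uOps Per Cycle: {uops_per_cycle:.2f}")        # uOps Per Cycle: 0.59
print(f"Block RThroughput: {block_rthroughput:.1f}")  # Block RThroughput: 7.0
print(f"Cycles from chain: {cycles_from_chain}")      # Cycles from chain: 17
```

The divider is the bottleneck on both counts: it dominates the resource bound (7.0 against a dispatch bound of 2.5) and it heads the longest dependency chain.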

wangpc-pp (Contributor) left a comment

LGTM, and please fill in the description.


.text
.globl _start
_start:
Contributor

Actually you don't need this in an MCA test. :-)

Collaborator Author

Nice catch! Thank you!

The tests in the LLVM project keep growing larger, and while we usually trim them manually or use llvm-reduce, it would be very helpful to have a dedicated tool or utility that can automatically remove unnecessary parts from tests in general.
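
To make the idea concrete, here is a minimal sketch of what such a cleanup step could look like for the prologue flagged in this review. The function `strip_unneeded_prologue` is hypothetical and is not an existing LLVM utility. It only drops `.text`, `.globl` directives, and a global label that nothing else references, and it deliberately leaves local labels such as `.Lend` alone.

```python
import re


def strip_unneeded_prologue(asm: str) -> str:
    """Hypothetical helper: drop .text/.globl and unreferenced global labels."""
    lines = asm.splitlines()
    # Symbols declared with .globl, e.g. "_start".
    globl_syms = {
        m.group(1)
        for line in lines
        if (m := re.match(r"\s*\.globl\s+(\S+)", line))
    }
    kept = []
    for idx, line in enumerate(lines):
        stripped = line.strip()
        if stripped == ".text" or re.match(r"\.globl\s+\S+$", stripped):
            continue
        label = re.match(r"(\S+):$", stripped)
        if label and label.group(1) in globl_syms:
            name = re.escape(label.group(1))
            # Keep the label if any other non-.globl line mentions the symbol.
            referenced = any(
                re.search(rf"\b{name}\b", other)
                for j, other in enumerate(lines)
                if j != idx and not other.strip().startswith(".globl")
            )
            if not referenced:
                continue
        kept.append(line)
    return "\n".join(kept)


sample = "\n".join([
    "  .text",
    "  .globl _start",
    "_start:",
    "  div     a0, a1, a2",
    "  jal     x0, .Lend",
    ".Lend:",
    "  nop",
])
print(strip_unneeded_prologue(sample))
```

A real tool would need to rerun the test after each removal to confirm the CHECK lines still pass, which this sketch does not attempt.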

@djtodoro djtodoro merged commit 3222060 into llvm:main Dec 19, 2024
5 of 7 checks passed