[MCA] Extend -instruction-tables option with verbosity levels #130574
Outputs micro-ops, latency, bypass latency, throughput, LLVM opcode name, used resources, and the parsed assembly instruction with its comments. This option makes it easier to check scheduling info when TableGen is modified. Information such as throughput (not reciprocal throughput) and bypass latency helps when comparing against micro-architecture documentation. The LLVM opcode name helps to find the right instruction regexp when fixing TableGen scheduling info.
Example:
Input: abs D20, D11 // ABS <V><d>, <V><n> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV
Output: 1 2 2 4.00 V1UnitV ABSv1i64 abs d20, d11 // ABS <V><d>, <V><n> \\ ASIMD arith, basic \\ 1 2 2 4.0 V1UnitV
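The reference comment in the example carries three sections separated by `\\`. A minimal sketch of how such a comment could be split for comparison with the reported values is shown below. The function names `trimSpaces` and `splitReferenceComment` are hypothetical and not part of this patch, and the sketch assumes the separator is the literal two-character sequence `\\`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing spaces and tabs.
std::string trimSpaces(const std::string &S) {
  const char *WS = " \t";
  std::size_t Begin = S.find_first_not_of(WS);
  if (Begin == std::string::npos)
    return "";
  std::size_t End = S.find_last_not_of(WS);
  return S.substr(Begin, End - Begin + 1);
}

// Hypothetical helper: split a reference comment on the two-character
// separator `\\` and return the trimmed sections in order.
std::vector<std::string> splitReferenceComment(const std::string &Comment) {
  const std::string Sep = "\\\\"; // two literal backslashes
  std::vector<std::string> Sections;
  std::size_t Pos = 0;
  for (;;) {
    std::size_t Next = Comment.find(Sep, Pos);
    if (Next == std::string::npos) {
      // Last section: everything after the final separator.
      Sections.push_back(trimSpaces(Comment.substr(Pos)));
      break;
    }
    Sections.push_back(trimSpaces(Comment.substr(Pos, Next - Pos)));
    Pos = Next + Sep.size();
  }
  return Sections;
}
```

For the comment in the example, the sections would be the instruction form, the documentation title, and the reference scheduling values. How the values inside the last section are delimited (spaces in the example) is left to the caller.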
@llvm/pr-subscribers-tools-llvm-mca @llvm/pr-subscribers-mc Author: Julien Villette (jvillette38) Changes: Outputs micro-ops, latency, bypass latency, throughput, LLVM opcode name, used resources, and the parsed assembly instruction with its comments, if any. This option makes it easier to check scheduling info when TableGen is modified. Information such as throughput (not reciprocal throughput) and bypass latency helps when comparing against micro-architecture documentation. The LLVM opcode name helps to find the right instruction regexp when fixing TableGen scheduling info. Follow-up of MR #126703
When the -scheduling-info option is set, barriers and encoding cannot be shown. The scheduling info option is validated on AArch64/Neoverse/V1-sve-instructions.s. Patch is 1.07 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130574.diff 7 Files Affected:
diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index f610ea2f21682..5c945907785ac 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -170,6 +170,20 @@ option specifies "``-``", then the output will also be sent to standard output.
Enable extra scheduler statistics. This view collects and analyzes instruction
issue events. This view is disabled by default.
+.. option:: -scheduling-info
+
+ Enable scheduling info view. This view reports scheduling information defined
+ in LLVM target description in the form:
+ uOps | Latency | Bypass Latency | Throughput | LLVM OpcodeName | Resources
+ units | assembly instruction and its comment (// or /* */) if defined.
+ It allows to compare scheduling info with architecture documents and fix them
+ in target description by fixing InstrRW for the reported LLVM opcode.
+ Scheduling information can be defined in the same order in each instruction
+ comments to check easily reported and reference scheduling information.
+ Suggested information in comment:
+ ``// <architecture instruction form> \\ <scheduling documentation title> \\
+ <uOps>, <Latency>, <Bypass Latency>, <Throughput>, <Resources units>``
+
.. option:: -retire-stats
Enable extra retire control unit statistics. This view is disabled by default.
diff --git a/llvm/include/llvm/MC/MCSchedule.h b/llvm/include/llvm/MC/MCSchedule.h
index fe731d086f70a..57c8ebeee02a7 100644
--- a/llvm/include/llvm/MC/MCSchedule.h
+++ b/llvm/include/llvm/MC/MCSchedule.h
@@ -402,6 +402,10 @@ struct MCSchedModel {
static unsigned getForwardingDelayCycles(ArrayRef<MCReadAdvanceEntry> Entries,
unsigned WriteResourceIdx = 0);
+ /// Returns the bypass delay cycle for the maximum latency write cycle
+ static unsigned getBypassDelayCycles(const MCSubtargetInfo &STI,
+ const MCSchedClassDesc &SCDesc);
+
/// Returns the default initialized model.
static const MCSchedModel Default;
};
diff --git a/llvm/lib/MC/MCSchedule.cpp b/llvm/lib/MC/MCSchedule.cpp
index ed243cecabb76..21f5a1d62fc9d 100644
--- a/llvm/lib/MC/MCSchedule.cpp
+++ b/llvm/lib/MC/MCSchedule.cpp
@@ -174,3 +174,39 @@ MCSchedModel::getForwardingDelayCycles(ArrayRef<MCReadAdvanceEntry> Entries,
return std::abs(DelayCycles);
}
+
+unsigned MCSchedModel::getBypassDelayCycles(const MCSubtargetInfo &STI,
+ const MCSchedClassDesc &SCDesc) {
+
+ ArrayRef<MCReadAdvanceEntry> Entries = STI.getReadAdvanceEntries(SCDesc);
+ if (Entries.empty())
+ return 0;
+
+ unsigned Latency = 0;
+ unsigned MaxLatency = 0;
+ unsigned WriteResourceID = 0;
+ unsigned DefEnd = SCDesc.NumWriteLatencyEntries;
+
+ for (unsigned DefIdx = 0; DefIdx != DefEnd; ++DefIdx) {
+ // Lookup the definition's write latency in SubtargetInfo.
+ const MCWriteLatencyEntry *WLEntry =
+ STI.getWriteLatencyEntry(&SCDesc, DefIdx);
+ unsigned Cycles = (unsigned)WLEntry->Cycles;
+ // Invalid latency. Consider 0 cycle latency
+ if (WLEntry->Cycles < 0)
+ Cycles = 0;
+ if (Cycles > Latency) {
+ MaxLatency = Cycles;
+ WriteResourceID = WLEntry->WriteResourceID;
+ }
+ Latency = MaxLatency;
+ }
+
+ for (const MCReadAdvanceEntry &E : Entries) {
+ if (E.WriteResourceID == WriteResourceID)
+ return E.Cycles;
+ }
+
+ // Unable to find WriteResourceID in MCReadAdvanceEntry Entries
+ return 0;
+}
diff --git a/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-sve-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-sve-instructions.s
index bcbc5eecd924b..e1f3ca127d5bd 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-sve-instructions.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-sve-instructions.s
@@ -1,5 +1,5 @@
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
-# RUN: llvm-mca -mtriple=aarch64 -mcpu=neoverse-v1 -instruction-tables < %s | FileCheck %s
+# RUN: llvm-mca -mtriple=aarch64 -mcpu=neoverse-v1 -scheduling-info < %s | FileCheck %s
abs z0.b, p0/m, z0.b
abs z0.d, p0/m, z0.d
@@ -2467,2480 +2467,2513 @@ zip2 z31.d, z31.d, z31.d
zip2 z31.h, z31.h, z31.h
zip2 z31.s, z31.s, z31.s
-# CHECK: Instruction Info:
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 246500
+# CHECK-NEXT: Total Cycles: 332424
+# CHECK-NEXT: Total uOps: 457700
+
+# CHECK: Dispatch Width: 15
+# CHECK-NEXT: uOps Per Cycle: 1.38
+# CHECK-NEXT: IPC: 0.74
+# CHECK-NEXT: Block RThroughput: 802.0
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - V1UnitB:2
+# CHECK-NEXT: [1] - V1UnitD:2
+# CHECK-NEXT: [2] - V1UnitFlg:3
+# CHECK-NEXT: [3] - V1UnitI:4 V1UnitS, V1UnitS, V1UnitM0, V1UnitM1
+# CHECK-NEXT: [4] - V1UnitL:3 V1UnitL01, V1UnitL01, V1UnitL2
+# CHECK-NEXT: [5] - V1UnitL2:1
+# CHECK-NEXT: [6] - V1UnitL01:2
+# CHECK-NEXT: [7] - V1UnitM:2 V1UnitM0, V1UnitM1
+# CHECK-NEXT: [8] - V1UnitM0:1
+# CHECK-NEXT: [9] - V1UnitM1:1
+# CHECK-NEXT: [10] - V1UnitS:2
+# CHECK-NEXT: [11] - V1UnitV:4 V1UnitV0, V1UnitV1, V1UnitV2, V1UnitV3
+# CHECK-NEXT: [12] - V1UnitV0:1
+# CHECK-NEXT: [13] - V1UnitV1:1
+# CHECK-NEXT: [14] - V1UnitV2:1
+# CHECK-NEXT: [15] - V1UnitV3:1
+# CHECK-NEXT: [16] - V1UnitV01:2 V1UnitV0, V1UnitV1
+# CHECK-NEXT: [17] - V1UnitV02:2 V1UnitV0, V1UnitV2
+# CHECK-NEXT: [18] - V1UnitV13:2 V1UnitV1, V1UnitV3
+
+# CHECK: Scheduling Info:
# CHECK-NEXT: [1]: #uOps
# CHECK-NEXT: [2]: Latency
-# CHECK-NEXT: [3]: RThroughput
-# CHECK-NEXT: [4]: MayLoad
-# CHECK-NEXT: [5]: MayStore
-# CHECK-NEXT: [6]: HasSideEffects (U)
+# CHECK-NEXT: [3]: Bypass Latency
+# CHECK-NEXT: [4]: Throughput
+# CHECK-NEXT: [5]: Resources
+# CHECK-NEXT: [6]: LLVM OpcodeName
+# CHECK-NEXT: [7]: Instruction
+# CHECK-NEXT: [8]: Comment if any
-# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
-# CHECK-NEXT: 1 2 0.50 abs z0.b, p0/m, z0.b
-# CHECK-NEXT: 1 2 0.50 abs z0.d, p0/m, z0.d
-# CHECK-NEXT: 1 2 0.50 abs z0.h, p0/m, z0.h
-# CHECK-NEXT: 1 2 0.50 abs z0.s, p0/m, z0.s
-# CHECK-NEXT: 1 2 0.50 abs z31.b, p7/m, z31.b
-# CHECK-NEXT: 1 2 0.50 abs z31.d, p7/m, z31.d
-# CHECK-NEXT: 1 2 0.50 abs z31.h, p7/m, z31.h
-# CHECK-NEXT: 1 2 0.50 abs z31.s, p7/m, z31.s
-# CHECK-NEXT: 1 2 0.50 add z0.b, p0/m, z0.b, z0.b
-# CHECK-NEXT: 1 2 0.50 add z0.b, z0.b, #0
-# CHECK-NEXT: 1 2 0.50 add z0.b, z0.b, z0.b
-# CHECK-NEXT: 1 2 0.50 add z0.d, p0/m, z0.d, z0.d
-# CHECK-NEXT: 1 2 0.50 add z0.d, z0.d, #0
-# CHECK-NEXT: 1 2 0.50 add z0.d, z0.d, #0, lsl #8
-# CHECK-NEXT: 1 2 0.50 add z0.d, z0.d, z0.d
-# CHECK-NEXT: 1 2 0.50 add z0.h, p0/m, z0.h, z0.h
-# CHECK-NEXT: 1 2 0.50 add z0.h, z0.h, #0
-# CHECK-NEXT: 1 2 0.50 add z0.h, z0.h, #0, lsl #8
-# CHECK-NEXT: 1 2 0.50 add z0.h, z0.h, z0.h
-# CHECK-NEXT: 1 2 0.50 add z0.s, p0/m, z0.s, z0.s
-# CHECK-NEXT: 1 2 0.50 add z0.s, z0.s, #0
-# CHECK-NEXT: 1 2 0.50 add z0.s, z0.s, #0, lsl #8
-# CHECK-NEXT: 1 2 0.50 add z0.s, z0.s, z0.s
-# CHECK-NEXT: 1 2 0.50 add z0.s, z1.s, z2.s
-# CHECK-NEXT: 1 2 0.50 add z21.b, p5/m, z21.b, z10.b
-# CHECK-NEXT: 1 2 0.50 add z21.b, z10.b, z21.b
-# CHECK-NEXT: 1 2 0.50 add z21.d, p5/m, z21.d, z10.d
-# CHECK-NEXT: 1 2 0.50 add z21.d, z10.d, z21.d
-# CHECK-NEXT: 1 2 0.50 add z21.h, p5/m, z21.h, z10.h
-# CHECK-NEXT: 1 2 0.50 add z21.h, z10.h, z21.h
-# CHECK-NEXT: 1 2 0.50 add z21.s, p5/m, z21.s, z10.s
-# CHECK-NEXT: 1 2 0.50 add z21.s, z10.s, z21.s
-# CHECK-NEXT: 1 2 0.50 add z23.b, p3/m, z23.b, z13.b
-# CHECK-NEXT: 1 2 0.50 add z23.b, z13.b, z8.b
-# CHECK-NEXT: 1 2 0.50 add z23.d, p3/m, z23.d, z13.d
-# CHECK-NEXT: 1 2 0.50 add z23.d, z13.d, z8.d
-# CHECK-NEXT: 1 2 0.50 add z23.h, p3/m, z23.h, z13.h
-# CHECK-NEXT: 1 2 0.50 add z23.h, z13.h, z8.h
-# CHECK-NEXT: 1 2 0.50 add z23.s, p3/m, z23.s, z13.s
-# CHECK-NEXT: 1 2 0.50 add z23.s, z13.s, z8.s
-# CHECK-NEXT: 1 2 0.50 add z31.b, p7/m, z31.b, z31.b
-# CHECK-NEXT: 1 2 0.50 add z31.b, z31.b, #255
-# CHECK-NEXT: 1 2 0.50 add z31.b, z31.b, z31.b
-# CHECK-NEXT: 1 2 0.50 add z31.d, p7/m, z31.d, z31.d
-# CHECK-NEXT: 1 2 0.50 add z31.d, z31.d, #65280
-# CHECK-NEXT: 1 2 0.50 add z31.d, z31.d, z31.d
-# CHECK-NEXT: 1 2 0.50 add z31.h, p7/m, z31.h, z31.h
-# CHECK-NEXT: 1 2 0.50 add z31.h, z31.h, #65280
-# CHECK-NEXT: 1 2 0.50 add z31.h, z31.h, z31.h
-# CHECK-NEXT: 1 2 0.50 add z31.s, p7/m, z31.s, z31.s
-# CHECK-NEXT: 1 2 0.50 add z31.s, z31.s, #65280
-# CHECK-NEXT: 1 2 0.50 add z31.s, z31.s, z31.s
-# CHECK-NEXT: 1 2 1.00 addpl sp, sp, #31
-# CHECK-NEXT: 1 2 1.00 addpl x0, x0, #-32
-# CHECK-NEXT: 1 2 1.00 addpl x21, x21, #0
-# CHECK-NEXT: 1 2 1.00 addpl x23, x8, #-1
-# CHECK-NEXT: 1 2 1.00 addvl sp, sp, #31
-# CHECK-NEXT: 1 2 1.00 addvl x0, x0, #-32
-# CHECK-NEXT: 1 2 1.00 addvl x21, x21, #0
-# CHECK-NEXT: 1 2 1.00 addvl x23, x8, #-1
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, lsl #1]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, lsl #2]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, lsl #3]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, sxtw #1]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, sxtw #2]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, sxtw #3]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, sxtw]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, uxtw #1]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, uxtw #2]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, uxtw #3]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d, uxtw]
-# CHECK-NEXT: 1 2 0.50 adr z0.d, [z0.d, z0.d]
-# CHECK-NEXT: 1 2 0.50 adr z0.s, [z0.s, z0.s, lsl #1]
-# CHECK-NEXT: 1 2 0.50 adr z0.s, [z0.s, z0.s, lsl #2]
-# CHECK-NEXT: 1 2 0.50 adr z0.s, [z0.s, z0.s, lsl #3]
-# CHECK-NEXT: 1 2 0.50 adr z0.s, [z0.s, z0.s]
-# CHECK-NEXT: 1 1 1.00 and p0.b, p0/z, p0.b, p1.b
-# CHECK-NEXT: 1 2 0.50 and z0.d, z0.d, #0x6
-# CHECK-NEXT: 1 2 0.50 and z0.d, z0.d, #0xfffffffffffffff9
-# CHECK-NEXT: 1 2 0.50 and z0.d, z0.d, z0.d
-# CHECK-NEXT: 1 2 0.50 and z0.s, z0.s, #0x6
-# CHECK-NEXT: 1 2 0.50 and z0.s, z0.s, #0xfffffff9
-# CHECK-NEXT: 1 2 0.50 and z23.d, z13.d, z8.d
-# CHECK-NEXT: 1 2 0.50 and z23.h, z23.h, #0x6
-# CHECK-NEXT: 1 2 0.50 and z23.h, z23.h, #0xfff9
-# CHECK-NEXT: 1 2 0.50 and z31.b, p7/m, z31.b, z31.b
-# CHECK-NEXT: 1 2 0.50 and z31.d, p7/m, z31.d, z31.d
-# CHECK-NEXT: 1 2 0.50 and z31.h, p7/m, z31.h, z31.h
-# CHECK-NEXT: 1 2 0.50 and z31.s, p7/m, z31.s, z31.s
-# CHECK-NEXT: 1 2 0.50 and z5.b, z5.b, #0x6
-# CHECK-NEXT: 1 2 0.50 and z5.b, z5.b, #0xf9
-# CHECK-NEXT: 2 2 2.00 ands p0.b, p0/z, p0.b, p1.b
-# CHECK-NEXT: 4 12 2.00 andv b0, p7, z31.b
-# CHECK-NEXT: 4 12 2.00 andv d0, p7, z31.d
-# CHECK-NEXT: 4 12 2.00 andv h0, p7, z31.h
-# CHECK-NEXT: 4 12 2.00 andv s0, p7, z31.s
-# CHECK-NEXT: 1 2 1.00 asr z0.b, p0/m, z0.b, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.b, p0/m, z0.b, z0.b
-# CHECK-NEXT: 1 2 1.00 asr z0.b, p0/m, z0.b, z1.d
-# CHECK-NEXT: 1 2 1.00 asr z0.b, z0.b, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.b, z1.b, z2.d
-# CHECK-NEXT: 1 2 1.00 asr z0.d, p0/m, z0.d, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.d, p0/m, z0.d, z0.d
-# CHECK-NEXT: 1 2 1.00 asr z0.d, z0.d, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.h, p0/m, z0.h, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.h, p0/m, z0.h, z0.h
-# CHECK-NEXT: 1 2 1.00 asr z0.h, p0/m, z0.h, z1.d
-# CHECK-NEXT: 1 2 1.00 asr z0.h, z0.h, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.h, z1.h, z2.d
-# CHECK-NEXT: 1 2 1.00 asr z0.s, p0/m, z0.s, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.s, p0/m, z0.s, z0.s
-# CHECK-NEXT: 1 2 1.00 asr z0.s, p0/m, z0.s, z1.d
-# CHECK-NEXT: 1 2 1.00 asr z0.s, z0.s, #1
-# CHECK-NEXT: 1 2 1.00 asr z0.s, z1.s, z2.d
-# CHECK-NEXT: 1 2 1.00 asr z31.b, p0/m, z31.b, #8
-# CHECK-NEXT: 1 2 1.00 asr z31.b, z31.b, #8
-# CHECK-NEXT: 1 2 1.00 asr z31.d, p0/m, z31.d, #64
-# CHECK-NEXT: 1 2 1.00 asr z31.d, z31.d, #64
-# CHECK-NEXT: 1 2 1.00 asr z31.h, p0/m, z31.h, #16
-# CHECK-NEXT: 1 2 1.00 asr z31.h, z31.h, #16
-# CHECK-NEXT: 1 2 1.00 asr z31.s, p0/m, z31.s, #32
-# CHECK-NEXT: 1 2 1.00 asr z31.s, z31.s, #32
-# CHECK-NEXT: 1 4 1.00 asrd z0.b, p0/m, z0.b, #1
-# CHECK-NEXT: 1 4 1.00 asrd z0.d, p0/m, z0.d, #1
-# CHECK-NEXT: 1 4 1.00 asrd z0.h, p0/m, z0.h, #1
-# CHECK-NEXT: 1 4 1.00 asrd z0.s, p0/m, z0.s, #1
-# CHECK-NEXT: 1 4 1.00 asrd z31.b, p0/m, z31.b, #8
-# CHECK-NEXT: 1 4 1.00 asrd z31.d, p0/m, z31.d, #64
-# CHECK-NEXT: 1 4 1.00 asrd z31.h, p0/m, z31.h, #16
-# CHECK-NEXT: 1 4 1.00 asrd z31.s, p0/m, z31.s, #32
-# CHECK-NEXT: 1 2 1.00 asrr z0.b, p0/m, z0.b, z0.b
-# CHECK-NEXT: 1 2 1.00 asrr z0.d, p0/m, z0.d, z0.d
-# CHECK-NEXT: 1 2 1.00 asrr z0.h, p0/m, z0.h, z0.h
-# CHECK-NEXT: 1 2 1.00 asrr z0.s, p0/m, z0.s, z0.s
-# CHECK-NEXT: 1 4 1.00 bfcvt z0.h, p0/m, z1.s
-# CHECK-NEXT: 1 4 1.00 bfcvtnt z0.h, p0/m, z1.s
-# CHECK-NEXT: 1 4 0.50 bfdot z0.s, z1.h, z2.h
-# CHECK-NEXT: 1 4 0.50 bfdot z0.s, z1.h, z2.h[0]
-# CHECK-NEXT: 1 4 0.50 bfdot z0.s, z1.h, z2.h[3]
-# CHECK-NEXT: 1 5 0.50 bfmlalb z0.s, z1.h, z2.h
-# CHECK-NEXT: 1 5 0.50 bfmlalb z0.s, z1.h, z2.h[0]
-# CHECK-NEXT: 1 5 0.50 bfmlalb z0.s, z1.h, z2.h[7]
-# CHECK-NEXT: 1 5 0.50 bfmlalb z10.s, z21.h, z14.h
-# CHECK-NEXT: 1 5 0.50 bfmlalb z21.s, z14.h, z3.h[2]
-# CHECK-NEXT: 1 5 0.50 bfmlalt z0.s, z1.h, z2.h
-# CHECK-NEXT: 1 5 0.50 bfmlalt z0.s, z1.h, z2.h[0]
-# CHECK-NEXT: 1 5 0.50 bfmlalt z0.s, z1.h, z2.h[7]
-# CHECK-NEXT: 1 5 0.50 bfmlalt z0.s, z1.h, z7.h[7]
-# CHECK-NEXT: 1 5 0.50 bfmlalt z14.s, z10.h, z21.h
-# CHECK-NEXT: 1 5 0.50 bfmmla z0.s, z1.h, z2.h
-# CHECK-NEXT: 1 1 1.00 bic p0.b, p0/z, p0.b, p0.b
-# CHECK-NEXT: 1 1 1.00 bic p15.b, p15/z, p15.b, p15.b
-# CHECK-NEXT: 1 2 0.50 bic z0.d, z0.d, z0.d
-# CHECK-NEXT: 1 2 0.50 bic z23.d, z13.d, z8.d
-# CHECK-NEXT: 1 2 0.50 bic z31.b, p7/m, z31.b, z31.b
-# CHECK-NEXT: 1 2 0.50 bic z31.d, p7/m, z31.d, z31.d
-# CHECK-NEXT: 1 2 0.50 bic z31.h, p7/m, z31.h, z31.h
-# CHECK-NEXT: 1 2 0.50 bic z31.s, p7/m, z31.s, z31.s
-# CHECK-NEXT: 2 2 2.00 bics p0.b, p0/z, p0.b, p0.b
-# CHECK-NEXT: 2 2 2.00 bics p15.b, p15/z, p15.b, p15.b
-# CHECK-NEXT: 1 2 1.00 brka p0.b, p15/m, p15.b
-# CHECK-NEXT: 1 2 1.00 brka p0.b, p15/z, p15.b
-# CHECK-NEXT: 2 3 2.00 brkas p0.b, p15/z, p15.b
-# CHECK-NEXT: 1 2 1.00 brkb p0.b, p15/m, p15.b
-# CHECK-NEXT: 1 2 1.00 brkb p0.b, p15/z, p15.b
-# CHECK-NEXT: 2 3 2.00 brkbs p0.b, p15/z, p15.b
-# CHECK-NEXT: 1 2 1.00 brkn p0.b, p15/z, p1.b, p0.b
-# CHECK-NEXT: 1 2 1.00 brkn p15.b, p15/z, p15.b, p15.b
-# CHECK-NEXT: 2 3 2.00 brkns p0.b, p15/z, p1.b, p0.b
-# CHECK-NEXT: 2 3 2.00 brkns p15.b, p15/z, p15.b, p15.b
-# CHECK-NEXT: 1 2 1.00 brkpa p0.b, p15/z, p1.b, p2.b
-# CHECK-NEXT: 1 2 1.00 brkpa p15.b, p15/z, p15.b, p15.b
-# CHECK-NEXT: 2 3 2.00 brkpas p0.b, p15/z, p1.b, p2.b
-# CHECK-NEXT: 2 3 2.00 brkpas p15.b, p15/z, p15.b, p15.b
-# CHECK-NEXT: 1 2 1.00 brkpb p0.b, p15/z, p1.b, p2.b
-# CHECK-N...
[truncated]
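The lookup performed by `getBypassDelayCycles` in the patch above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `WriteLatency` and `ReadAdvance` are simplified, hypothetical stand-ins for LLVM's `MCWriteLatencyEntry` and `MCReadAdvanceEntry`, and `bypassDelayCycles` is a hypothetical name. Unlike the patch, the sketch also clamps a negative advance to 0 instead of converting it to `unsigned` directly.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins (assumptions) for the LLVM MC scheduling entries.
struct WriteLatency {
  int16_t Cycles;           // negative means "invalid latency"
  uint16_t WriteResourceID; // identifies the write resource
};
struct ReadAdvance {
  unsigned UseIdx;
  unsigned WriteResourceID;
  int Cycles; // cycles by which the read is advanced
};

// Returns the advance (bypass) cycles attached to the write that has the
// largest latency, or 0 when there is no matching ReadAdvance entry.
unsigned bypassDelayCycles(const std::vector<WriteLatency> &Writes,
                           const std::vector<ReadAdvance> &Advances) {
  if (Advances.empty())
    return 0;

  // Find the write with the maximum latency; the first one wins on ties.
  unsigned MaxLatency = 0;
  unsigned WriteResourceID = 0;
  for (const WriteLatency &W : Writes) {
    // Treat an invalid (negative) latency as 0 cycles.
    unsigned Cycles = W.Cycles < 0 ? 0u : static_cast<unsigned>(W.Cycles);
    if (Cycles > MaxLatency) {
      MaxLatency = Cycles;
      WriteResourceID = W.WriteResourceID;
    }
  }

  // Report the advance of the first entry that refers to that write.
  for (const ReadAdvance &A : Advances)
    if (A.WriteResourceID == WriteResourceID)
      return A.Cycles < 0 ? 0u : static_cast<unsigned>(A.Cycles);

  return 0;
}
```

A single running maximum is enough here; the patch keeps two variables (`Latency` and `MaxLatency`) that always hold the same value after each iteration.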
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Isn't this what -instruction-tables is for? |
Output only Instructions: like for -instruction-tables option
1a82100 to dfdb38d
But it does not give the same information... Would you like to change the output of -instruction-tables? -scheduling-info output: -instruction-tables output: |
I think Simon has a point. I think that information like "resource names" and "bypass latency" is also relevant to application developers. It would be nicer if your new logic was built on top of the existing instruction tables. At the moment it is designed as a replacement for the instruction info (which personally I don't particularly like). You could optionally allow printing extra fields if you need to. You could also have an option for printing "throughput" values instead of "reciprocal throughput" (if you need to). This is obviously just a suggestion. Not sure what other reviewers think about it. |
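For reference, the "throughput" and "reciprocal throughput" values mentioned in this comment are inverses of each other, so either one could be derived from the other when printing. A minimal sketch of the conversion, assuming a hypothetical helper name `throughputFromReciprocal` that is not part of llvm-mca:

```cpp
#include <cassert>

// Hypothetical helper: convert a reciprocal throughput (cycles per
// instruction) into a throughput (instructions per cycle).
// Returns 0.0 when the reciprocal throughput is not a positive value.
double throughputFromReciprocal(double RThroughput) {
  if (RThroughput <= 0.0)
    return 0.0;
  return 1.0 / RThroughput;
}
```

For example, a reciprocal throughput of 0.50 corresponds to a throughput of 2.0 instructions per cycle, and the throughput of 4.00 in the PR description corresponds to a reciprocal throughput of 0.25.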
I agree that if we want something in llvm-mca that is user facing, this should go under -instruction-tables.
If this is intended to be purely developer facing for validating the scheduling models, I still think that could be better done within/around llvm-exegesis.
Thank you all for the comments.
Some of the additional information is in this patch:
Adding bypass latency. I am going to print this information only when -instruction-tables is on. |
I agree with you. llvm-exegesis is a better fit for validating and building scheduling models. Speaking about instruction-tables only: I think it doesn't hurt if we add a verbose option for printing extra fields. Some of that information is still very useful to have in general. |
I left a few comments.
I also believe that we should piggy-back the new features proposed here on the existing -instruction-tables flag
a few minors
Can we merge this MR? Or do you have more remarks? ;) |
LGTM cheers!
do you need any help to merge? |
Yes. I don't have permissions to do it for now... ;) |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/15586 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/15678 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/15826 Here is the relevant piece of the build log for the reference
|
Looks like this broke tests, see links above or e.g. http://45.33.8.238/linux/163500/step_11.txt Please take a look and revert for now if it takes a while to fix. |
@mshockwave, it seems that we have to update llvm/test/tools/llvm-mca/RISCV/SiFive7/instruction-tables-tests.s after the update of throughput. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/26880 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/13767 Here is the relevant piece of the build log for the reference
|
@jvillette38 would you mind reverting for now instead of trying to forward fix? This is breaking on several bots, and on both the Fuchsia and Chromium toolchain CI, and it's usually easier to revert to a green state and reland your patch without needing to rush. Happy to land the revert for you. |
Fixing MR #130574 after merge in main branch. Throughput has been updated in between. Co-authored-by: Julien Villette <julien.villette@sipearl.com>
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/16229 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/21853 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/23028 Here is the relevant piece of the build log for the reference
|
Option becomes: -instruction-tables=<level>
The choice of <level> controls the amount of information printed. <level> may be none (default), normal, or full.
Note: If the option is used without <level>, the default is normal (legacy).
When <level> is full, the additional information is:
- <Bypass Latency>: Latency when a bypass is implemented between operands in pipelines (see SchedReadAdvance).
- <LLVM Opcode Name>: mnemonic plus operands identifier.
- <Resources units>: Used resources associated with LLVM Opcode.
- <instruction comment>: reports comment if any from source assembly.
Level full can be used to better check scheduling info when TableGen is modified.
LLVM Opcode name helps to find the right instruction regexp to fix TableGen Scheduling Info.
-instruction-tables=full option is validated on AArch64/Neoverse/V1-sve-instructions.s
Follow up of MR #126703
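The level selection described in this commit message (absent means none, bare flag means normal, explicit value otherwise) can be modelled in a few lines. This is a minimal Python sketch using `argparse`, not the LLVM implementation: the program name `mca-level-demo`, the double-dash spelling, and the helper names are assumptions made for illustration.

```python
import argparse

LEVELS = ("none", "normal", "full")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real command line; llvm-mca itself
    # is not written with argparse.
    parser = argparse.ArgumentParser(prog="mca-level-demo")
    parser.add_argument(
        "--instruction-tables",
        nargs="?",        # the level value is optional
        const="normal",   # flag given without a value: legacy behaviour
        default="none",   # flag not given at all
        choices=LEVELS,
    )
    return parser


def level_for(argv):
    """Return the effective level for a list of command-line arguments."""
    return build_parser().parse_args(argv).instruction_tables


print(level_for([]))
print(level_for(["--instruction-tables"]))
print(level_for(["--instruction-tables=full"]))
```

Keeping the bare flag mapped to normal is what preserves existing invocations and test RUN lines that pass the option without a value.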