[AMDGPU] Support bottom-up postRA scheduling. #135295
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed 93b6595 to 8574e1f.
Force-pushed 8574e1f to 84e6845.
@llvm/pr-subscribers-backend-amdgpu

Author: Harrison Hao (harrisonGPU)

Changes

Solely relying on top-down scheduling can underutilize hardware, since long-latency instructions often end up scheduled too late and their latency isn't well hidden. Adding bottom-up post-RA scheduling lets us move those instructions earlier, which improves latency hiding and yields roughly a 2% performance gain on key benchmarks.

Full diff: https://github.com/llvm/llvm-project/pull/135295.diff

3 Files Affected:
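(For reference, the new mode is exercised through the post-RA machine scheduler's direction flag, as in the RUN lines added in the diff below; `test.mir` here is only a placeholder input.)

```
llc -mtriple=amdgcn -mcpu=gfx908 -run-pass=postmisched \
    -misched-postra-direction=bottomup -verify-misched -o - test.mir
```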
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index aaefe27b1324f..205cb912126e7 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -284,6 +284,33 @@ void GCNHazardRecognizer::processBundle() {
CurrCycleInstr = nullptr;
}
+void GCNHazardRecognizer::processBundleBottomUp() {
+ // Step through each instruction in the bundle in bottom-up order.
+ MachineBasicBlock::instr_iterator MI =
+ std::next(CurrCycleInstr->getIterator());
+ MachineBasicBlock::instr_iterator E =
+ CurrCycleInstr->getParent()->instr_end();
+
+ // Evict stale entries to maintain a fixed lookahead window.
+ // TODO: Hazard detection is not yet implemented. This scheduling
+ // is intended for GFX11 and newer.
+ for (; MI != E && MI->isInsideBundle(); ++MI) {
+ CurrCycleInstr = &*MI;
+
+ // Remove up to (MaxLookAhead - 1) oldest entries.
+ for (unsigned I = 0, E = MaxLookAhead - 1; I < E && !EmittedInstrs.empty();
+ ++I)
+ EmittedInstrs.pop_back();
+
+ EmittedInstrs.push_back(CurrCycleInstr);
+
+ // Keep only the most recent MaxLookAhead entries
+ EmittedInstrs.resize(MaxLookAhead);
+ }
+
+ CurrCycleInstr = nullptr;
+}
+
void GCNHazardRecognizer::runOnInstruction(MachineInstr *MI) {
assert(IsHazardRecognizerMode);
@@ -423,7 +450,41 @@ void GCNHazardRecognizer::AdvanceCycle() {
}
void GCNHazardRecognizer::RecedeCycle() {
- llvm_unreachable("hazard recognizer does not support bottom-up scheduling.");
+ // If no instruction was issued this cycle, pop the oldest placeholder.
+ if (!CurrCycleInstr) {
+ if (!EmittedInstrs.empty())
+ EmittedInstrs.pop_back();
+ return;
+ }
+
+ // If this is a bundle header, handle the entire bundle here.
+ if (CurrCycleInstr->isBundle()) {
+ processBundleBottomUp();
+ return;
+ }
+
+ unsigned NumWaitStates = TII.getNumWaitStates(*CurrCycleInstr);
+ if (!NumWaitStates) {
+ CurrCycleInstr = nullptr;
+ return;
+ }
+
+ // Add current instruction to the emitted list.
+ EmittedInstrs.push_back(CurrCycleInstr);
+
+ // Model remaining wait states by removing older placeholders.
+ for (unsigned I = 1, E = std::min(NumWaitStates, getMaxLookAhead()); I < E;
+ ++I) {
+ if (!EmittedInstrs.empty())
+ EmittedInstrs.pop_back();
+ }
+
+ // getMaxLookahead() is the largest number of wait states we will ever need
+ // to insert, so there is no point in keeping track of more than that many
+ // wait states.
+ EmittedInstrs.resize(getMaxLookAhead());
+
+ CurrCycleInstr = nullptr;
}
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
index bbc55851bf967..88c7426be552d 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.h
@@ -69,6 +69,10 @@ class GCNHazardRecognizer final : public ScheduleHazardRecognizer {
// Advance over a MachineInstr bundle. Look for hazards in the bundled
// instructions.
void processBundle();
+ // Recede over a MachineInstr bundle. Adds bundled instructions to the
+ // EmittedInstrs queue in bottom-up scheduling mode.
+ // TODO: Hazard detection is not yet implemented.
+ void processBundleBottomUp();
// Run on an individual instruction in hazard recognizer mode. This can be
// used on a newly inserted instruction before returning from PreEmitNoops.
diff --git a/llvm/test/CodeGen/AMDGPU/sched-barrier-post-RA.mir b/llvm/test/CodeGen/AMDGPU/sched-barrier-post-RA.mir
index 7bdb8f5b35ec5..02ebffca84bda 100644
--- a/llvm/test/CodeGen/AMDGPU/sched-barrier-post-RA.mir
+++ b/llvm/test/CodeGen/AMDGPU/sched-barrier-post-RA.mir
@@ -1,5 +1,6 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
-# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -misched-cluster=false -run-pass=postmisched -verify-misched -o - %s | FileCheck %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -misched-cluster=false -run-pass=postmisched -verify-misched -o - %s | FileCheck -check-prefix=CHECK %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -misched-cluster=false -run-pass=postmisched -misched-postra-direction=bottomup -verify-misched -o - %s | FileCheck -check-prefix=CHECK-BOTTOMUP %s
--- |
define amdgpu_kernel void @no_sched_barrier(ptr addrspace(1) noalias %out, ptr addrspace(1) noalias %in) { ret void }
@@ -29,6 +30,21 @@ body: |
; CHECK-NEXT: GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr0_sgpr1, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
; CHECK-NEXT: }
; CHECK-NEXT: S_ENDPGM 0
+ ;
+ ; CHECK-BOTTOMUP-LABEL: name: no_sched_barrier
+ ; CHECK-BOTTOMUP: renamable $vgpr0 = IMPLICIT_DEF
+ ; CHECK-BOTTOMUP-NEXT: renamable $sgpr0_sgpr1 = IMPLICIT_DEF
+ ; CHECK-BOTTOMUP-NEXT: BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0, implicit $exec {
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr1 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 0, 0, implicit $exec :: (load (s32) from %ir.in, !alias.scope !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr2 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 512, 0, implicit $exec :: (load (s32) from %ir.in, !alias.scope !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: }
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr1 = nsw V_MUL_LO_U32_e64 killed $vgpr1, $vgpr1, implicit $exec
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr2 = nsw V_MUL_LO_U32_e64 killed $vgpr2, $vgpr2, implicit $exec
+ ; CHECK-BOTTOMUP-NEXT: BUNDLE implicit killed $vgpr0, implicit killed $vgpr1, implicit killed $sgpr0_sgpr1, implicit $exec, implicit killed $vgpr2 {
+ ; CHECK-BOTTOMUP-NEXT: GLOBAL_STORE_DWORD_SADDR renamable $vgpr0, killed renamable $vgpr1, renamable $sgpr0_sgpr1, 0, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr0_sgpr1, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: }
+ ; CHECK-BOTTOMUP-NEXT: S_ENDPGM 0
renamable $sgpr0_sgpr1 = IMPLICIT_DEF
renamable $vgpr0 = IMPLICIT_DEF
BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0, implicit $exec {
@@ -66,6 +82,22 @@ body: |
; CHECK-NEXT: GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr0_sgpr1, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
; CHECK-NEXT: }
; CHECK-NEXT: S_ENDPGM 0
+ ;
+ ; CHECK-BOTTOMUP-LABEL: name: sched_barrier_mask_0
+ ; CHECK-BOTTOMUP: renamable $vgpr0 = IMPLICIT_DEF
+ ; CHECK-BOTTOMUP-NEXT: renamable $sgpr0_sgpr1 = IMPLICIT_DEF
+ ; CHECK-BOTTOMUP-NEXT: BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0, implicit $exec {
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr1 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 0, 0, implicit $exec :: (load (s32) from %ir.in, !alias.scope !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr2 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 512, 0, implicit $exec :: (load (s32) from %ir.in, !alias.scope !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: }
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr1 = nsw V_MUL_LO_U32_e64 killed $vgpr1, $vgpr1, implicit $exec
+ ; CHECK-BOTTOMUP-NEXT: SCHED_BARRIER 0
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr2 = nsw V_MUL_LO_U32_e64 killed $vgpr2, $vgpr2, implicit $exec
+ ; CHECK-BOTTOMUP-NEXT: BUNDLE implicit killed $vgpr0, implicit killed $vgpr1, implicit killed $sgpr0_sgpr1, implicit $exec, implicit killed $vgpr2 {
+ ; CHECK-BOTTOMUP-NEXT: GLOBAL_STORE_DWORD_SADDR renamable $vgpr0, killed renamable $vgpr1, renamable $sgpr0_sgpr1, 0, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr0_sgpr1, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: }
+ ; CHECK-BOTTOMUP-NEXT: S_ENDPGM 0
renamable $sgpr0_sgpr1 = IMPLICIT_DEF
renamable $vgpr0 = IMPLICIT_DEF
BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0, implicit $exec {
@@ -105,6 +137,22 @@ body: |
; CHECK-NEXT: GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr0_sgpr1, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
; CHECK-NEXT: }
; CHECK-NEXT: S_ENDPGM 0
+ ;
+ ; CHECK-BOTTOMUP-LABEL: name: sched_barrier_mask_1
+ ; CHECK-BOTTOMUP: renamable $vgpr0 = IMPLICIT_DEF
+ ; CHECK-BOTTOMUP-NEXT: renamable $sgpr0_sgpr1 = IMPLICIT_DEF
+ ; CHECK-BOTTOMUP-NEXT: BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0, implicit $exec {
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr1 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 0, 0, implicit $exec :: (load (s32) from %ir.in, !alias.scope !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr2 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 512, 0, implicit $exec :: (load (s32) from %ir.in, !alias.scope !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: }
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr1 = nsw V_MUL_LO_U32_e64 killed $vgpr1, $vgpr1, implicit $exec
+ ; CHECK-BOTTOMUP-NEXT: renamable $vgpr2 = nsw V_MUL_LO_U32_e64 killed $vgpr2, $vgpr2, implicit $exec
+ ; CHECK-BOTTOMUP-NEXT: SCHED_BARRIER 1
+ ; CHECK-BOTTOMUP-NEXT: BUNDLE implicit killed $vgpr0, implicit killed $vgpr1, implicit killed $sgpr0_sgpr1, implicit $exec, implicit killed $vgpr2 {
+ ; CHECK-BOTTOMUP-NEXT: GLOBAL_STORE_DWORD_SADDR renamable $vgpr0, killed renamable $vgpr1, renamable $sgpr0_sgpr1, 0, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr0_sgpr1, 512, 0, implicit $exec :: (store (s32) into %ir.out, !noalias !0, addrspace 1)
+ ; CHECK-BOTTOMUP-NEXT: }
+ ; CHECK-BOTTOMUP-NEXT: S_ENDPGM 0
renamable $sgpr0_sgpr1 = IMPLICIT_DEF
renamable $vgpr0 = IMPLICIT_DEF
BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0, implicit $exec {
Force-pushed 4323b2b to 33c5df4.
I don't think this change is actually doing what you expect, because none of the additional code has any impact if hazard recognition using `EmittedInstrs` is never used.

If the goal is to allow enabling of bottom-up scheduling post-RA on GFX11+, then the only required change is:

```cpp
void GCNHazardRecognizer::RecedeCycle() {
  if (IsHazardRecognizerMode || ST.getGeneration() < GFX11)
    llvm_unreachable("hazard recognizer does not support bottom-up scheduling.");
}
```

I expect if you do only the above, you'll find the generated ISA is exactly the same as with this change.

To support architectures that rely on the `EmittedInstrs` vector, its state needs to be properly maintained, and tests covering targets which rely on that state need to be provided.
```cpp
  // Add current instruction to the emitted list.
  EmittedInstrs.push_back(CurrCycleInstr);
```
This sequence of code doesn't make sense.

- It pushes `CurrCycleInstr` onto the end of `EmittedInstrs`.
- It then removes values from the end of `EmittedInstrs`, which starts by disposing of `CurrCycleInstr`, completely negating the previous operation.
- It resizes `EmittedInstrs` to `getMaxLookAhead()`. In most cases `EmittedInstrs` will be 0 or 1 elements in size by this point, so this will expand it with empty elements.

A consistent update would instead keep all bookkeeping on one end of the list, as sketched below.
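For illustration only, a minimal sketch of bookkeeping that would stay consistent, assuming it simply mirrors `AdvanceCycle` with the front of `EmittedInstrs` holding the most recently receded instruction (an editor's sketch, not code from the patch or review):

```cpp
void GCNHazardRecognizer::RecedeCycle() {
  // Stall cycle: record a placeholder, as AdvanceCycle does for stalls.
  if (!CurrCycleInstr) {
    EmittedInstrs.push_front(nullptr);
    return;
  }

  unsigned NumWaitStates = TII.getNumWaitStates(*CurrCycleInstr);
  if (!NumWaitStates) {
    CurrCycleInstr = nullptr;
    return;
  }

  // Record the instruction, then one placeholder per additional wait state,
  // all on the same end of the list...
  EmittedInstrs.push_front(CurrCycleInstr);
  for (unsigned I = 1, E = std::min(NumWaitStates, getMaxLookAhead()); I != E;
       ++I)
    EmittedInstrs.push_front(nullptr);

  // ...so that resize() trims only the oldest entries off the back.
  EmittedInstrs.resize(getMaxLookAhead());
  CurrCycleInstr = nullptr;
}
```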
Thanks a lot, I really appreciate the feedback! :-)

I understand your concerns and agree with your point of view. I was comparing it with `AdvanceCycle()` and misunderstood the difference, but I've now updated the patch: bottom-up scheduling on GFX11+ skips the hazard recognizer, so maintaining `EmittedInstrs` is no longer necessary.

Thanks again for pointing this out!
Force-pushed ee44249 to ebeedaf.
```cpp
  llvm_unreachable(
      "hazard recognizer does not support bottom-up scheduling.");
```
This shouldn't be unreachable; it's a fatal error.
Thanks, I updated it.
Force-pushed fae1c91 to 9471187.
```diff
@@ -417,7 +417,9 @@ void GCNHazardRecognizer::AdvanceCycle() {
 }

 void GCNHazardRecognizer::RecedeCycle() {
-  llvm_unreachable("hazard recognizer does not support bottom-up scheduling.");
+  if (IsHazardRecognizerMode || ST.getGeneration() < AMDGPUSubtarget::GFX11)
```
This should still be `assert(!IsHazardRecognizerMode)`, because the standalone hazard recognizer always runs top-down, right?
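That is, something along these lines (a sketch of the suggested shape, not the reviewer's literal code):

```cpp
void GCNHazardRecognizer::RecedeCycle() {
  // The standalone hazard recognizer only steps forward via AdvanceCycle(),
  // so it should never recede; only the post-RA scheduler's bottom-up mode
  // is expected to call this.
  assert(!IsHazardRecognizerMode);
}
```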
Yes, the hazard recognizer always runs top-down. I have updated it; what do you think?
```cpp
  if (ST.getGeneration() < AMDGPUSubtarget::GFX11)
    report_fatal_error("Hazard recognizer does not support bottom-up "
                       "scheduling on pre-GFX11.");
```
I'm not sure whether this should be an error or not. The compiler will still generate correct code, even on GFX10 or earlier, right? The only problem is that the post-RA scheduler will not be aware of some hazards, so there will be more work for the standalone hazard recognizer pass to do.
Yes, it generates correct code on GFX10, but I think we should enable it first, because I don't believe a single approach can handle all instruction scheduling cases.
I thought the standalone hazard recognizer was only used at -O0, and that the hazard recognition during the post-RA scheduler was used exclusively otherwise, i.e. only one or the other runs, never both.
I believe hazard recognition does not affect bottom-up scheduling, because the `Post-RA Hazard Recognizer` pass runs after the `Post-RA Machine Instruction Scheduler`.

It sets `IsHazardRecognizerMode` to true only when `PreEmitNoops` is invoked, as shown here (llvm-project/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp, lines 297 to 304 at adbbb90):
```cpp
unsigned GCNHazardRecognizer::PreEmitNoops(MachineInstr *MI) {
  IsHazardRecognizerMode = true;
  CurrCycleInstr = MI;
  unsigned W = PreEmitNoopsCommon(MI);
  fixHazards(MI);
  CurrCycleInstr = nullptr;
  return W;
}
```
Furthermore, this hazard recognizer only uses `AdvanceCycle`, not `RecedeCycle` (llvm-project/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp, lines 283 to 294 at adbbb90):
```cpp
void GCNHazardRecognizer::runOnInstruction(MachineInstr *MI) {
  assert(IsHazardRecognizerMode);
  unsigned NumPreNoops = PreEmitNoops(MI);
  EmitNoops(NumPreNoops);
  if (MI->isInsideBundle())
    insertNoopsInBundle(MI, TII, NumPreNoops);
  else
    TII.insertNoops(*MI->getParent(), MachineBasicBlock::iterator(MI),
                    NumPreNoops);
  EmitInstruction(MI);
  AdvanceCycle();
}
```
I do believe this feature is worth investigating further, as it could help hide long-latency instructions and potentially improve GPU performance.
Do you think I should add an assert or report a fatal error instead? I’d appreciate any suggestions you might have.
I am trying to understand why `llc -mcpu=gfx1010 -misched-postra-direction=bottomup` should be an error. The code will still work correctly, right?
Hi Jay, I understand your point. I think adding an assert is enough to support this feature, and I have updated it accordingly.
```diff
@@ -0,0 +1,158 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
```
What does this test show? If it is showing an improvement with bottom-up, can you make that more obvious? Or at least provide a comment in the test explaining what it is testing.
Thanks, I have updated the comments. What do you think?
Force-pushed 19e9b1b to 19f42d0.
LGTM
```
# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=postmisched -verify-misched -o - %s | FileCheck -check-prefix=TopDown %s
# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=postmisched -misched-postra-direction=bottomup -verify-misched -o - %s | FileCheck -check-prefix=BottomUp %s
```
Nit: it is more common for check prefixes to be all capitals:

```diff
-# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=postmisched -verify-misched -o - %s | FileCheck -check-prefix=TopDown %s
-# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=postmisched -misched-postra-direction=bottomup -verify-misched -o - %s | FileCheck -check-prefix=BottomUp %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=postmisched -verify-misched -o - %s | FileCheck -check-prefix=TOPDOWN %s
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass=postmisched -misched-postra-direction=bottomup -verify-misched -o - %s | FileCheck -check-prefix=BOTTOMUP %s
```
Thanks, I have updated it.