
[AMDGPU] Promote nestedGEP allocas to vectors #141199


Merged
merged 20 commits into llvm:main from amdgpu/proGepOfGep on Jun 2, 2025

Conversation

harrisonGPU
Contributor

@harrisonGPU harrisonGPU commented May 23, 2025

Supports the nested GEP pattern that
appears when an alloca is first indexed as an array element and then
shifted with a byte‑offset GEP:

  %SortedFragments = alloca [10 x <2 x i32>], addrspace(5), align 8
  %row  = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
  %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
  %val  = load i32, ptr addrspace(5) %elt1

The pass folds the two levels of addressing into a single vector lane
index and keeps the whole object in a VGPR:

  %vec  = freeze <20 x i32> poison              ; alloca promote  <20 x i32>
  %idx0 = mul i32 %j, 2                         ; j * 2
  %idx  = add i32 %idx0, 1                      ; j * 2 + 1
  %val  = extractelement <20 x i32> %vec, i32 %idx

This eliminates the scratch read.
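
For intuition: the 4-byte offset selects the second i32 inside the 8-byte <2 x i32> array element, and each array element spans two i32 lanes, which is where the index j * 2 + 1 comes from. As an illustration only (not something the pass emits), the two GEPs are equivalent to a single GEP that indexes that lane directly:

  ; equivalent addressing, shown only to illustrate the lane arithmetic
  %elt1 = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j, i32 1
  %val  = load i32, ptr addrspace(5) %elt1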

@llvmbot
Member

llvmbot commented May 23, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Harrison Hao (harrisonGPU)

Changes

Supports the GEP‑of‑GEP pattern that
appears when an alloca is first indexed as an array element and then
shifted with a byte‑offset GEP:

  %SortedFragments = alloca [10 x <2 x i32>], addrspace(5), align 8
  %row  = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
  %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
  %val  = load i32, ptr addrspace(5) %elt1

The pass folds the two levels of addressing into a single vector lane
index and keeps the whole object in a VGPR:

  %vec  = freeze <20 x i32> poison              ; alloca promote  <20 x i32>
  %idx0 = mul i32 %j, 2                         ; j * 2
  %idx  = add i32 %idx0, 1                      ; j * 2 + 1
  %val  = extractelement <20 x i32> %vec, i32 %idx

This eliminates the scratch read and generates no LDS traffic.


Full diff: https://github.com/llvm/llvm-project/pull/141199.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+56-2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdpal.ll (+7-12)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep-of-gep.ll (+55)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 517d05f6514d5..4355f195f2c88 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -437,8 +437,62 @@ static Value *GEPToVectorIndex(GetElementPtrInst *GEP, AllocaInst *Alloca,
   unsigned BW = DL.getIndexTypeSizeInBits(GEP->getType());
   SmallMapVector<Value *, APInt, 4> VarOffsets;
   APInt ConstOffset(BW, 0);
-  if (GEP->getPointerOperand()->stripPointerCasts() != Alloca ||
-      !GEP->collectOffset(DL, BW, VarOffsets, ConstOffset))
+
+  // Walk backwards through nested GEPs to collect both constant and variable
+  // offsets, so that nested vector GEP chains can be lowered in one step.
+  //
+  // Given this IR fragment as input:
+  //
+  //   %0 = alloca [10 x <2 x i32>], align 8, addrspace(5)
+  //   %1 = getelementptr [10 x <2 x i32>], ptr addrspace(5) %0, i32 0, i32 %j
+  //   %2 = getelementptr i8, ptr addrspace(5) %1, i32 4
+  //   %3 = load  i32, ptr addrspace(5) %2, align 4
+  //
+  // Combine both GEP operations in a single pass, producing:
+  //   BasePtr      = %0
+  //   ConstOffset  = 4
+  //   VarOffsets   = { %j → element_size(<2 x i32>) }
+  //
+  // That lets us emit a single buffer_load directly into a VGPR, without ever
+  // allocating scratch memory for the intermediate pointer.
+  Value *CurPtr = GEP;
+  while (auto *CurGEP = dyn_cast<GetElementPtrInst>(CurPtr)) {
+    SmallMapVector<Value *, APInt, 4> LocalVarsOffsets;
+    APInt LocalConstOffset(BW, 0);
+
+    if (!CurGEP->collectOffset(DL, BW, LocalVarsOffsets, LocalConstOffset))
+      return nullptr;
+
+    // Merge any variable-index contributions into the accumulated VarOffsets
+    // map.
+    // Only a single pointer variable is allowed in the entire GEP chain.
+    // If VarOffsets already holds a different pointer, abort.
+    //
+    // Example:
+    //   Suppose LocalVarsOffsets = { (%ptr → 4) } from this GEP, and
+    //   VarOffsets already has { (%ptr → 8) } from an inner GEP.
+    //   After this loop, VarOffsets should become { (%ptr → 12) }.
+    for (auto &VarEntry : LocalVarsOffsets) {
+      // If VarOffsets already records a different pointer, abort.
+      if (!VarOffsets.empty() && !VarOffsets.count(VarEntry.first))
+        return nullptr;
+
+      // Look up whether we’ve seen this pointer before.
+      auto *Existing = VarOffsets.find(VarEntry.first);
+      if (Existing == VarOffsets.end())
+        VarOffsets.insert({VarEntry.first, VarEntry.second});
+      else
+        Existing->second += VarEntry.second;
+    }
+
+    ConstOffset += LocalConstOffset;
+
+    // Move to the next outer pointer
+    CurPtr = CurGEP->getPointerOperand()->stripPointerCasts();
+  }
+
+  // Only proceed if this GEP stems from the same alloca.
+  if (CurPtr->stripPointerCasts() != Alloca)
     return nullptr;
 
   unsigned VecElemSize = DL.getTypeAllocSize(VecElemTy);
diff --git a/llvm/test/CodeGen/AMDGPU/amdpal.ll b/llvm/test/CodeGen/AMDGPU/amdpal.ll
index 2e47b0163aa8c..695e80a7cc0a6 100644
--- a/llvm/test/CodeGen/AMDGPU/amdpal.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdpal.ll
@@ -1,5 +1,5 @@
-; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tahiti | FileCheck --check-prefixes=PAL,CI --enable-var-scope %s
-; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tonga | FileCheck --check-prefixes=PAL,VI --enable-var-scope %s
+; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tahiti | FileCheck --check-prefixes=PAL --enable-var-scope %s
+; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tonga | FileCheck --check-prefixes=PAL --enable-var-scope %s
 
 ; PAL-NOT: .AMDGPU.config
 ; PAL-LABEL: {{^}}simple:
@@ -51,17 +51,12 @@ entry:
   ret void
 }
 
-; Check code sequence for amdpal use of scratch for alloca in a compute shader.
-; The scratch descriptor is loaded from offset 0x10 of the GIT, rather than offset
-; 0 in a graphics shader.
-; Prior to GCN3 s_load_dword offsets are dwords, so the offset will be 0x4.
+; After the change that **promotes the alloca to a vector** (GEP‑of‑GEP
+; promotion), no scratch buffer is needed, so the descriptor load should
+; disappear.
 
 ; PAL-LABEL: {{^}}scratch2_cs:
-; PAL: s_movk_i32 s{{[0-9]+}}, 0x1234
-; PAL: s_mov_b32 s[[GITPTR:[0-9]+]], s0
-; CI: s_load_dwordx4 s[[[SCRATCHDESC:[0-9]+]]:{{[0-9]+]}}, s[[[GITPTR]]:{{[0-9]+\]}}, 0x4
-; VI: s_load_dwordx4 s[[[SCRATCHDESC:[0-9]+]]:{{[0-9]+]}}, s[[[GITPTR]]:{{[0-9]+\]}}, 0x10
-; PAL: buffer_store{{.*}}, s[[[SCRATCHDESC]]:
+; PAL: buffer_store{{.*}}, s[[[SCRATCHDESC:[0-9]+]]:{{[0-9]+]}}
 
 define amdgpu_cs void @scratch2_cs(i32 inreg, i32 inreg, i32 inreg, <3 x i32> inreg, i32 inreg, <3 x i32> %coord, <2 x i32> %in, i32 %extra, i32 %idx) #0 {
 entry:
@@ -88,6 +83,6 @@ declare void @llvm.amdgcn.raw.ptr.buffer.store.f32(float, ptr addrspace(8), i32,
 ; PAL-NEXT:       .cs:
 ; PAL-NEXT:         .entry_point:    _amdgpu_cs_main
 ; PAL-NEXT:         .entry_point_symbol:    scratch2_cs
-; PAL-NEXT:         .scratch_memory_size: 0x10
+; PAL-NEXT:         .scratch_memory_size: 0
 ; PAL-NEXT:         .sgpr_count:     0x
 ; PAL-NEXT:         .vgpr_count:     0x
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep-of-gep.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep-of-gep.ll
new file mode 100644
index 0000000000000..40a81d04d09c8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep-of-gep.ll
@@ -0,0 +1,55 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-promote-alloca < %s | FileCheck %s
+target triple = "amdgcn-amd-amdhsa"
+define amdgpu_ps void @scalar_alloca_ptr_with_vector_gep_of_gep(i32 %j) #0 {
+; CHECK-LABEL: define amdgpu_ps void @scalar_alloca_ptr_with_vector_gep_of_gep(
+; CHECK-SAME: i32 [[J:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[SORTEDFRAGMENTS:%.*]] = freeze <20 x i32> poison
+; CHECK-NEXT:    [[TMP0:%.*]] = mul i32 [[J]], 2
+; CHECK-NEXT:    [[TMP1:%.*]] = mul i32 [[J]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = add i32 1, [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <20 x i32> [[SORTEDFRAGMENTS]], i32 [[TMP2]]
+; CHECK-NEXT:    ret void
+;
+entry:
+  %SortedFragments = alloca [10 x <2 x i32>], align 8, addrspace(5)
+  %0 = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
+  %1 = getelementptr i8, ptr addrspace(5) %0, i32 4
+  %2 = load i32, ptr addrspace(5) %1, align 4
+  ret void
+}
+
+attributes #0 = { "amdgpu-promote-alloca-to-vector-max-regs"="32" }
+
+define amdgpu_cs void @scalar_alloca_ptr_with_vector_gep_of_scratch(i32 inreg, i32 inreg, i32 inreg, <3 x i32> inreg, i32 inreg, <3 x i32> %coord, <2 x i32> %in, i32 %extra, i32 %idx) #1 {
+; CHECK-LABEL: define amdgpu_cs void @scalar_alloca_ptr_with_vector_gep_of_scratch(
+; CHECK-SAME: i32 inreg [[TMP0:%.*]], i32 inreg [[TMP1:%.*]], i32 inreg [[TMP2:%.*]], <3 x i32> inreg [[TMP3:%.*]], i32 inreg [[TMP4:%.*]], <3 x i32> [[COORD:%.*]], <2 x i32> [[IN:%.*]], i32 [[EXTRA:%.*]], i32 [[IDX:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[V:%.*]] = freeze <3 x i32> poison
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <3 x i32> [[V]], i32 [[EXTRA]], i32 0
+; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[IN]], i64 0
+; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <3 x i32> [[TMP5]], i32 [[TMP6]], i32 1
+; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[IN]], i64 1
+; CHECK-NEXT:    [[TMP9:%.*]] = insertelement <3 x i32> [[TMP7]], i32 [[TMP8]], i32 2
+; CHECK-NEXT:    [[TMP10:%.*]] = add i32 1, [[IDX]]
+; CHECK-NEXT:    [[TMP11:%.*]] = extractelement <3 x i32> [[TMP9]], i32 [[TMP10]]
+; CHECK-NEXT:    [[XF:%.*]] = bitcast i32 [[TMP11]] to float
+; CHECK-NEXT:    call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float [[XF]], ptr addrspace(8) poison, i32 0, i32 0, i32 0)
+; CHECK-NEXT:    ret void
+;
+entry:
+  %v = alloca [3 x i32], addrspace(5)
+  %v1 = getelementptr [3 x i32], ptr addrspace(5) %v, i32 0, i32 1
+  store i32 %extra, ptr addrspace(5) %v
+  store <2 x i32> %in, ptr addrspace(5) %v1
+  %e = getelementptr [2 x i32], ptr addrspace(5) %v1, i32 0, i32 %idx
+  %x = load i32, ptr addrspace(5) %e
+  %xf = bitcast i32 %x to float
+  call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %xf, ptr addrspace(8) poison, i32 0, i32 0, i32 0)
+  ret void
+}
+
+attributes #1 = { nounwind "amdgpu-git-ptr-high"="0x1234" }
+
+declare void @llvm.amdgcn.raw.ptr.buffer.store.f32(float, ptr addrspace(8), i32, i32, i32 immarg)

@harrisonGPU harrisonGPU requested a review from nhaehnle May 23, 2025 06:10
@harrisonGPU harrisonGPU self-assigned this May 23, 2025
%0 = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
%1 = getelementptr i8, ptr addrspace(5) %0, i32 4
%2 = load i32, ptr addrspace(5) %1, align 4
ret void
Contributor

Better to make this not dead code. Also use named values in tests

Contributor Author

I have updated it.

Contributor

This is still dead code

Contributor Author

Do you mean the alloca isn't used after the load, right?

Contributor Author

I understand what you mean now. Thanks!

Contributor Author

I have updated it. What do you think? :-)

Contributor

"dead code" in this case refers to the fact that this alloca has no stores, no valid data.
This means the result of the load will always be poison, so load and all associated instructions can be removed if we made the compiler infer that.
Have a look at other existing promote alloca tests for some ideas of putting data into alloca.
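
A minimal sketch of that suggestion, reusing the names from the PR description (the committed test does something along these lines, but this exact IR is only illustrative): store into the alloca and consume the loaded value so neither the alloca nor the load is trivially dead:

  %SortedFragments = alloca [10 x <2 x i32>], align 8, addrspace(5)
  store <2 x i32> <i32 1, i32 2>, ptr addrspace(5) %SortedFragments     ; give the alloca defined data
  %row  = getelementptr [10 x <2 x i32>], ptr addrspace(5) %SortedFragments, i32 0, i32 %j
  %elt1 = getelementptr i8, ptr addrspace(5) %row, i32 4
  %val  = load i32, ptr addrspace(5) %elt1
  %valf = bitcast i32 %val to float                                     ; use the loaded value
  call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %valf, ptr addrspace(8) poison, i32 0, i32 0, i32 0)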

Contributor Author

Thanks, I have updated it and inserted valid data into the alloca. What do you think?

Comment on lines 54 to 56
; After the change that **promotes the alloca to a vector** (GEP‑of‑GEP
; promotion), no scratch buffer is needed, so the descriptor load should
; disappear.
Contributor

This defeats the purpose of the test. You need to defeat the stack optimization and keep the stack usage

Contributor Author

Okay, I have updated the FileCheck lines.

Contributor

Or disable promote alloca for this test.

Contributor Author

I want to see this test's scratch memory size change to zero, so I think the change is necessary. What do you think?

Contributor

As previously stated, the test is intended to check that the scratch descriptor is correctly accessed, so it must access scratch. The test should not change.

Contributor Author

Thanks, I have updated it and disabled promote-alloca for amdpal.ll.

@harrisonGPU harrisonGPU requested a review from arsenm May 23, 2025 12:00

}

// Only proceed if this GEP stems from the same alloca.
if (CurPtr != Alloca)
Contributor

How would we end up in this situation?

Contributor Author

Thanks. It won't actually end up here, but I'd like to add an assert to verify that it directly references the alloca.

@harrisonGPU harrisonGPU requested a review from perlfu May 26, 2025 12:55

@harrisonGPU harrisonGPU requested a review from perlfu May 27, 2025 08:53
Comment on lines 1 to 2
; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tahiti -disable-promote-alloca-to-vector | FileCheck --check-prefixes=PAL,CI --enable-var-scope %s
; RUN: llc < %s -mtriple=amdgcn--amdpal -mcpu=tonga -disable-promote-alloca-to-vector | FileCheck --check-prefixes=PAL,VI --enable-var-scope %s
Contributor

Better to just make the access volatile
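
For reference, that means marking the accesses in the test volatile, since promote-alloca leaves volatile loads and stores alone and the scratch usage survives. Roughly (illustrative sketch, using the value names from the existing scratch2_cs test):

  store volatile i32 %extra, ptr addrspace(5) %v   ; volatile keeps the alloca in scratch
  %x = load volatile i32, ptr addrspace(5) %e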

Contributor Author

Thanks, I have updated it.

@harrisonGPU harrisonGPU requested review from jayfoad and arsenm May 28, 2025 11:44
@harrisonGPU harrisonGPU requested a review from jayfoad May 28, 2025 15:32
@harrisonGPU harrisonGPU requested a review from jayfoad May 28, 2025 15:55
Contributor

@jayfoad jayfoad left a comment

Mostly LGTM

CurPtr = CurGEP->getPointerOperand();
}

assert(CurPtr == Alloca && "GEP not based on alloca");
Contributor

Are you sure this is safe to assert?

Contributor Author

@harrisonGPU harrisonGPU May 28, 2025

I am sure it is safe, because it comes from the following code:

for (AllocaInst *AI : Allocas) {
  const unsigned AllocaCost = DL->getTypeSizeInBits(AI->getAllocatedType());
  // First, check if we have enough budget to vectorize this alloca.
  if (AllocaCost <= VectorizationBudget) {
    // If we do, attempt vectorization, otherwise, fall through and try
    // promoting to LDS instead.
    if (tryPromoteAllocaToVector(*AI)) {

So it must be an alloca.

CurPtr = CurGEP->getPointerOperand();
}

assert(CurPtr == Alloca && "GEP not based on alloca");
Contributor

Do we need this assertion here? Should it be an assertion or an error (which returns a nullptr)?

Contributor Author

It won't actually trigger the assertion, but I'd like to add an assert to verify that it directly references the alloca.

@harrisonGPU harrisonGPU requested review from shiltian and jayfoad May 29, 2025 03:17
@harrisonGPU harrisonGPU force-pushed the amdgpu/proGepOfGep branch from 503b83c to a23e944 May 30, 2025 06:12
Contributor

@jayfoad jayfoad left a comment

LGTM

@harrisonGPU harrisonGPU merged commit 1a7f5f5 into llvm:main Jun 2, 2025
11 checks passed
@harrisonGPU harrisonGPU deleted the amdgpu/proGepOfGep branch June 2, 2025 08:20