
Conversation

bjacob
Contributor

@bjacob bjacob commented Oct 25, 2023

llvm/llvm-project#70218 just missed the last integrate, and cherry-picks are frowned upon.

The good thing about just missing an integrate is that bumping the submodule still shouldn't be too hard. I just had to fix up one small thing in CollapseDimensions.

ci-extra:build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64

@bjacob bjacob marked this pull request as ready for review October 25, 2023 18:37
@bjacob bjacob force-pushed the bump-llvm-to-get-batch-vecmat branch from be53396 to 677d373 Compare October 25, 2023 19:22
@bjacob bjacob requested a review from dcaballe as a code owner October 25, 2023 19:22
@bjacob
Contributor Author

bjacob commented Oct 25, 2023

The e2e test compilation errors observed on CI here reproduce locally when enabling the CUDA backend, and bisection points to:

(HEAD) ~/iree/third_party/llvm-project git bisect bad
419c6da3d763cc34b15aa66d4b6e7fed6031cedd is the first bad commit
commit 419c6da3d763cc34b15aa66d4b6e7fed6031cedd
Author: Markus Böck <markus.boeck02@gmail.com>
Date:   Wed Oct 25 11:38:26 2023 +0200

    [mlir][LLVM] Verify too many indices in GEP verifier (#70174)
    
    The current verifier stopped verification with a success value as soon
    as a type was encountered that cannot be indexed into. The correct
    behaviour in this case is to error out as there are too many indices for
    the element type. Not doing so leads to bad user-experience as an
    invalid GEP is likely to fail only later during LLVM IR translation.
    
    This PR implements the correct verification behaviour. Some tests
    upstream had to also be fixed as they were creating invalid GEPs.
    
    Fixes https://github.com/llvm/llvm-project/issues/70168

 mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp         | 98 +++++++---------------
 mlir/test/Dialect/LLVMIR/invalid.mlir              |  8 ++
 mlir/test/Dialect/LLVMIR/mem2reg.mlir              |  9 +-
 .../Dialect/LLVMIR/roundtrip-typed-pointers.mlir   |  4 +-
 mlir/test/Dialect/LLVMIR/roundtrip.mlir            |  4 +-
 5 files changed, 44 insertions(+), 79 deletions(-)

My bisection test command:

ninja iree-compile && tools/iree-compile  --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=cuda --iree-input-type=stablehlo  ~/iree/tests/e2e/regression/large_reduction.mlir  -o /tmp/a.vmfb

@bjacob
Contributor Author

bjacob commented Oct 25, 2023

The errors look like this:

<unknown>:0: error: 'llvm.getelementptr' op type '!llvm.ptr' cannot be indexed (index #1)
/home/benoit/iree/tests/e2e/regression/large_reduction.mlir:6:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"cuda", "cuda-nvptx-fb", {target_arch = "sm_60"}>
  %result = linalg.generic {indexing_maps = [
            ^

@bjacob
Contributor Author

bjacob commented Oct 25, 2023

So... the CUDA backend generates a GetElementPtr with an index, and instead needs to generate a GetElementPtr without an index when the base pointer is null?

@MaheshRavishankar
Contributor

@bjacob typically it is easier to land integrate PRs as a branch in the main repo so that we can easily divide and conquer.

@MaheshRavishankar
Contributor

This is now hitting more issues with LLVM lowering. llvm/llvm-project#70160 (comment)

@zero9178
Member

The problematic GEP that is triggering the verifier can be found here:
https://github.com/openxla/iree/blob/5610d8cf7c06b71fd3281fb3fa2e8e67fe378d57/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp#L78-L82

It explicitly uses !llvm.ptr as the element type for indexing but then uses two indices, which is invalid. The reason this used to work in the past, and why @Dinistro's builder change exposed the issue, is that the result type of AddressOfOp changed. It used to return a typed pointer when using that builder but now emits an opaque pointer.
When GEP got a typed pointer as input, it completely ignored the element type explicitly passed to it (the !llvm.ptr) and instead took the element type from the typed pointer. In a way it was two bugs cancelling each other out 😅

Could you check whether the following change in the above code unblocks this PR?

     Value shiftedPtr = builder.create<LLVM::GEPOp>(
         loc, globalPtr.getType(),
-        LLVM::LLVMPointerType::get(globalOp.getContext()), globalPtr, 
+        globalOp.getGlobalType(), globalPtr,
         ValueRange({zero, offsetValue}));

This uses !llvm.array<0 x i8> as element type from the global.

This would also fix llvm/llvm-project#70160 @MaheshRavishankar

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

@zero9178

Could you check whether the following change in the above code unblocks this PR?

     Value shiftedPtr = builder.create<LLVM::GEPOp>(
         loc, globalPtr.getType(),
-        LLVM::LLVMPointerType::get(globalOp.getContext()), globalPtr, 
+        globalOp.getGlobalType(), globalPtr,
         ValueRange({zero, offsetValue}));

This uses !llvm.array<0 x i8> as element type from the global.

This would also fix llvm/llvm-project#70160 @MaheshRavishankar

Just tried, and yes this does fix the reproducer that I was using for bisection in #15296 (comment)

I'll retrigger CI with that now.

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

@MaheshRavishankar it's looking green this time!
Can someone who knows LLVM::AddressOfOp take a look at the changes that I had to make here on the IREE side in this PR?

@github-actions

github-actions bot commented Oct 26, 2023

Abbreviated Benchmark Summary

@ commit b021cd5bedcf6b6bba45a533a33f5e9cd76ab4b8 (vs. base 9cc729fa1cfa23a2ba217079365e1f44487fc870)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
|---|---|---|---|
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with zeros @ pixel-6-pro[gpu] | 95.170 (vs. 84.647, 12.43%↑) | 93.984 | 6.203 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,data-tiling,ukernel] local_task(embedded_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 14.430 (vs. 13.019, 10.84%↑) | 14.336 | 0.518 |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_task(embedded_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-4[big-core] | 283.390 (vs. 265.567, 6.71%↑) | 285.379 | 8.885 |

[Top 3 out of 4 results showed]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
|---|---|---|---|
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 143.041 (vs. 157.393, 9.12%↓) | 143.095 | 0.164 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,data-tiling,ukernel] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 22.394 (vs. 23.581, 5.03%↓) | 22.398 | 0.038 |

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

@MaheshRavishankar
Contributor

This looks fine to me. But the regressions look real. Maybe the benchmarks can be retriggered to make sure this isn't noise.

Member

@zero9178 zero9178 left a comment


I believe since #12592 IREE should be using/is using opaque pointers exclusively. Typed pointers are currently in the process of being removed and we'd like to start removing them from the LLVM Dialect soon now that all of upstream has been converted as well.
I put some suggestions as to how this should be written with opaque pointers.
See also https://discourse.llvm.org/t/psa-in-tree-conversion-passes-can-now-be-used-with-llvm-opaque-pointers-please-switch-your-downstream-projects/68738 if you're interested in how opaque pointers differ

Comment on lines 1013 to 1017
Value globalPtr = builder.create<LLVM::AddressOfOp>(
loc,
LLVM::LLVMPointerType::get(globalOp.getType(), globalOp.getAddrSpace()),
globalOp.getSymName());
Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalPtr);
Member


Suggested change

-Value globalPtr = builder.create<LLVM::AddressOfOp>(
-    loc,
-    LLVM::LLVMPointerType::get(globalOp.getType(), globalOp.getAddrSpace()),
-    globalOp.getSymName());
-Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalPtr);
+Value globalPtr = builder.create<LLVM::AddressOfOp>(
+    loc, globalOp.getSymName());
+Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalOp.getType(), globalPtr);

@MaheshRavishankar
Contributor

I believe since #12592 IREE should be using/is using opaque pointers exclusively. Typed pointers are currently in the process of being removed and we'd like to start removing them from the LLVM Dialect soon now that all of upstream has been converted as well. I put some suggestions as to how this should be written with opaque pointers. See also https://discourse.llvm.org/t/psa-in-tree-conversion-passes-can-now-be-used-with-llvm-opaque-pointers-please-switch-your-downstream-projects/68738 if you're interested in how opaque pointers differ

Thanks so much @zero9178! Appreciate the feedback. Some of the opaque-pointer semantics have been flushed from my cache, so your input is deeply appreciated.

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

The performance regression is real and I can reproduce it locally on a totally different CPU (AMD Ryzen 9 7950X3D, Zen4).

Reproduction command line (once the models have been imported in the build dir):

ninja iree-benchmark-module iree-compile iree_tests_e2e_test_artifacts_iree-module-MobileNetV3Small_fp32_tflite___x86_64-cascadelake-linux_gnu-llvm_cpu__default-flags_ && tools/iree-benchmark-module --module=e2e_test_artifacts/iree_module_MobileNetV3Small_fp32_tflite___x86_64-cascadelake-linux_gnu-llvm_cpu__default-flags_/module.vmfb --function=main --device_allocator=caching --device=local-sync --input=1x224x224x3xf32=0 --benchmark_min_time=1

Before: 1.63 ms
After: 3.02 ms

I will finish bisecting this tomorrow (near EOD here).

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

git bisect gets tangled in merges (non-linear history). will get back to this tomorrow morning.

@MaheshRavishankar
Contributor

git bisect gets tangled in merges (non-linear history). will get back to this tomorrow morning.

Thanks... do you mind giving me a pointer to the .mlir file? I have done this before, but I have to re-learn every time where to get the .mlir from.

@pzread
Contributor

pzread commented Oct 27, 2023

git bisect gets tangled in merges (non-linear history). will get back to this tomorrow morning.

Thanks... do you mind giving me a pointer to the .mlir file? I have done this before, but I have to re-learn every time where to get the .mlir from.

https://github.com/openxla/iree/actions/runs/6657301626#summary-18094049071 shows the link to get the .mlir file.

In this case gcloud storage cp gs://iree-github-actions-presubmit-artifacts/6657301626/1/e2e-test-artifacts/iree_MobileNetV3Small_fp32_tflite_.mlir . or https://storage.googleapis.com/iree-github-actions-presubmit-artifacts/6657301626/1/e2e-test-artifacts/iree_MobileNetV3Small_fp32_tflite_.mlir

Or just gcloud storage cp gs://iree-github-actions-presubmit-artifacts/6657301626/1/e2e-test-artifacts/*.mlir* . to download all mlir files

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Thanks @pzread, that's super helpful. I didn't know a better way than to build the e2e test artifacts locally, which is not hard to do once you have the Python venv set up with the right TensorFlow nightly packages, but easy to get wrong. Much better to be able to just download the same .mlir files as were used on CI.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Figured out the performance regression. It was confusing because it was not a regression upstream; instead, it came from dropping a cherry-pick:
iree-org/llvm-project@1d514d7

commit 1d514d74f885d00ad9eafd38efdb611ffc04b2d2
Author: Stella Laurenzo <stellaraccident@gmail.com>
Date:   Mon Oct 16 17:59:39 2023 -0700

    Revert "[mlir][tosa][linalg] Apply direct tosa -> linalg Conv2D lowering (#68304)"
    
    This reverts commit e29a253c9ebaded53a823def985364392c4ba4ec.
    
    Breaking TFLite mobilenet test. Needs triage.

@stellaraccident

Looks like we have to carry this cherry-pick forward until the regression is fixed upstream?

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Actually, the cherry-pick is reverting a PR, llvm/llvm-project#68304, that introduces a new pattern in TOSAToLinalg. Instead of reverting, we could maybe just make this new pattern optional and opt out of it on our side. I'll give that a try.

FYI: @FranklandJack (author) @NatashaKnk (reviewer)

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

OK, this PR flipped the target kernel layout for conv2D ops imported from TOSA into Linalg named ops: they used to be imported into conv_2d_nhwc_hwcf, with kernel layout HWCF, and it switches them to be imported into a new conv_2d_nhwc_fhwc, with kernel layout FHWC.

There's nothing wrong with that --- but it's not surprising that flipping the data layout causes this to fall off the currently optimized path in current downstreams.

So I think the path forward is to make this pass / populatePatterns function take a parameter allowing downstreams to express a preference for HWCF or FHWC filter layout. I'll do that now.

@stellaraccident
Collaborator

Thanks. Fine to fix forward.

We had that revert because it was crashing. Looks like someone fixed the crash but didn't verify.

@FranklandJack

Hey, sorry about the breakage. FWIW I have a PR open upstream to revert the original behaviour via a transform op: llvm/llvm-project#68567. The idea was that targets could easily opt into this transform to get the original mapping back.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

No problem at all @FranklandJack . Your changes were perfectly reasonable. It's just the life of big upstreams and big downstreams.

@stellaraccident
Collaborator

Thanks folks. I had noticed that the patch had been dropped in the last integrate, but the notes said it was for a crash, not a perf regression. Seems it was both. I had tracked the fix to the crash and thought it was fine that Kunwar dropped it.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

llvm/llvm-project#70482 sent for review upstream...

bjacob added a commit to llvm/llvm-project that referenced this pull request Oct 27, 2023
… ops. (#70482)

Switching to FHWC happened in
#68304 and is fine in itself
but caused downstream performance regression
iree-org/iree#15296 (comment) , so
this PR makes this optional.
@bjacob bjacob force-pushed the bump-llvm-to-get-batch-vecmat branch from 185ad95 to 8fb298d Compare October 27, 2023 19:50
@bjacob bjacob requested a review from rsuderman as a code owner October 27, 2023 19:50
@bjacob
Contributor Author

bjacob commented Oct 27, 2023

@zero9178

I believe since #12592 IREE should be using/is using opaque pointers exclusively. Typed pointers are currently in the process of being removed and we'd like to start removing them from the LLVM Dialect soon now that all of upstream has been converted as well. I put some suggestions as to how this should be written with opaque pointers. See also https://discourse.llvm.org/t/psa-in-tree-conversion-passes-can-now-be-used-with-llvm-opaque-pointers-please-switch-your-downstream-projects/68738 if you're interested in how opaque pointers differ

Thanks! I ended up following your recommendation.

The reason I initially hesitated is that we have two lit tests that match typed pointers, which were being broken by this integrate. I initially thought (and we discussed this today with @MaheshRavishankar): let's not increase the scope of this integrate in ways that require changing those tests; let's defer that to a follow-up. But then I read the Discourse thread you linked to, and I don't want this integrate to move us further away from the direction that upstream is moving in. Apparently the neutral stance I was reaching for simply doesn't exist: this integrate has to move either closer to or farther from upstream on this. So I followed your suggestion, and that's why this revised PR now also updates these .mlir tests, @MaheshRavishankar.

Other changes in this PR:

@bjacob bjacob requested a review from zero9178 October 27, 2023 19:55
Member

@zero9178 zero9178 left a comment


Thank you! The LLVM Dialect usage looks good to me. I am not really qualified to review the other parts of the PR.

@bjacob bjacob changed the title bump LLVM to get batch_vecmat bump LLVM to acc6f3e9c1af6c7445aae6f10d4b016ac84112d3 ci-extra: build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64 Oct 27, 2023
Contributor

@MaheshRavishankar MaheshRavishankar left a comment


Great push @bjacob

@bjacob bjacob added the benchmarks:android-cpu Run default Android CPU benchmarks label Oct 27, 2023
@bjacob bjacob changed the title bump LLVM to acc6f3e9c1af6c7445aae6f10d4b016ac84112d3 ci-extra: build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64 bump LLVM to acc6f3e9c1af6c7445aae6f10d4b016ac84112d3 Oct 27, 2023
@bjacob bjacob force-pushed the bump-llvm-to-get-batch-vecmat branch from 0a9c77d to 365520d Compare October 27, 2023 20:45
@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Pushed a fix for decompose_softmax.mlir.

Next issue this is running into: https://github.com/openxla/iree/actions/runs/6671721799/job/18134792899?pr=15296

Local repro:

~/iree-build ninja /home/benoit/iree-build/e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb
[0/2] Re-checking globbed directories...
[2/2] Generating /home/benoit/iree-build/e2e_test_arti...(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags]
FAILED: e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb /home/benoit/iree-build/e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb 
cd /home/benoit/iree-build/tests/e2e/test_artifacts && /home/benoit/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=cuda --iree-input-type=stablehlo --iree-hal-cuda-llvm-target-arch=sm_80 /home/benoit/iree-build/e2e_test_artifacts/model_EfficientNetV2STF.timestamp_1683504734.mlirbc -o /home/benoit/iree-build/e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb --iree-hal-executable-object-search-path=\"/home/benoit/iree-build\"
<unknown>:0: error: 'llvm.getelementptr' op type 'f32' cannot be indexed (index #1)
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:2354:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"cuda", "cuda-nvptx-fb", {target_arch = "sm_80"}>
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:674:3: note: called from
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:2354:13: error: failed to serialize executables
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:674:3: note: called from
ninja: build stopped: subcommand failed.

More GEP stuff...

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

OK, got a fix for this one, but I'd like to check how to make it correct. The question is how to decide, based on an mlir::Type, whether GEPs on pointers to that pointee type should take an offset. What should I test for instead of isF32() here?

diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index fffcd3ba8..ec197ee8a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -72,12 +72,19 @@ void ConvertToDynamicSharedMemory(ModuleOp moduleOp) {
     Value zero = builder.create<LLVM::ConstantOp>(
         loc, IntegerType::get(builder.getContext(), 64),
         builder.getI64IntegerAttr(0));
-    Value offsetValue = builder.create<LLVM::ConstantOp>(
-        loc, IntegerType::get(builder.getContext(), 64),
-        builder.getI64IntegerAttr(offset));
-    Value shiftedPtr = builder.create<LLVM::GEPOp>(
-        loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
-        ValueRange({zero, offsetValue}));
+    Value shiftedPtr;
+    if (globalOp.getGlobalType().isF32()) {
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero}));
+    } else {
+      Value offsetValue = builder.create<LLVM::ConstantOp>(
+          loc, IntegerType::get(builder.getContext(), 64),
+          builder.getI64IntegerAttr(offset));
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero, offsetValue}));
+    }
     addressOfOp.replaceAllUsesWith(shiftedPtr);
     addressOfOp.erase();
   }

@MaheshRavishankar
Contributor

OK, got a fix for this one, but I'd like to check how to make it correct. The question is how to decide, based on an mlir::Type, whether GEPs on pointers to that pointee type should take an offset. What should I test for instead of isF32() here?

Check for isFloatType() (or something similar). Similarly, there is an isIntegerType()... You probably want to assert/return an error on unhandled types.

diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index fffcd3ba8..ec197ee8a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -72,12 +72,19 @@ void ConvertToDynamicSharedMemory(ModuleOp moduleOp) {
     Value zero = builder.create<LLVM::ConstantOp>(
         loc, IntegerType::get(builder.getContext(), 64),
         builder.getI64IntegerAttr(0));
-    Value offsetValue = builder.create<LLVM::ConstantOp>(
-        loc, IntegerType::get(builder.getContext(), 64),
-        builder.getI64IntegerAttr(offset));
-    Value shiftedPtr = builder.create<LLVM::GEPOp>(
-        loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
-        ValueRange({zero, offsetValue}));
+    Value shiftedPtr;
+    if (globalOp.getGlobalType().isF32()) {
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero}));
+    } else {
+      Value offsetValue = builder.create<LLVM::ConstantOp>(
+          loc, IntegerType::get(builder.getContext(), 64),
+          builder.getI64IntegerAttr(offset));
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero, offsetValue}));
+    }
     addressOfOp.replaceAllUsesWith(shiftedPtr);
     addressOfOp.erase();
   }

@zero9178
Member

OK, got a fix for this one, but I'd like to check how to make it correct. The question is how to decide, based on an mlir::Type, whether GEPs on pointers to that pointee type should take an offset. What should I test for instead of isF32() here?

diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index fffcd3ba8..ec197ee8a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -72,12 +72,19 @@ void ConvertToDynamicSharedMemory(ModuleOp moduleOp) {
     Value zero = builder.create<LLVM::ConstantOp>(
         loc, IntegerType::get(builder.getContext(), 64),
         builder.getI64IntegerAttr(0));
-    Value offsetValue = builder.create<LLVM::ConstantOp>(
-        loc, IntegerType::get(builder.getContext(), 64),
-        builder.getI64IntegerAttr(offset));
-    Value shiftedPtr = builder.create<LLVM::GEPOp>(
-        loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
-        ValueRange({zero, offsetValue}));
+    Value shiftedPtr;
+    if (globalOp.getGlobalType().isF32()) {
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero}));
+    } else {
+      Value offsetValue = builder.create<LLVM::ConstantOp>(
+          loc, IntegerType::get(builder.getContext(), 64),
+          builder.getI64IntegerAttr(offset));
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero, offsetValue}));
+    }
     addressOfOp.replaceAllUsesWith(shiftedPtr);
     addressOfOp.erase();
   }

I believe I made a mistake in the original suggestion. It should be global.getGlobalType(), not globalOp.getGlobalType(), to match the semantics of what the code did previously. It seems the latter might return a global that could be an f32, while the former, correct one should always be a !llvm.array, making the GEP good as-is and hopefully fixing the issue you're seeing.

Sorry for the inconvenience.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Ah, thanks a lot @zero9178 ! Hopefully that will also take care of the illegal address errors on CUDA CI:
https://github.com/openxla/iree/actions/runs/6671721799/job/18134790147?pr=15296

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Funny, I had noticed how the existing code had two separate variables named global and globalOp and thought "that's scary".

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

So that does seem to work locally and remove the need for the isF32() diff that I was asking about above, thanks! Going to push this now to retrigger CI.

@stellaraccident stellaraccident merged commit 11debb4 into iree-org:main Oct 27, 2023
ramiro050 pushed a commit to ramiro050/iree that referenced this pull request Dec 19, 2023
llvm/llvm-project#70218 just missed the last
integrate, and cherry-picks are frowned upon.

The good thing with just missing an integrate is that just bumping the
submodule shouldn't be too hard still. I just had to fix up one small
thing in CollapseDimensions.


ci-extra:build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64
Labels
benchmarks:android-cpu Run default Android CPU benchmarks benchmarks:x86_64 Run default x86_64 benchmarks