
Conversation

bjacob
Contributor

@bjacob bjacob commented Oct 25, 2023

llvm/llvm-project#70218 just missed the last integrate, and cherry-picks are frowned upon.

The good thing about just missing an integrate is that bumping the submodule still shouldn't be too hard. I just had to fix up one small thing in CollapseDimensions.

ci-extra:build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64

@bjacob bjacob marked this pull request as ready for review October 25, 2023 18:37
@bjacob bjacob force-pushed the bump-llvm-to-get-batch-vecmat branch from be53396 to 677d373 Compare October 25, 2023 19:22
@bjacob bjacob requested a review from dcaballe as a code owner October 25, 2023 19:22
@bjacob
Contributor Author

bjacob commented Oct 25, 2023

The e2e test compilation errors observed on CI here reproduce locally when enabling the CUDA backend, and bisection points to:

(HEAD) ~/iree/third_party/llvm-project git bisect bad
419c6da3d763cc34b15aa66d4b6e7fed6031cedd is the first bad commit
commit 419c6da3d763cc34b15aa66d4b6e7fed6031cedd
Author: Markus Böck <markus.boeck02@gmail.com>
Date:   Wed Oct 25 11:38:26 2023 +0200

    [mlir][LLVM] Verify too many indices in GEP verifier (#70174)
    
    The current verifier stopped verification with a success value as soon
    as a type was encountered that cannot be indexed into. The correct
    behaviour in this case is to error out as there are too many indices for
    the element type. Not doing so leads to bad user-experience as an
    invalid GEP is likely to fail only later during LLVM IR translation.
    
    This PR implements the correct verification behaviour. Some tests
    upstream had to also be fixed as they were creating invalid GEPs.
    
    Fixes https://github.com/llvm/llvm-project/issues/70168

 mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp         | 98 +++++++---------------
 mlir/test/Dialect/LLVMIR/invalid.mlir              |  8 ++
 mlir/test/Dialect/LLVMIR/mem2reg.mlir              |  9 +-
 .../Dialect/LLVMIR/roundtrip-typed-pointers.mlir   |  4 +-
 mlir/test/Dialect/LLVMIR/roundtrip.mlir            |  4 +-
 5 files changed, 44 insertions(+), 79 deletions(-)

My bisection test command:

ninja iree-compile && tools/iree-compile  --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=cuda --iree-input-type=stablehlo  ~/iree/tests/e2e/regression/large_reduction.mlir  -o /tmp/a.vmfb

@bjacob
Contributor Author

bjacob commented Oct 25, 2023

The errors look like this:

<unknown>:0: error: 'llvm.getelementptr' op type '!llvm.ptr' cannot be indexed (index #1)
/home/benoit/iree/tests/e2e/regression/large_reduction.mlir:6:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"cuda", "cuda-nvptx-fb", {target_arch = "sm_60"}>
  %result = linalg.generic {indexing_maps = [
            ^

@bjacob
Contributor Author

bjacob commented Oct 25, 2023

So... the CUDA backend generates a GetElementPtr with an index, and instead needs to generate a GetElementPtr without an index when the base pointer is null?

@MaheshRavishankar
Contributor

@bjacob typically it is easier to land integrate PRs as a branch in the main repo so that we can easily divide and conquer.

@MaheshRavishankar
Contributor

This is now hitting more issues with LLVM lowering. llvm/llvm-project#70160 (comment)

@zero9178
Member

The problematic GEP that is triggering the verifier can be found here:
https://github.com/openxla/iree/blob/5610d8cf7c06b71fd3281fb3fa2e8e67fe378d57/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp#L78-L82

It explicitly uses !llvm.ptr as the element type for indexing but then uses two indices, which is invalid. The reason this used to work in the past, and why @Dinistro's builder change exposed the issue, is that the result type of AddressOfOp changed. It used to return a typed pointer when using that builder but now emits an opaque pointer.
When GEP got a typed pointer as input, it completely ignored the element type explicitly passed to it (the !llvm.ptr) and instead took the element type from the typed pointer. In a way it was two bugs cancelling each other out 😅

Could you check whether the following change in the above code unblocks this PR?

     Value shiftedPtr = builder.create<LLVM::GEPOp>(
         loc, globalPtr.getType(),
-        LLVM::LLVMPointerType::get(globalOp.getContext()), globalPtr, 
+        globalOp.getGlobalType(), globalPtr,
         ValueRange({zero, offsetValue}));

This uses !llvm.array<0 x i8> as element type from the global.

This would also fix llvm/llvm-project#70160 @MaheshRavishankar

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

@zero9178

Could you check whether the following change in the above code unblocks this PR?

     Value shiftedPtr = builder.create<LLVM::GEPOp>(
         loc, globalPtr.getType(),
-        LLVM::LLVMPointerType::get(globalOp.getContext()), globalPtr, 
+        globalOp.getGlobalType(), globalPtr,
         ValueRange({zero, offsetValue}));

This uses !llvm.array<0 x i8> as element type from the global.

This would also fix llvm/llvm-project#70160 @MaheshRavishankar

Just tried, and yes this does fix the reproducer that I was using for bisection in #15296 (comment)

I'll retrigger CI with that now.

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

@MaheshRavishankar it's looking green this time!
Can someone who knows LLVM::AddressOfOp take a look at the changes that I had to make here on the IREE side in this PR?

@github-actions

github-actions bot commented Oct 26, 2023

Abbreviated Benchmark Summary

@ commit b021cd5bedcf6b6bba45a533a33f5e9cd76ab4b8 (vs. base 9cc729fa1cfa23a2ba217079365e1f44487fc870)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
|---|---|---|---|
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with zeros @ pixel-6-pro[gpu] | 95.170 (vs. 84.647, 12.43%↑) | 93.984 | 6.203 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,data-tiling,ukernel] local_task(embedded_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 14.430 (vs. 13.019, 10.84%↑) | 14.336 | 0.518 |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_task(embedded_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-4[big-core] | 283.390 (vs. 265.567, 6.71%↑) | 285.379 | 8.885 |

[Top 3 out of 4 results showed]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
|---|---|---|---|
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 143.041 (vs. 157.393, 9.12%↓) | 143.095 | 0.164 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,data-tiling,ukernel] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 22.394 (vs. 23.581, 5.03%↓) | 22.398 | 0.038 |

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

@MaheshRavishankar
Contributor

This looks fine to me. But the regressions look real. Maybe the benchmarks can be retriggered to make sure this isn't noise.

Member

@zero9178 zero9178 left a comment


I believe since #12592 IREE should be using/is using opaque pointers exclusively. Typed pointers are currently in the process of being removed and we'd like to start removing them from the LLVM Dialect soon now that all of upstream has been converted as well.
I put some suggestions as to how this should be written with opaque pointers.
See also https://discourse.llvm.org/t/psa-in-tree-conversion-passes-can-now-be-used-with-llvm-opaque-pointers-please-switch-your-downstream-projects/68738 if you're interested in how opaque pointers differ

Comment on lines 1013 to 1017
Value globalPtr = builder.create<LLVM::AddressOfOp>(
loc,
LLVM::LLVMPointerType::get(globalOp.getType(), globalOp.getAddrSpace()),
globalOp.getSymName());
Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalPtr);
Member


Suggested change

-Value globalPtr = builder.create<LLVM::AddressOfOp>(
-    loc,
-    LLVM::LLVMPointerType::get(globalOp.getType(), globalOp.getAddrSpace()),
-    globalOp.getSymName());
-Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalPtr);
+Value globalPtr = builder.create<LLVM::AddressOfOp>(
+    loc, globalOp.getSymName());
+Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalOp.getType(), globalPtr);

@MaheshRavishankar
Contributor

I believe since #12592 IREE should be using/is using opaque pointers exclusively. Typed pointers are currently in the process of being removed and we'd like to start removing them from the LLVM Dialect soon now that all of upstream has been converted as well. I put some suggestions as to how this should be written with opaque pointers. See also https://discourse.llvm.org/t/psa-in-tree-conversion-passes-can-now-be-used-with-llvm-opaque-pointers-please-switch-your-downstream-projects/68738 if you're interested in how opaque pointers differ

Thanks so much @zero9178! Appreciate the feedback. Some of the opaque-pointer semantics have been flushed from my cache, so your input is deeply appreciated.

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

The performance regression is real and I can reproduce it locally on a totally different CPU (AMD Ryzen 9 7950X3D, Zen4).

Reproduction command line (once the models have been imported in the build dir):

ninja iree-benchmark-module iree-compile iree_tests_e2e_test_artifacts_iree-module-MobileNetV3Small_fp32_tflite___x86_64-cascadelake-linux_gnu-llvm_cpu__default-flags_ && tools/iree-benchmark-module --module=e2e_test_artifacts/iree_module_MobileNetV3Small_fp32_tflite___x86_64-cascadelake-linux_gnu-llvm_cpu__default-flags_/module.vmfb --function=main --device_allocator=caching --device=local-sync --input=1x224x224x3xf32=0 --benchmark_min_time=1

Before: 1.63 ms
After: 3.02 ms

I will finish bisecting this tomorrow (near EOD here).

@bjacob
Contributor Author

bjacob commented Oct 26, 2023

git bisect gets tangled in merges (non-linear history). will get back to this tomorrow morning.

@MaheshRavishankar
Contributor

git bisect gets tangled in merges (non-linear history). will get back to this tomorrow morning.

Thanks... do you mind giving me a pointer to the .mlir file? I have done this before, but I have to re-learn every time where to get the .mlir from.

@pzread
Contributor

pzread commented Oct 27, 2023

git bisect gets tangled in merges (non-linear history). will get back to this tomorrow morning.

Thanks... do you mind giving me a pointer to the .mlir file? I have done this before, but I have to re-learn every time where to get the .mlir from.

https://github.com/openxla/iree/actions/runs/6657301626#summary-18094049071 shows the link to get the .mlir file.

In this case gcloud storage cp gs://iree-github-actions-presubmit-artifacts/6657301626/1/e2e-test-artifacts/iree_MobileNetV3Small_fp32_tflite_.mlir . or https://storage.googleapis.com/iree-github-actions-presubmit-artifacts/6657301626/1/e2e-test-artifacts/iree_MobileNetV3Small_fp32_tflite_.mlir

Or just gcloud storage cp gs://iree-github-actions-presubmit-artifacts/6657301626/1/e2e-test-artifacts/*.mlir* . to download all mlir files

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Thanks @pzread, that's super helpful. I didn't know a better way than to build the e2e test artifacts locally, which is not hard to do once you have the Python venv set up with the right TensorFlow nightly packages, but easy to get wrong. Much better to be able to just download the same .mlir files as were used on CI.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Figured out the performance regression. It was confusing because it was not a regression upstream; instead, it came from dropping a cherry-pick:
iree-org/llvm-project@1d514d7

commit 1d514d74f885d00ad9eafd38efdb611ffc04b2d2
Author: Stella Laurenzo <stellaraccident@gmail.com>
Date:   Mon Oct 16 17:59:39 2023 -0700

    Revert "[mlir][tosa][linalg] Apply direct tosa -> linalg Conv2D lowering (#68304)"
    
    This reverts commit e29a253c9ebaded53a823def985364392c4ba4ec.
    
    Breaking TFLite mobilenet test. Needs triage.

@stellaraccident

Looks like we have to carry this cherry-pick forward until the regression is fixed upstream?

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Actually, the cherry-pick is reverting a PR, llvm/llvm-project#68304, that introduces a new pattern in TOSAToLinalg. Instead of reverting, we could maybe just make this new pattern optional and opt out of it on our side. I'll give that a try.

FYI: @FranklandJack (author) @NatashaKnk (reviewer)

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

OK, this PR flipped the target kernel layout for conv2D ops imported from TOSA into Linalg named ops: they used to be imported into conv_2d_nhwc_hwcf, with kernel layout HWCF, and it switches them to be imported into a new conv_2d_nhwc_fhwc, with kernel layout FHWC.

There's nothing wrong with that --- but it's not surprising that flipping the data layout causes this to fall off the currently optimized path in current downstreams.

So I think the path forward is to make this pass / populatePatterns function take a parameter allowing downstreams to express a preference for HWCF or FHWC filter layout. I'll do that now.

@stellaraccident
Collaborator

Thanks. Fine to fix forward.

We had that revert because it was crashing. Looks like someone fixed the crash but didn't verify.

@FranklandJack

Hey, sorry about the breakage. FWIW I have a PR open upstream to revert the original behaviour via a transform op: llvm/llvm-project#68567. The idea was that targets could easily opt into this transform to get the original mapping back.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

No problem at all @FranklandJack . Your changes were perfectly reasonable. It's just the life of big upstreams and big downstreams.

@stellaraccident
Collaborator

Thanks folks. I had noticed that the patch had been dropped in the last integrate, but the notes said it was for a crash, not a perf regression. Seems it was both. I had tracked the fix to the crash and thought it was fine that Kunwar dropped it.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

llvm/llvm-project#70482 sent for review upstream...

bjacob added a commit to llvm/llvm-project that referenced this pull request Oct 27, 2023
… ops. (#70482)

Switching to FHWC happened in
#68304 and is fine in itself
but caused downstream performance regression
iree-org/iree#15296 (comment) , so
this PR makes this optional.
@bjacob bjacob force-pushed the bump-llvm-to-get-batch-vecmat branch from 185ad95 to 8fb298d Compare October 27, 2023 19:50
@bjacob bjacob requested a review from rsuderman as a code owner October 27, 2023 19:50
@bjacob
Contributor Author

bjacob commented Oct 27, 2023

@zero9178

I believe since #12592 IREE should be using/is using opaque pointers exclusively. Typed pointers are currently in the process of being removed and we'd like to start removing them from the LLVM Dialect soon now that all of upstream has been converted as well. I put some suggestions as to how this should be written with opaque pointers. See also https://discourse.llvm.org/t/psa-in-tree-conversion-passes-can-now-be-used-with-llvm-opaque-pointers-please-switch-your-downstream-projects/68738 if you're interested in how opaque pointers differ

Thanks! I ended up following your recommendation.

The reason I initially hesitated is that we have two lit tests that match typed pointers, which were being broken by this integrate. I initially thought (and we discussed this today with @MaheshRavishankar): let's not increase the scope of this integrate in ways that require changing those tests; let's defer that to a follow-up. But then I read the Discourse thread you linked to, and I don't want this integrate to move us further away from the direction that upstream is moving in. Apparently the neutral stance I was reaching for simply doesn't exist: this integrate has to move either closer to or farther from upstream on this. So I followed your suggestion, and that's why this revised PR now also updates these .mlir tests, @MaheshRavishankar.

Other changes in this PR:

@bjacob bjacob requested a review from zero9178 October 27, 2023 19:55
Member

@zero9178 zero9178 left a comment


Thank you! The LLVM Dialect usage looks good to me. I am not really qualified to review the other parts of the PR.

@bjacob bjacob changed the title bump LLVM to get batch_vecmat bump LLVM to acc6f3e9c1af6c7445aae6f10d4b016ac84112d3 ci-extra: build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64 Oct 27, 2023
Contributor

@MaheshRavishankar MaheshRavishankar left a comment


Great push @bjacob

@bjacob bjacob added the benchmarks:android-cpu Run default Android CPU benchmarks label Oct 27, 2023
@bjacob bjacob changed the title bump LLVM to acc6f3e9c1af6c7445aae6f10d4b016ac84112d3 ci-extra: build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64 bump LLVM to acc6f3e9c1af6c7445aae6f10d4b016ac84112d3 Oct 27, 2023
@bjacob bjacob force-pushed the bump-llvm-to-get-batch-vecmat branch from 0a9c77d to 365520d Compare October 27, 2023 20:45
@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Pushed a fix for decompose_softmax.mlir.

Next issue this is running into: https://github.com/openxla/iree/actions/runs/6671721799/job/18134792899?pr=15296

Local repro:

~/iree-build ninja /home/benoit/iree-build/e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb
[0/2] Re-checking globbed directories...
[2/2] Generating /home/benoit/iree-build/e2e_test_arti...(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags]
FAILED: e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb /home/benoit/iree-build/e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb 
cd /home/benoit/iree-build/tests/e2e/test_artifacts && /home/benoit/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=cuda --iree-input-type=stablehlo --iree-hal-cuda-llvm-target-arch=sm_80 /home/benoit/iree-build/e2e_test_artifacts/model_EfficientNetV2STF.timestamp_1683504734.mlirbc -o /home/benoit/iree-build/e2e_test_artifacts/iree_module_EfficientNetV2STF_stablehlo___cuda-sm_80-linux_gnu-cuda__default-flags_/module.vmfb --iree-hal-executable-object-search-path=\"/home/benoit/iree-build\"
<unknown>:0: error: 'llvm.getelementptr' op type 'f32' cannot be indexed (index #1)
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:2354:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"cuda", "cuda-nvptx-fb", {target_arch = "sm_80"}>
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:674:3: note: called from
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:2354:13: error: failed to serialize executables
/tmp/e2e-test-artifacts/iree_EfficientNetV2STF_213fe9a8738a01f2b02b6f0614a40a31c83a2603ca3e3ae0aeab8090fedbe3a0.mlir:674:3: note: called from
ninja: build stopped: subcommand failed.

More GEP stuff...

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

OK, got a fix for this one, but I'd like to check how to make it correct. The question is how to decide, based on an mlir::Type, whether GEPs on pointers to that pointee type should take an offset. What should I test for instead of isF32() here?

diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index fffcd3ba8..ec197ee8a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -72,12 +72,19 @@ void ConvertToDynamicSharedMemory(ModuleOp moduleOp) {
     Value zero = builder.create<LLVM::ConstantOp>(
         loc, IntegerType::get(builder.getContext(), 64),
         builder.getI64IntegerAttr(0));
-    Value offsetValue = builder.create<LLVM::ConstantOp>(
-        loc, IntegerType::get(builder.getContext(), 64),
-        builder.getI64IntegerAttr(offset));
-    Value shiftedPtr = builder.create<LLVM::GEPOp>(
-        loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
-        ValueRange({zero, offsetValue}));
+    Value shiftedPtr;
+    if (globalOp.getGlobalType().isF32()) {
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero}));
+    } else {
+      Value offsetValue = builder.create<LLVM::ConstantOp>(
+          loc, IntegerType::get(builder.getContext(), 64),
+          builder.getI64IntegerAttr(offset));
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero, offsetValue}));
+    }
     addressOfOp.replaceAllUsesWith(shiftedPtr);
     addressOfOp.erase();
   }

@MaheshRavishankar
Contributor

OK, got a fix for this one, but I'd like to check how to make it correct. The question is how to decide, based on an mlir::Type, whether GEPs on pointers to that pointee type should take an offset. What should I test for instead of isF32() here?

Check for isFloatType() (or something similar). Similarly, there is an isIntegerType()... You probably want to assert/return an error on unhandled types.

diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index fffcd3ba8..ec197ee8a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -72,12 +72,19 @@ void ConvertToDynamicSharedMemory(ModuleOp moduleOp) {
     Value zero = builder.create<LLVM::ConstantOp>(
         loc, IntegerType::get(builder.getContext(), 64),
         builder.getI64IntegerAttr(0));
-    Value offsetValue = builder.create<LLVM::ConstantOp>(
-        loc, IntegerType::get(builder.getContext(), 64),
-        builder.getI64IntegerAttr(offset));
-    Value shiftedPtr = builder.create<LLVM::GEPOp>(
-        loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
-        ValueRange({zero, offsetValue}));
+    Value shiftedPtr;
+    if (globalOp.getGlobalType().isF32()) {
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero}));
+    } else {
+      Value offsetValue = builder.create<LLVM::ConstantOp>(
+          loc, IntegerType::get(builder.getContext(), 64),
+          builder.getI64IntegerAttr(offset));
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero, offsetValue}));
+    }
     addressOfOp.replaceAllUsesWith(shiftedPtr);
     addressOfOp.erase();
   }

@zero9178
Member

OK, got a fix for this one, but I'd like to check how to make it correct. The question is how to decide, based on an mlir::Type, whether GEPs on pointers to that pointee type should take an offset. What should I test for instead of isF32() here?

diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index fffcd3ba8..ec197ee8a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -72,12 +72,19 @@ void ConvertToDynamicSharedMemory(ModuleOp moduleOp) {
     Value zero = builder.create<LLVM::ConstantOp>(
         loc, IntegerType::get(builder.getContext(), 64),
         builder.getI64IntegerAttr(0));
-    Value offsetValue = builder.create<LLVM::ConstantOp>(
-        loc, IntegerType::get(builder.getContext(), 64),
-        builder.getI64IntegerAttr(offset));
-    Value shiftedPtr = builder.create<LLVM::GEPOp>(
-        loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
-        ValueRange({zero, offsetValue}));
+    Value shiftedPtr;
+    if (globalOp.getGlobalType().isF32()) {
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero}));
+    } else {
+      Value offsetValue = builder.create<LLVM::ConstantOp>(
+          loc, IntegerType::get(builder.getContext(), 64),
+          builder.getI64IntegerAttr(offset));
+      shiftedPtr = builder.create<LLVM::GEPOp>(
+          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+          ValueRange({zero, offsetValue}));
+    }
     addressOfOp.replaceAllUsesWith(shiftedPtr);
     addressOfOp.erase();
   }

I believe I made a mistake in the original suggestion. It should be global.getGlobalType(), not globalOp.getGlobalType(), to match the semantics of what the code did previously. It seems the latter might return a global that could be an f32, while the former, correct one should always be a !llvm.array, making the GEP good as-is and hopefully fixing the issue you're seeing.

Sorry for the inconvenience.

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Ah, thanks a lot @zero9178 ! Hopefully that will also take care of the illegal address errors on CUDA CI:
https://github.com/openxla/iree/actions/runs/6671721799/job/18134790147?pr=15296

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

Funny, I had noticed how the existing code had two separate variables named global and globalOp and thought "that's scary".

@bjacob
Contributor Author

bjacob commented Oct 27, 2023

So that does seem to work locally and remove the need for the isF32() diff that I was asking about above, thanks! Going to push this now to retrigger CI.

@stellaraccident stellaraccident merged commit 11debb4 into iree-org:main Oct 27, 2023
ramiro050 pushed a commit to ramiro050/iree that referenced this pull request Dec 19, 2023
llvm/llvm-project#70218 just missed the last
integrate, and cherry-picks are frowned upon.

The good thing with just missing an integrate is that just bumping the
submodule shouldn't be too hard still. I just had to fix up one small
thing in CollapseDimensions.


ci-extra:build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64
Labels
benchmarks:android-cpu Run default Android CPU benchmarks benchmarks:x86_64 Run default x86_64 benchmarks