bump LLVM to acc6f3e9c1af6c7445aae6f10d4b016ac84112d3 #15296
Conversation
Force-pushed from be53396 to 677d373.
The e2e test compilation errors observed on CI here reproduce locally when enabling the CUDA backend, and bisection points to:
My bisection test command:
The errors look like this:
So... the CUDA backend generates a GetElementPtr with an index, and instead needs to generate a GetElementPtr without an index when the base pointer is null?
@bjacob typically it is easier to land integrate PRs as a branch in the main repo so that we can easily divide and conquer.
This is now hitting more issues with LLVM lowering. llvm/llvm-project#70160 (comment)
The problematic GEP that is triggering the verifier can be found here. It explicitly uses `LLVM::LLVMPointerType::get(globalOp.getContext())` as the GEP element type. Could you check whether the following change in the above code unblocks this PR?

```diff
 Value shiftedPtr = builder.create<LLVM::GEPOp>(
     loc, globalPtr.getType(),
-    LLVM::LLVMPointerType::get(globalOp.getContext()), globalPtr,
+    globalOp.getGlobalType(), globalPtr,
     ValueRange({zero, offsetValue}));
```

This uses the global's own type as the GEP element type instead. This would also fix llvm/llvm-project#70160 @MaheshRavishankar
Just tried, and yes, this does fix the reproducer that I was using for bisection in #15296 (comment). I'll retrigger CI with that now.
@MaheshRavishankar it's looking green this time!
This looks fine to me. But the regressions look real. Maybe the benchmarks can be retriggered to make sure this isn't noise.
Force-pushed from 74268c0 to 185ad95.
I believe since #12592 IREE should be using/is using opaque pointers exclusively. Typed pointers are currently in the process of being removed and we'd like to start removing them from the LLVM Dialect soon now that all of upstream has been converted as well.
I put some suggestions as to how this should be written with opaque pointers.
See also https://discourse.llvm.org/t/psa-in-tree-conversion-passes-can-now-be-used-with-llvm-opaque-pointers-please-switch-your-downstream-projects/68738 if you're interested in how opaque pointers differ
```cpp
Value globalPtr = builder.create<LLVM::AddressOfOp>(
    loc,
    LLVM::LLVMPointerType::get(globalOp.getType(), globalOp.getAddrSpace()),
    globalOp.getSymName());
Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalPtr);
```
Suggested change:

```diff
-Value globalPtr = builder.create<LLVM::AddressOfOp>(
-    loc,
-    LLVM::LLVMPointerType::get(globalOp.getType(), globalOp.getAddrSpace()),
-    globalOp.getSymName());
-Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalPtr);
+Value globalPtr = builder.create<LLVM::AddressOfOp>(
+    loc, globalOp.getSymName());
+Value ordinalValue = builder.create<LLVM::LoadOp>(loc, globalOp.getType(), globalPtr);
```
Thanks so much @zero9178! Appreciate the feedback... Some of the opaque pointer semantics have been flushed from my cache, so your input is deeply appreciated.
The performance regression is real and I can reproduce it locally on a totally different CPU (AMD Ryzen 9 7950X3D, Zen4). Reproduction command line (once the models have been imported in the build dir):
Before: 1.63 ms

I will finish bisecting this tomorrow (near EOD here).
Thanks... do you mind giving me a pointer to the
https://github.com/openxla/iree/actions/runs/6657301626#summary-18094049071 shows the link to get the mlir file.
Thanks @pzread, that's super helpful. I didn't know a better way than to build the e2e test artifacts locally, which is not hard to do once you have the python venv set up with the right TensorFlow nightly packages, but easy to get wrong. Much better to be able to just download the same mlir files as were used on CI.
Figured out the performance regression. It was confusing because it was not a regression in upstream; instead, it came from dropping a cherry-pick:

Looks like we have to carry this cherry-pick forward until the regression is fixed upstream?
Actually, the cherry-pick is reverting a PR, llvm/llvm-project#68304, that introduces a new pattern in TOSAToLinalg. Instead of reverting, we could maybe just make this new pattern optional and opt out of it on our side. I'll give that a try. FYI: @FranklandJack (author) @NatashaKnk (reviewer)
OK, this PR was flipping the target kernel layout for conv2D ops imported from TOSA into Linalg named ops: they used to be imported with an HWCF filter layout, and are now imported with FHWC. There's nothing wrong with that, but it's not surprising that flipping the data layout causes this to fall off the currently optimized path in current downstreams. So I think the path forward is to make this pass / populatePatterns function take a parameter allowing downstreams to express a preference for HWCF or FHWC filters. I'll do that now.
Thanks. Fine to fix forward. We had that revert because it was crashing. Looks like someone fixed the crash but didn't verify.
Hey, sorry about the breakage. FWIW I have a PR open upstream to revert to the original behaviour via a transform op: llvm/llvm-project#68567. The idea was that targets could easily opt into this transform to get the original mapping back.
No problem at all @FranklandJack. Your changes were perfectly reasonable. It's just the life of big upstreams and big downstreams.
Thanks folks. I had noticed that the patch had been dropped in the last integrate, but the notes said it was for a crash, not a perf regression. Seems it was both. I had tracked the fix for the crash and thought it was fine that Kunwar dropped it.
llvm/llvm-project#70482 sent for review upstream...
… ops. (#70482) Switching to FHWC happened in #68304 and is fine in itself but caused downstream performance regression iree-org/iree#15296 (comment) , so this PR makes this optional.
Force-pushed from 185ad95 to 8fb298d.
Thanks! I ended up following your recommendation. The reason I initially hesitated is that we have two lit tests that match typed pointers and were being broken by this integrate. I initially thought (and we discussed today with @MaheshRavishankar): let's not increase the scope of this integrate in ways that require changing these tests; let's defer that to a follow-up. But then I read the Discourse thread you linked to, and I don't want this integrate to move us farther away from the direction upstream is moving in. Apparently the neutral stance I was reaching for simply doesn't exist; this integrate has to move either closer to or farther from upstream on this. So I followed your suggestion, and that's why this revised PR now also updates these .mlir tests, @MaheshRavishankar. Other changes in this PR:
Thank you! The LLVM Dialect usage looks good to me. I am not really qualified to review any other parts of the PR.
Great push @bjacob
Force-pushed from 0a9c77d to 365520d.
Pushed a fix for decompose_softmax.mlir. Next issue this is running into: https://github.com/openxla/iree/actions/runs/6671721799/job/18134792899?pr=15296 Local repro:
More GEP stuff...
OK, got a fix for this one, but I'd like to check how to make it correct. The question is how to decide, based on the global's type, which form of GEP to generate:

```diff
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index fffcd3ba8..ec197ee8a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -72,12 +72,19 @@ void ConvertToDynamicSharedMemory(ModuleOp moduleOp) {
       Value zero = builder.create<LLVM::ConstantOp>(
           loc, IntegerType::get(builder.getContext(), 64),
           builder.getI64IntegerAttr(0));
-      Value offsetValue = builder.create<LLVM::ConstantOp>(
-          loc, IntegerType::get(builder.getContext(), 64),
-          builder.getI64IntegerAttr(offset));
-      Value shiftedPtr = builder.create<LLVM::GEPOp>(
-          loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
-          ValueRange({zero, offsetValue}));
+      Value shiftedPtr;
+      if (globalOp.getGlobalType().isF32()) {
+        shiftedPtr = builder.create<LLVM::GEPOp>(
+            loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+            ValueRange({zero}));
+      } else {
+        Value offsetValue = builder.create<LLVM::ConstantOp>(
+            loc, IntegerType::get(builder.getContext(), 64),
+            builder.getI64IntegerAttr(offset));
+        shiftedPtr = builder.create<LLVM::GEPOp>(
+            loc, globalPtr.getType(), globalOp.getGlobalType(), globalPtr,
+            ValueRange({zero, offsetValue}));
+      }
       addressOfOp.replaceAllUsesWith(shiftedPtr);
       addressOfOp.erase();
```
Check for
I believe I made a mistake in the original suggestion. Sorry for the inconvenience.
Ah, thanks a lot @zero9178! Hopefully that will also take care of the illegal address errors on CUDA CI:
Funny, I had noticed how the existing code had two separate variables named
So that does seem to work locally and remove the need for the
llvm/llvm-project#70218 just missed the last integrate, and cherry-picks are frowned upon.
The good thing with just missing an integrate is that just bumping the submodule shouldn't be too hard still. I just had to fix up one small thing in CollapseDimensions.
ci-extra:build_test_all_windows,build_test_all_macos_arm64,build_test_all_macos_x86_64