[LLVM] Add IRNormalizer Pass #68176

justinfargnoli · 2023-10-04T04:30:04Z

Add the llvm-canon tool. Description from the original PR:

Added a new llvm-canon tool which aims to transform LLVM Modules into a canonical form by reordering and renaming instructions while preserving the same semantics. This tool makes it easier to spot semantic differences while diffing two modules which have undergone different transformation passes.

The current version of this tool can:

Reorder instructions within a function.
Rename instructions based on the operands.
Sort commutative operands.

This code was originally written by @michalpaszkowski and submitted to mainline LLVM. However, it was quickly reverted to do BuildBot errors.

Michal presented his version of the tool in LLVM-Canon: Shooting for Clear Diffs.

@AidanGoldfarb and I ported the code to the new pass manager, added more tests, and fixed some bugs related to PHI nodes that may have been the root cause of the BuildBot errors that caused the patch to be reverted. Additionally, we rewrote the implementation of instruction reordering to fix cases where the original algorithm would break use-def chains.

Note that this is @AidanGoldfarb and I's first time submitting to LLVM. Please liberally critique the PR!

CC @plotfi for initial review.

github-actions · 2023-10-04T04:43:02Z

✅ With the latest revision this PR passed the C/C++ code formatter.

michalpaszkowski · 2023-10-04T04:50:56Z

Thank you @justinfargnoli and @AidanGoldfarb for taking over this work and reaching out! I will review your changes in the coming week.

CC @ChrisCummins FYI

llvm/lib/Transforms/Utils/IRCanonicalizer.cpp

AaronBallman · 2023-10-05T11:33:38Z

Added @plotfi as a reviewer; one drive-by question is whether adding a new LLVM tool like this requires an RFC to be posted to Discourse or not (https://llvm.org/docs/DeveloperPolicy.html#introducing-new-components-into-llvm). (I don't have any opinions on the tool; just trying to help the review move along.)

plotfi · 2023-10-05T14:34:35Z

Thanks for reaching out guys.

I recall approving the original PR on phab. I can re-review if needed. This work was approved to land before, so personally I think it might be overkill to RFC it on discourse again but I am not certain of the right process to take.

Added @plotfi as a reviewer; one drive-by question is whether adding a new LLVM tool like this requires an RFC to be posted to Discourse or not (https://llvm.org/docs/DeveloperPolicy.html#introducing-new-components-into-llvm). (I don't have any opinions on the tool; just trying to help the review move along.)

nikic · 2023-10-05T14:46:36Z

Please don't use the term canon/canonicalize for this pass or tool. LLVM has an existing notion of "canonicalization" which does not coincide with what is being done here.

AaronBallman · 2023-10-05T15:14:45Z

Thanks for reaching out guys.

I recall approving the original PR on phab. I can re-review if needed. This work was approved to land before, so personally I think it might be overkill to RFC it on discourse again but I am not certain of the right process to take.

Added @plotfi as a reviewer; one drive-by question is whether adding a new LLVM tool like this requires an RFC to be posted to Discourse or not (https://llvm.org/docs/DeveloperPolicy.html#introducing-new-components-into-llvm). (I don't have any opinions on the tool; just trying to help the review move along.)

I don't have strong opinions on whether this should have an RFC or not, but the original review didn't come with an RFC that I could find either. Given that there are some concerns about what the name of the tool is, I weakly lean towards running an RFC just to make sure everyone is comfortable with the tool. But again, I don't insist, it's just a drive-by comment because I happened to notice the PR.

plotfi · 2023-10-09T04:29:46Z

Please don't use the term canon/canonicalize for this pass or tool. LLVM has an existing notion of "canonicalization" which does not coincide with what is being done here.

The term comes from an older pass that did similar algorithms on MIR. They are only borrowing the term from it.

michalpaszkowski · 2023-10-09T04:43:39Z

Please don't use the term canon/canonicalize for this pass or tool. LLVM has an existing notion of "canonicalization" which does not coincide with what is being done here.

@nikic I don't think the name "canon" or "canonicalizer" will lead to any confusion here. This is in fact what the pass is doing and mimics the name of MIR canonicalizer with similar assumptions as @plotfi noted . Using the term "canonicalization" for the current/existing set of passes is not really well defined.

As for the RFC, there was a discussion thread in the LLVM Dev Mailing: https://lists.llvm.org/pipermail/llvm-dev/2019-August/134475.html

nikic · 2023-10-09T09:22:52Z

Using the term for MIR is okay, because it does not have a pre-existing notion of canonical form. Doing the same for IR is not okay, because it already has a canonical form and this pass does not produce it. Calling this ir-normalizer instead of ir-canonicalizer should convey about the same intent without overloading an existing, widely used term.

PS: The PR title and patch description could use an update, as this does not introduce an actual llvm-canon tool, only a normal opt pass (which is the right thing to do, but doesn't match the description).

justinfargnoli · 2023-10-14T22:09:03Z

Renamed to IRNormalizer. Requesting another round of review.

llvm/docs/Passes.rst

llvm/lib/Transforms/Utils/IRNormalizer.cpp

llvm/docs/Passes.rst

llvm/docs/ReleaseNotes.rst

llvm/lib/Transforms/Utils/IRNormalizer.cpp

michalpaszkowski · 2023-10-18T07:24:08Z

I added a couple more comments regarding the name change and formatting of the docs, but apart from these the patch looks good to me.

justinfargnoli · 2023-10-31T17:08:33Z

Ping for another round of review.

michalpaszkowski

Thank you for working on this! LGTM!

nikic

Could you please update the patch description with some information on what kind of normalization the pass does?

It would also be great to post some examples, because I don't really get what kind of renaming and reordering this does just looking at the test coverage.

One bit that's particularly non-obvious to me is why this normalization has to involve hashes? That seems like it would make things more unstable and hard to understand.

I would recommend trying to run this pass (with all options enabled) over all existing tests to make sure that it doesn't cause any crashes/verifier failures. You're moving instructions around, and it's easy to get that wrong when invoke, callbr or catchswitch are involved.

I would also try to add this pass at the end of the clang pipeline and run llvm-test-suite to verify that the normalization this does is indeed semantics-preserving.

(Review notes are just nits, I haven't tried to understand the main logic yet.)

llvm/docs/Passes.rst

llvm/lib/Transforms/Utils/IRNormalizer.cpp

michalpaszkowski · 2023-11-11T03:43:39Z

Here are the slides that were presented at the LLVM Dev Meeting in 2019. It might be useful to include the link in the commit message for reference.

@nikic Thanks for catching more things in the review! Especially the commutative ordering of operands/arguments.

When it comes to including hashes in the naming algorithm (renaming is controlled using an option for already named values), the ideas is that by just looking at an instruction further down (a use) you can tell whether any inputs are different. And the other way around, by looking at initial instructions you can see how their "output footprint" has changed after transformations.

The example below is a bit outdated, but reflects the general idea we had when considering naming algorithms:

justinfargnoli · 2023-11-20T06:34:46Z

I would recommend trying to run this pass (with all options enabled) over all existing tests to make sure that it doesn't cause any crashes/verifier failures. You're moving instructions around, and it's easy to get that wrong when invoke, callbr or catchswitch are involved.

Thanks for this suggestion! It helped iron out a lot of bugs in the implementation.

I would also try to add this pass at the end of the clang pipeline and run llvm-test-suite to verify that the normalization this does is indeed semantics-preserving.

I'm running into some linking errors when trying to build llvm-test-suite. Pinging for another round of review in the meantime.

justinfargnoli · 2024-09-08T04:38:14Z

Hi @plotfi, thanks for pinging me on this! Generally speaking, two things deserve additional focus:

UX testing to find:
- Improvements to existing features
- New features to add
Additional testing to ensure that the pass preserves the semantics of the IR.
- e.g. I want to add this pass to the nvcc optimization pipeline and run it on NVIDIA's internal test suite.
- If Meta has similar test suites, it would be great to see if this pass introduces any functional issues on it.

I cannot reproduce these failures on my x86 machine.

Big thanks to @dtcxzyw for running the test-suite. Since @dtcxzyw did not find any test failures, this MR is ready for another round of review.

If things look good to @plotfi and @nikic, then we should be good to get this merged!

justinfargnoli · 2024-09-15T22:21:15Z

Ping @plotfi @nikic for review.

justinfargnoli · 2024-09-18T18:06:51Z

Ping @plotfi @nikic for review.

plotfi · 2024-10-17T07:09:19Z

Did a pretty cursory look over this PR.

LGTM, I would prefer if this were landed and added improvements as needed later.

Thank you again for working on this.

vvereschaka · 2024-10-22T03:12:24Z

Looks like these failed tests on llvm-clang-x86_64-expensive-checks-ubuntu builder are related with this PR:

LLVM ERROR: Function @test changed by IRNormalizerPass without invalidating analyses
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/opt -S -passes=normalize
1.	Running pass "function(normalize)" on module "<stdin>"
2.	Running pass "normalize" on function "test"
 #0 0x00005643d9c9fdaa llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:22
...

https://lab.llvm.org/buildbot/#/builders/187/builds/2031

llvm-ci · 2024-10-22T04:45:01Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/7518

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/IRNormalizer/regression-infinite-loop.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -S -passes=normalize < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Transforms/IRNormalizer/regression-infinite-loop.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Transforms/IRNormalizer/regression-infinite-loop.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -S -passes=normalize
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Transforms/IRNormalizer/regression-infinite-loop.ll
LLVM ERROR: Function @test changed by IRNormalizerPass without invalidating analyses
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -S -passes=normalize
1.	Running pass "function(normalize)" on module "<stdin>"
2.	Running pass "normalize" on function "test"
 #0 0x00000000045c7697 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x45c7697)
 #1 0x00000000045c514e llvm::sys::RunSignalHandlers() (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x45c514e)
 #2 0x00000000045c7d4f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f95c2d96140 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13140)
 #4 0x00007f95c28aad51 raise (/lib/x86_64-linux-gnu/libc.so.6+0x38d51)
 #5 0x00007f95c2894537 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22537)
 #6 0x00000000045290aa llvm::report_fatal_error(llvm::Twine const&, bool) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x45290aa)
 #7 0x00000000028458da void llvm::detail::UniqueFunctionBase<void, llvm::StringRef, llvm::Any, llvm::PreservedAnalyses const&>::CallImpl<llvm::PreservedCFGCheckerInstrumentation::registerCallbacks(llvm::PassInstrumentationCallbacks&, llvm::AnalysisManager<llvm::Module>&)::$_18>(void*, llvm::StringRef, llvm::Any&, llvm::PreservedAnalyses const&) StandardInstrumentations.cpp:0:0
 #8 0x00000000043f760e llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x43f760e)
 #9 0x0000000000d0b26d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) crtstuff.c:0:0
#10 0x00000000043fb840 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x43fb840)
#11 0x0000000000d0b03d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) crtstuff.c:0:0
#12 0x00000000043f6407 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x43f6407)
#13 0x000000000085cf50 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x85cf50)
#14 0x0000000000852635 optMain (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x852635)
#15 0x00007f95c2895d7a __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d7a)
#16 0x000000000084e99a _start (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt+0x84e99a)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Transforms/IRNormalizer/regression-infinite-loop.ll

--

********************

justinfargnoli · 2024-10-22T18:02:11Z

@vvereschaka, I agree.

Note: The problem is that IRNormalizer produces a change that invalidates an analysis. It's not that IRNormalizer should declare that it's not preserving an analysis.

I likely won't be able to fix this for a few days. @vvereschaka, should I revert 1295d2e in the meantime?

vvereschaka · 2024-10-22T18:09:38Z

Reverting the changes is the optimal way in those cases. Yes, would you revert this PR. Thank you.

This reverts commit 1295d2e.

Reverts #68176 Introduced BuildBot failure: #68176 (comment)

@michalpaszkowski

`IRNormalizer` will reorder instructions. Thus, we need to invalidate analyses. Done in cd500d2. This should resolve the [BuildBot failure](#68176 (comment)). --- Original PR: #68176 Original commit: 1295d2e Reverted with: 8a12e01 --- Add the llvm-canon tool. Description from the [original PR](https://reviews.llvm.org/D66029#change-wZv3yOpDdxIu): > Added a new llvm-canon tool which aims to transform LLVM Modules into a canonical form by reordering and renaming instructions while preserving the same semantics. This tool makes it easier to spot semantic differences while diffing two modules which have undergone different transformation passes. The current version of this tool can: - Reorder instructions within a function. - Rename instructions based on the operands. - Sort commutative operands. This code was originally written by @michalpaszkowski and [submitted to mainline LLVM](14d3585). However, it was quickly [reverted](335de55) to do BuildBot errors. Michal presented his version of the tool in [LLVM-Canon: Shooting for Clear Diffs](https://www.youtube.com/watch?v=c9WMijSOEUg). @AidanGoldfarb and I ported the code to the new pass manager, added more tests, and fixed some bugs related to PHI nodes that may have been the root cause of the BuildBot errors that caused the patch to be reverted. Additionally, we rewrote the implementation of instruction reordering to fix cases where the original algorithm would break use-def chains. Note that this is @AidanGoldfarb and I's first time submitting to LLVM. Please liberally critique the PR! CC @plotfi for initial review. --------- Co-authored-by: Aidan <aidan.goldfarb@mail.mcgill.ca>

michalpaszkowski self-requested a review October 4, 2023 04:32

michalpaszkowski reviewed Oct 4, 2023

View reviewed changes

llvm/lib/Transforms/Utils/IRCanonicalizer.cpp Outdated Show resolved Hide resolved

AaronBallman requested a review from plotfi October 5, 2023 11:31

justinfargnoli changed the title ~~llvm-canon~~ [LLVM] Add Normalizer Pass Oct 14, 2023

justinfargnoli changed the title ~~[LLVM] Add Normalizer Pass~~ [LLVM] Add IRNormalizer Pass Oct 14, 2023

justinfargnoli requested a review from michalpaszkowski October 14, 2023 22:08