Skip to content

[SLP] Make getSameOpcode support interchangeable instructions. #133888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 14, 2025

Conversation

HanKuanChen
Copy link
Contributor

We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., add x, 0 is equivalent to
mul x, 1).
Non-constant values are not supported, as they may incur high costs with
little benefit.


Co-authored-by: Alexey Bataev a.bataev@gmx.com

HanKuanChen and others added 5 commits March 31, 2025 22:50
…132887)

We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Copy link

github-actions bot commented Apr 1, 2025

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Transforms/SLPVectorizer/X86/BinOpSameOpcodeHelper.ll llvm/test/Transforms/SLPVectorizer/isOpcodeOrAlt.ll llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/test/Transforms/SLPVectorizer/AArch64/vec3-base.ll llvm/test/Transforms/SLPVectorizer/RISCV/reversed-strided-node-with-external-ptr.ll llvm/test/Transforms/SLPVectorizer/RISCV/vec3-base.ll llvm/test/Transforms/SLPVectorizer/X86/barriercall.ll llvm/test/Transforms/SLPVectorizer/X86/bottom-to-top-reorder.ll llvm/test/Transforms/SLPVectorizer/X86/buildvector-postpone-for-dependency.ll llvm/test/Transforms/SLPVectorizer/X86/bv-shuffle-mask.ll llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll llvm/test/Transforms/SLPVectorizer/X86/extractcost.ll llvm/test/Transforms/SLPVectorizer/X86/gathered-delayed-nodes-with-reused-user.ll llvm/test/Transforms/SLPVectorizer/X86/minbitwidth-drop-wrapping-flags.ll llvm/test/Transforms/SLPVectorizer/X86/multi-extracts-bv-combined.ll llvm/test/Transforms/SLPVectorizer/X86/non-scheduled-inst-reused-as-last-inst.ll llvm/test/Transforms/SLPVectorizer/X86/propagate_ir_flags.ll llvm/test/Transforms/SLPVectorizer/X86/reduced-val-vectorized-in-transform.ll llvm/test/Transforms/SLPVectorizer/X86/reorder_diamond_match.ll llvm/test/Transforms/SLPVectorizer/X86/shuffle-mask-emission.ll llvm/test/Transforms/SLPVectorizer/X86/vec3-base.ll llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll llvm/test/Transforms/SLPVectorizer/alternate-opcode-sindle-bv.ll llvm/test/Transforms/SLPVectorizer/resized-alt-shuffle-after-minbw.ll llvm/test/Transforms/SLPVectorizer/shuffle-mask-resized.ll

The following files introduce new uses of undef:

  • llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

@HanKuanChen
Copy link
Contributor Author

I test the following libraries.

llvm-test-suite: O3 + -mavx2 -mf16c, O3 + -mavx10.2-512
ffmpeg: O3 + -mavx2 -mf16c, O3 + -mavx10.2-512
libjpeg-turbo: O3 + -mavx2 -mf16c, O3 + -mavx10.2-512
eigen: O3 + -mavx, O3 + -mavx512f -mfma

@alexey-bataev alexey-bataev requested a review from hiraditya April 1, 2025 13:13
Copy link
Contributor

@gbossu gbossu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brushed through the code, looks sensible. But I don't know too much about the SLPVectorizer yet, so do what you want with my comments 😉

@HanKuanChen
Copy link
Contributor Author

ping

Comment on lines +606 to +623
static bool allSameOpcode(ArrayRef<Value *> VL) {
auto *It = find_if(VL, IsaPred<Instruction>);
if (It == VL.end())
return true;
Instruction *MainOp = cast<Instruction>(*It);
unsigned Opcode = MainOp->getOpcode();
bool IsCmpOp = isa<CmpInst>(MainOp);
CmpInst::Predicate BasePred = IsCmpOp ? cast<CmpInst>(MainOp)->getPredicate()
: CmpInst::BAD_ICMP_PREDICATE;
return std::all_of(It, VL.end(), [&](Value *V) {
if (auto *CI = dyn_cast<CmpInst>(V))
return BasePred == CI->getPredicate();
if (auto *I = dyn_cast<Instruction>(V))
return I->getOpcode() == Opcode;
return isa<PoisonValue>(V);
});
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a function isAlternateInstruction, can we use it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot.
isAlternateInstruction consider interchangeable instruction.
allSameOpcode does not consider interchangeable instruction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a parameter to isAlternateInstruction to skip interchangeable instructions to avoid code duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot implement allSameOpcode by isAlternateInstruction.

The MainOp and AltOp of isAlternateInstruction come from getSameOpcode, but getSameOpcode already takes interchangeable instructions into account.

However, the current allSameOpcode "refinds" MainOp and AltOp without considering alternate or interchangeable instructions.

@HanKuanChen HanKuanChen merged commit 123993f into llvm:main Apr 14, 2025
10 of 11 checks passed
@HanKuanChen HanKuanChen deleted the slp-interchangeable-instruction branch April 14, 2025 11:23
@mikaelholmen
Copy link
Collaborator

@HanKuanChen and @alexey-bataev :
The following starts crashing with this patch:

opt -passes=slp-vectorizer bbi-106161.ll -o /dev/null

Result:

opt: ../include/llvm/ADT/APInt.h:1331: void llvm::APInt::setBit(unsigned int): Assertion `BitPosition < BitWidth && "BitPosition out of range"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=slp-vectorizer bbi-106161.ll -o /dev/null
1.	Running pass "function(slp-vectorizer)" on module "bbi-106161.ll"
2.	Running pass "slp-vectorizer" on function "f_768_3162"
 #0 0x0000560032c2ed06 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x45e9d06)
 #1 0x0000560032c2c74e llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x45e774e)
 #2 0x0000560032c2f589 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f9d0a3c8d10 __restore_rt (/lib64/libpthread.so.0+0x12d10)
 #4 0x00007f9d07d6852f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f9d07d3be65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f9d07d3bd39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f9d07d60e86 (/lib64/libc.so.6+0x46e86)
 #8 0x000056003469d911 llvm::slpvectorizer::BoUpSLP::VLOperands::VLOperands(llvm::ArrayRef<llvm::Value*>, (anonymous namespace)::InstructionsState const&, llvm::slpvectorizer::BoUpSLP const&) SLPVectorizer.cpp:0:0
 #9 0x000056003469b0f1 llvm::slpvectorizer::BoUpSLP::TreeEntry::setOperand(llvm::slpvectorizer::BoUpSLP const&, bool) SLPVectorizer.cpp:0:0
#10 0x000056003468cdfd llvm::slpvectorizer::BoUpSLP::buildTree_rec(llvm::ArrayRef<llvm::Value*>, unsigned int, llvm::slpvectorizer::BoUpSLP::EdgeInfo const&, unsigned int) (build-all/bin/opt+0x6047dfd)
#11 0x00005600346f02cc llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (build-all/bin/opt+0x60ab2cc)
#12 0x00005600346f1440 llvm::SLPVectorizerPass::tryToVectorize(llvm::Instruction*, llvm::slpvectorizer::BoUpSLP&) (build-all/bin/opt+0x60ac440)
#13 0x00005600346f45bc llvm::SLPVectorizerPass::vectorizeRootInstruction(llvm::PHINode*, llvm::Instruction*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (build-all/bin/opt+0x60af5bc)
#14 0x00005600346e92ec llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (build-all/bin/opt+0x60a42ec)
#15 0x00005600346e6204 llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (build-all/bin/opt+0x60a1204)
#16 0x00005600346e5787 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x60a0787)
#17 0x00005600340c17dd llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#18 0x0000560032e653e7 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x48203e7)
#19 0x00005600340bd9ed llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#20 0x0000560032e69fbe llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x4824fbe)
#21 0x00005600340b99cd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#22 0x0000560032e640d7 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x481f0d7)
#23 0x000056003404631c llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x5a0131c)
#24 0x0000560032bf10fe optMain (build-all/bin/opt+0x45ac0fe)
#25 0x00007f9d07d547e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#26 0x0000560032beebee _start (build-all/bin/opt+0x45a9bee)
Abort (core dumped)

bbi-106161.ll.gz

HanKuanChen added a commit that referenced this pull request Apr 15, 2025
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…133888)

We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
@aeubanks
Copy link
Contributor

Bisected a libc++ strict weak ordering check failure to this commit on the following IR:

$ cat /tmp/a.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define i64 @foo(i64 %arg, i64 %arg1) {
bb:
  %icmp = icmp eq i64 %arg1, 0
  br i1 %icmp, label %bb7, label %bb2

bb2:                                              ; preds = %bb2, %bb
  %phi = phi i64 [ %sub, %bb2 ], [ %arg1, %bb ]
  %phi3 = phi i64 [ %add, %bb2 ], [ %arg, %bb ]
  %phi4 = phi i64 [ %add5, %bb2 ], [ %arg, %bb ]
  %add = add i64 %phi3, 1
  %call = tail call i64 @pluto(i64 %phi3)
  %add5 = add i64 %phi4, %arg
  %sub = sub i64 %phi, %arg
  %icmp6 = icmp eq i64 %sub, 0
  br i1 %icmp6, label %bb7, label %bb2

bb7:                                              ; preds = %bb2, %bb
  %phi8 = phi i64 [ %arg, %bb ], [ %add5, %bb2 ]
  ret i64 %phi8
}

declare i64 @pluto(i64)

$ opt -p slp-vectorizer -disable-output /tmp/a.ll

it's pointing at the stable_sort here

@HanKuanChen
Copy link
Contributor Author

@aeubanks I have a new version already #135797.

@rnk
Copy link
Collaborator

rnk commented Apr 24, 2025

@aeubanks I have a new version already #135797.

OK, but can you confirm that the issue has been fixed? If you aren't confident that both issues have been fixed, it's our project policy to revert first and ask questions later:
https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy
"As a community, we strongly value having the tip of tree in a good state while allowing rapid iterative development."

At EuroLLVM I spoke to multiple other individuals for whom the SLP vectorizer is the #1 source of toolchain instability. I have seen issues reported to SLP v developers with responses like along the lines of, this feedback is too late, we've already moved on, it will be hard to rollback. IMO these are not valid excuses for allowing tip-of-tree to be unstable. When a user reports a correctness issue in LLVM, as compiler developers, correctness is paramount, and we should take it seriously.

So, with that in mind, I would appreciate confirmation of whether the reported issue has been fixed.

@HanKuanChen
Copy link
Contributor Author

I built using commit ea0dbee, and no bugs were found.

@aeubanks
Copy link
Contributor

aeubanks commented Apr 24, 2025

I can still reproduce this at main (as of a couple minutes ago). You need -stdlib=libc++ -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG as CXXFLAGS. Locally I have libc++ 19 installed.

@alexey-bataev
Copy link
Member

I can still reproduce this at main (as of a couple minutes ago). You need -stdlib=libc++ -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG as CXXFLAGS. Locally I have libc++ 19 installed.

Hope f427890 will fix it

@hiraditya
Copy link
Collaborator

If possible, let's try to keep a new optimization disabled by default and let it soak for a bit (maybe a week or two?) during which more testing can be done.

@aeubanks
Copy link
Contributor

thanks for the fix @alexey-bataev, confirmed that fixes the repro

@mikaelholmen
Copy link
Collaborator

At EuroLLVM I spoke to multiple other individuals for whom the SLP vectorizer is the #1 source of toolchain instability.

@rnk: Thank you for raising this!

This is the case for us. We have an out of tree backend and we merge from upstream trunk llvm several times per day, and fairly often we see new problems with the SLP vectorizer.

When I report a problem it does get fixed quickly though so thumbs up for that, but that the issues appear in the first place is turning into a problem. Digging out a reproducer, and then converting it to something that reproduces on an in-tree target often takes an effort since the pass depends quite a bit on the target.

We have even raised the question internally if we should disable SLP vectorizer since the maintenance cost of dealing with the crashes is sticking out.

Of course bugs and problems will always occur but SLP vectorizer unfortunately stands out in a negative way for us due to the asserts we sometimes see after merging new patches.

@alexey-bataev
Copy link
Member

At EuroLLVM I spoke to multiple other individuals for whom the SLP vectorizer is the #1 source of toolchain instability.

@rnk: Thank you for raising this!

This is the case for us. We have an out of tree backend and we merge from upstream trunk llvm several times per day, and fairly often we see new problems with the SLP vectorizer.

When I report a problem it does get fixed quickly though so thumbs up for that, but that the issues appear in the first place is turning into a problem. Digging out a reproducer, and then converting it to something that reproduces on an in-tree target often takes an effort since the pass depends quite a bit on the target.

We have even raised the question internally if we should disable SLP vectorizer since the maintenance cost of dealing with the crashes is sticking out.

Of course bugs and problems will always occur but SLP vectorizer unfortunately stands out in a negative way for us due to the asserts we sometimes see after merging new patches.

I'm sorry this causes so many troubles on your end, and I appreciate your help with fixing the bugs. I hope this situation will be improved soon. There is just one big functional change expected, all other changes will be mostly related to redesign and fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants