[BUG] NDK r25c clang generates invalid code/crashes when optimizing for size (-Os) #1862

Closed
SanjaLV opened this issue Apr 11, 2023 · 12 comments

SanjaLV commented Apr 11, 2023

Description

Android NDK r25c produces invalid code (and the compiler crashes if we mark certain functions as noinline) for the arm64-v8a target architecture when compiling with the -Os (optimize for size) compiler flag.

Below is an attached minimized/stripped sample that shows the problem (it uses functions from libtomcrypt).

Link to the repository: https://github.com/SanjaLV/NDK_r25c_repro

Prerequisites:

  1. Linux/macOS machine
  2. ANDROID_HOME env variable that will point to Android SDK root.
  3. "ndk;25.2.9519653" / "ndk;25.1.8937393" installed in sdkmanager
  4. System clang compiler with UBSAN/ASAN.
  5. arm64-v8a Android device/emulator (emulator was tested only on Apple M1 machine) connected to ADB

How to reproduce (invalid code):

  1. Run test_manual.sh
  2. Observe that REMOTE_0s_LOG.txt differs from REMOVE_02_LOG.txt

How to reproduce (compiler crash):

  1. Open REPRO.c in the editor of your choice
  2. Change the define on line 10 to #define MAKE_COMPILER_CRASH 1
  3. Run test_manual.sh
  4. Observe that clang crashes when trying to compile REPRO.c with -Os

Crash backtrace should look like:

Program received signal SIGSEGV, Segmentation fault.
0x00000000065bec3e in llvm::VPTransformState::get(llvm::VPValue*, llvm::VPIteration const&) ()
(gdb) bt
#0  0x00000000065bec3e in llvm::VPTransformState::get(llvm::VPValue*, llvm::VPIteration const&) ()
#1  0x00000000065be8b9 in llvm::InnerLoopVectorizer::scalarizeInstruction(llvm::Instruction*, llvm::VPReplicateRecipe*, llvm::VPIteration const&, bool, llvm::VPTransformState&) ()
#2  0x00000000065be760 in llvm::VPReplicateRecipe::execute(llvm::VPTransformState&) ()
#3  0x00000000065be31e in llvm::VPBasicBlock::execute(llvm::VPTransformState*) ()
#4  0x00000000065be047 in llvm::VPRegionBlock::execute(llvm::VPTransformState*) ()
#5  0x00000000065bdf6c in llvm::VPRegionBlock::execute(llvm::VPTransformState*) ()
#6  0x00000000066fabf1 in llvm::VPlan::execute(llvm::VPTransformState*) ()
#7  0x00000000062fd91e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*) ()
#8  0x000000000663a48f in llvm::LoopVectorizePass::processLoop(llvm::Loop*) ()
#9  0x0000000005edebea in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*)
    ()
#10 0x0000000005eddbb9 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) ()
#11 0x0000000005edd89b in ?? ()
#12 0x0000000005c6168a in llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) ()
#13 0x0000000005c61521 in clang::TemplateDeclInstantiator::VisitDecl(clang::Decl*) ()
#14 0x0000000005ec7112 in llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) ()
#15 0x0000000005ec6dd1 in ?? ()
#16 0x00000000063538c6 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) ()
#17 0x00000000065d66c8 in clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) ()
#18 0x00000000060524d5 in ?? ()
#19 0x0000000005ea25a9 in clang::ParseAST(clang::Sema&, bool, bool) ()
#20 0x00000000063c128d in clang::FrontendAction::Execute() ()
#21 0x00000000063c112d in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#22 0x00000000063c1541 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) ()
#23 0x00000000066a9f54 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) ()
#24 0x00000000066a6de3 in ?? ()
#25 0x00000000066754a5 in main ()

Context:

We originally discovered that upgrading the NDK from r25b to r25c changes the return values of certain cryptographic functions. After some investigation, we found that the first function whose return value changes with NDK r25c is mp_montgomery_reduce. We then minimized the code by fixing the input arguments to mp_montgomery_reduce. (These values are not unique; any valid random values will work too, as long as P is odd.)
We tried to compare the generated assembly, but clang inlines the majority of the calls, which complicates the investigation, so we tried applying the noinline attribute. That resulted in a compiler crash during loop vectorization.
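For readers unfamiliar with why the modulus has to be odd, here is a minimal single-word sketch of Montgomery reduction. It is not libtomcrypt's mp_montgomery_reduce (which works on multi-limb integers); the names `inv_mod_2_32` and `mont_reduce` and the restriction to moduli below 2^31 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of an odd n modulo 2^32, computed by Newton iteration. */
static uint32_t inv_mod_2_32(uint32_t n) {
    assert(n & 1u);          /* an even n has no inverse modulo 2^32 */
    uint32_t x = n;          /* n*n == 1 (mod 8), so 3 bits are already correct */
    for (int i = 0; i < 4; i++)
        x *= 2u - n * x;     /* each step doubles the number of correct bits */
    return x;
}

/* REDC: returns t * 2^-32 mod n, for odd n < 2^31 and t < n * 2^32. */
uint32_t mont_reduce(uint64_t t, uint32_t n) {
    assert(n < (1u << 31));                      /* keeps t + m*n within 64 bits */
    uint32_t n_neg_inv = 0u - inv_mod_2_32(n);   /* -n^-1 mod 2^32 */
    uint32_t m = (uint32_t)t * n_neg_inv;        /* makes t + m*n divisible by 2^32 */
    uint64_t u = (t + (uint64_t)m * n) >> 32;    /* exact division by 2^32 */
    return (uint32_t)(u >= n ? u - n : u);       /* u < 2n, so one subtraction suffices */
}
```

The step that computes `n_neg_inv` is where oddness matters: the reduction needs the inverse of the modulus modulo a power of two, and that inverse exists only when the modulus is odd.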


Comparing clang_source_info.md of both NDKs, I can spot a few patches regarding arm64 vectorization:

- [[AArch64] Use simd mov to materialize big fp constants](https://android.googlesource.com/toolchain/llvm_android/+/91fdeab43d29b1f228113859da8ee238bc8c2f16/patches/cherry/7a605ab7bfbc681c34335684f45b7da32d495db1.patch)
- [[AArch64] Emit vector FP cmp when LE is used with fast-math](https://android.googlesource.com/toolchain/llvm_android/+/91fdeab43d29b1f228113859da8ee238bc8c2f16/patches/cherry/bf268a05cd9294854ffccc3158c0e673069bed4a.patch)
- [Loop-Vectorizer-shouldMaximizeVectorBandwidth.patch](https://android.googlesource.com/toolchain/llvm_android/+/91fdeab43d29b1f228113859da8ee238bc8c2f16/patches/Loop-Vectorizer-shouldMaximizeVectorBandwidth.patch)

I don't have access to the patches, so I cannot verify this hypothesis. Clang with debug asserts enabled might also provide additional information, but I don't know how to build the NDK clang.

Feel free to ask for more information.

Many thanks,
Aleksandrs

Affected versions

r25

Canary version

No response

Host OS

Linux, Mac

Host OS version

Ubuntu 22.04

Affected ABIs

arm64-v8a

Build system

ndk-build

Other build system

No response

minSdkVersion

31 (not relevant)

Device API level

27

@SanjaLV SanjaLV added the bug label Apr 11, 2023
@SanjaLV
Author

SanjaLV commented Apr 11, 2023

I managed to reduce the compiler crash with creduce to the following example:

int a, c, d;
long b[1];
void e() {
  int f = c;
  int *g;
  long *h;
  g = &a;
  h = b;
  for (; d; d++)
    *h++ = f * (long)*g++;
}
int main() {}

A slightly modified testing script: https://github.com/SanjaLV/NDK_r25c_repro/blob/master/test_crash.sh
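Since the backtrace points at the loop vectorizer, one possible way to sidestep it on an affected loop is to disable vectorization for that loop only. The sketch below is a cleaned-up variant of the reduced loop, not the original code; the function name `scale_widen` and the macro `NO_VECTORIZE` are hypothetical, and whether this actually avoids the crash or the miscompile on r25c has not been verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Expands to clang's per-loop pragma; a no-op on other compilers. */
#if defined(__clang__)
#define NO_VECTORIZE _Pragma("clang loop vectorize(disable)")
#else
#define NO_VECTORIZE
#endif

/* Same shape as the reduced loop: widen each int to long and scale it by f. */
void scale_widen(long *dst, const int *src, size_t n, int f) {
    assert(dst != NULL && src != NULL);
    NO_VECTORIZE
    for (size_t i = 0; i < n; i++)
        dst[i] = f * (long)src[i];
}
```

Passing -fno-vectorize on the command line is the coarser alternative, at the cost of losing loop vectorization for the whole translation unit.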

@github-project-automation github-project-automation bot moved this to Triaged in NDK r26 Apr 11, 2023
@SanjaLV SanjaLV changed the title [BUG] NDK r25c clang generates invalid code/crashes when optimizing for size (-0s) [BUG] NDK r25c clang generates invalid code/crashes when optimizing for size (-Os) Apr 13, 2023
@appujee
Collaborator

appujee commented Apr 17, 2023

I'm able to repro this bug with:

clang-r450784d1/bin/clang -std=c99 -o remote-0s -Os -g -fPIC ~/g/bug-vect.c  --target=aarch64-linux-android31  -c

Stack dump:
0.      Program arguments: /path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14 -cc1 -triple aarch64-unknown-linux-android31 -emit-obj --mrelax-relocations -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name bug-vect.c -mrelocation-model pic -pic-level 2 -fhalf-no-semantic-interposition -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu generic -target-feature +neon -target-feature +v8a -target-feature +fix-cortex-a53-835769 -target-abi aapcs -fallow-half-arguments-and-returns -mllvm -treat-scalable-fixed-error-as-warning -debug-info-kind=constructor -dwarf-version=4 -debugger-tuning=gdb -fcoverage-compilation-dir=/path/prebuilts/clang/host/linux-x86/clang-r450784d1 -resource-dir /path/prebuilts/clang/host/linux-x86/clang-r450784d1/lib64/clang/14.0.7 -internal-isystem /path/prebuilts/clang/host/linux-x86/clang-r450784d1/lib64/clang/14.0.7/include -internal-isystem /usr/local/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -Os -std=c99 -fdebug-compilation-dir=/path/prebuilts/clang/host/linux-x86/clang-r450784d1 -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -vectorize-loops -vectorize-slp -target-feature +outline-atomics -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/bug-vect-5931b9.o -x c bug-vect.c
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x00000000047d91d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x47d91d8)
 #1 0x00000000047d8340 llvm::sys::RunSignalHandlers() (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x47d8340)
 #2 0x00000000047d94ca (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x47d94ca)
 #3 0x00007fe642fe7f90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #4 0x00000000065bec3e llvm::VPTransformState::get(llvm::VPValue*, llvm::VPIteration const&) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x65bec3e)
 #5 0x00000000065be8b9 llvm::InnerLoopVectorizer::scalarizeInstruction(llvm::Instruction*, llvm::VPReplicateRecipe*, llvm::VPIteration const&, bool, llvm::VPTransformState&) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x65be8b9)
 #6 0x00000000065be760 llvm::VPReplicateRecipe::execute(llvm::VPTransformState&) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x65be760)
 #7 0x00000000065be31e llvm::VPBasicBlock::execute(llvm::VPTransformState*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x65be31e)
 #8 0x00000000065be047 llvm::VPRegionBlock::execute(llvm::VPTransformState*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x65be047)
 #9 0x00000000065bdf6c llvm::VPRegionBlock::execute(llvm::VPTransformState*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x65bdf6c)
#10 0x00000000066fabf1 llvm::VPlan::execute(llvm::VPTransformState*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x66fabf1)
#11 0x00000000062fd91e llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x62fd91e)
#12 0x000000000663a48f llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x663a48f)
#13 0x0000000005edebea llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x5edebea)
#14 0x0000000005eddbb9 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x5eddbb9)
#15 0x0000000005edd89b (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x5edd89b)
#16 0x0000000005c6168a llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x5c6168a)
#17 0x0000000005c61521 clang::TemplateDeclInstantiator::VisitDecl(clang::Decl*) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x5c61521)
#18 0x0000000005ec7112 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x5ec7112)
#19 0x0000000005ec6dd1 (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x5ec6dd1)
#20 0x00000000063538c6 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x63538c6)
#21 0x00000000065d66c8 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) (/path/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang-14+0x65d66c8)
...

@appujee
Collaborator

appujee commented Apr 19, 2023

https://android-review.googlesource.com/c/toolchain/llvm_android/+/2259912 has a bunch of patches related to vector instructions. This is the likely cause of the bug. I'm still investigating the root cause.

@ilinpv

ilinpv commented Apr 19, 2023

Confirmed, I reproduced the crash with -Os. Thank you for the report! I will look for the cause.

@appujee
Collaborator

appujee commented Apr 19, 2023

Thanks for taking a look at it. I can confirm that reverting https://android-review.googlesource.com/c/toolchain/llvm_android/+/2259912/3/patches/cherry/Loop-Vectorizer-shouldMaximizeVectorBandwidth.patch fixes the crash, although I assume it might introduce regressions?

@ilinpv

ilinpv commented Apr 19, 2023

The error is caused by the "Set maximum VF with shouldMaximizeVectorBandwidth" patch ( https://android-review.googlesource.com/c/toolchain/llvm_android/+/2259912/3/patches/cherry/Loop-Vectorizer-shouldMaximizeVectorBandwidth.patch ). Figuring out the fix.

@ilinpv

ilinpv commented Apr 19, 2023

You were first 😄 Yes, it is a performance patch and removing it would cause regressions.

@DanAlbert
Member

Does the crash still exist in newer LLVMs?

@stephenhines
Collaborator

srhines@ringworld:__:~$ /disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang -std=c99 -Os -g -fPIC t.c --target=aarch64-linux-android31 -c
PLEASE submit a bug report to https://github.com/android-ndk/ndk/issues and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real -std=c99 -Os -g -fPIC t.c --target=aarch64-linux-android31 -c
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x00000000047d91d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x47d91d8)
 #1 0x00000000047d8340 llvm::sys::RunSignalHandlers() (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x47d8340)
 #2 0x00000000047a3dc3 (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x47a3dc3)
 #3 0x00000000047a3fa1 (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x47a3fa1)
 #4 0x00007f5a9c1fdf90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #5 0x00000000065bec3e llvm::VPTransformState::get(llvm::VPValue*, llvm::VPIteration const&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x65bec3e)
 #6 0x00000000065be8b9 llvm::InnerLoopVectorizer::scalarizeInstruction(llvm::Instruction*, llvm::VPReplicateRecipe*, llvm::VPIteration const&, bool, llvm::VPTransformState&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x65be8b9)
 #7 0x00000000065be760 llvm::VPReplicateRecipe::execute(llvm::VPTransformState&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x65be760)
 #8 0x00000000065be31e llvm::VPBasicBlock::execute(llvm::VPTransformState*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x65be31e)
 #9 0x00000000065be047 llvm::VPRegionBlock::execute(llvm::VPTransformState*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x65be047)
#10 0x00000000065bdf6c llvm::VPRegionBlock::execute(llvm::VPTransformState*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x65bdf6c)
#11 0x00000000066fabf1 llvm::VPlan::execute(llvm::VPTransformState*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66fabf1)
#12 0x00000000062fd91e llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x62fd91e)
#13 0x000000000663a48f llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x663a48f)
#14 0x0000000005edebea llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5edebea)
#15 0x0000000005eddbb9 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5eddbb9)
#16 0x0000000005edd89b (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5edd89b)
#17 0x0000000005c6168a llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5c6168a)
#18 0x0000000005c61521 clang::TemplateDeclInstantiator::VisitDecl(clang::Decl*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5c61521)
#19 0x0000000005ec7112 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5ec7112)
#20 0x0000000005ec6dd1 (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5ec6dd1)
#21 0x00000000063538c6 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x63538c6)
#22 0x00000000065d66c8 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x65d66c8)
#23 0x00000000060524d5 (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x60524d5)
#24 0x0000000005ea25a9 clang::ParseAST(clang::Sema&, bool, bool) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x5ea25a9)
#25 0x00000000063c128d clang::FrontendAction::Execute() (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x63c128d)
#26 0x00000000063c112d clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x63c112d)
#27 0x00000000063c1541 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x63c1541)
#28 0x00000000066a9f54 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66a9f54)
#29 0x00000000066a6de3 (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66a6de3)
#30 0x00000000066a6c92 (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66a6c92)
#31 0x00000000066a6c61 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66a6c61)
#32 0x00000000066a69f4 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, bool*) const (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66a69f4)
#33 0x00000000066a685f clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66a685f)
#34 0x00000000066a66f2 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66a66f2)
#35 0x00000000066752ee main (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x66752ee)
#36 0x00007f5a9c1e918a __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#37 0x00007f5a9c1e9245 call_init ./csu/../csu/libc-start.c:128:20
#38 0x00007f5a9c1e9245 __libc_start_main ./csu/../csu/libc-start.c:368:5
#39 0x00000000064cce69 _start (/disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin/clang.real+0x64cce69)
clang-14: error: clang frontend command failed with exit code 139 (use -v to see invocation)
Android (9352603, based on r450784d1) clang version 14.0.7 (https://android.googlesource.com/toolchain/llvm-project 4c603efb0cca074e9238af8b4106c30add4418f6)
Target: aarch64-unknown-linux-android31
Thread model: posix
InstalledDir: /disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r450784d1/bin
clang-14: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-14: note: diagnostic msg: /tmp/t-21d0df.c
clang-14: note: diagnostic msg: /tmp/t-21d0df.sh
clang-14: note: diagnostic msg: 

********************
srhines@ringworld:__:~$ /disk/android_trees/aosp/prebuilts/clang/host/linux-x86/clang-r487747/bin/clang -std=c99 -Os -g -fPIC t.c --target=aarch64-linux-android31 -c
srhines@ringworld:__:~$ echo $?
0

It doesn't repro on newer Clang. I think that this might just be an issue with the batch of vectorization patches possibly missing something.

@DanAlbert DanAlbert moved this from Triaged to Needs prebuilt update in NDK r26 Apr 20, 2023
@DanAlbert
Member

Okay, so we just need a new clang for r26 (I think I saw the fix for the previous blocker go in, so we're maybe just waiting for new prebuilts?), since it doesn't sound like there's going to be bandwidth for you folks to support both an r25d toolchain and r26 development. If that's wrong lmk and I'll retriage.

@jaykang10

Hello,
I missed one patch when cherry-picking the commits for shouldMaximizeVectorBandwidth. Sorry about that.
We need to cherry-pick the commit below too.

commit 27c4c031c67ba2ec4306766bcfab7b108a2f436c (HEAD -> ndk)
Author: David Green <david.green@arm.com>
Date:   Thu Mar 31 09:19:31 2022 +0100 

    [LV] Invalidate widening decisions after maximizing vector bandwidth

The previously computed Uniforms[VF] is reused, which yields a wrong cost for the load instruction at VF 4, as shown below. We need to invalidate the previous result and recalculate it.

LV: Found an estimated cost of 3000000 for VF 2 For instruction:   %2 = load i32, i32* %g.07, align 4, !tbaa !10
LV: Found an estimated cost of 1 for VF 4 For instruction:   %2 = load i32, i32* %g.07, align 4, !tbaa !10

After applying the patch below to the NDK r25c clang, I can see the error is gone.

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e95c6bf6e111..f84fbd638365 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5637,6 +5637,11 @@ ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
         MaxVF = MinVF;
       }
     }     
+
+    // Invalidate any widening decisions we might have made, in case the loop
+    // requires prediction (decided later), but we have already made some
+    // load/store widening decisions.
+    invalidateCostModelingDecisions();
   }
   return MaxVF;
 }
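The stale-state problem described above can be illustrated with a toy model. This is only a sketch of the general caching pattern, not LLVM's code: `toy_cost_model`, `toy_load_cost`, `toy_invalidate`, and the cost numbers are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { TOY_NUM_VFS = 5 };   /* toy slots for VF 1, 2, 4, 8, 16 */

struct toy_cost_model {
    bool predicated;              /* decided late, but it affects the cost */
    bool cached[TOY_NUM_VFS];     /* which per-VF costs were already computed */
    unsigned cost[TOY_NUM_VFS];
};

/* Returns the memoized load cost for slot idx, computing it on first use. */
unsigned toy_load_cost(struct toy_cost_model *m, unsigned idx) {
    assert(idx < TOY_NUM_VFS);
    if (!m->cached[idx]) {
        /* Made-up numbers: a predicated load is assumed to be costlier. */
        m->cost[idx] = m->predicated ? 10u * (1u << idx) : 1u;
        m->cached[idx] = true;
    }
    return m->cost[idx];
}

/* Drops every memoized decision so that the next query recomputes it. */
void toy_invalidate(struct toy_cost_model *m) {
    memset(m->cached, 0, sizeof m->cached);
}
```

If `predicated` changes after a cost has been queried, `toy_load_cost` keeps returning the old value until `toy_invalidate` is called, which is the kind of inconsistency the added invalidation call is meant to prevent.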

@DanAlbert
Member

If we end up making another r25 release we'll do that, but I don't think that's happening.

Status: Merged