
Unable to enable optimization levels above -O0 on the CUDA GPU test case #1659

@bemichel

Hi,

Thanks to the documentation and your help, I was able to set up a representative CUDA GPU test with Enzyme in both forward and backward mode: https://fwd.gymni.ch/TWC7tS

I use clang 14 with Enzyme 0.0.81 and CUDA 11.2, and the results look correct:

$> clang++ -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-14.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
$> ./a.out
[GPU, direct] a[0]         == 12.000000
[GPU, direct] a[nb_cell-1] == 12.000000
[GPU, direct] b[0]         == 437.000000
[GPU, direct] b[nb_cell-1] == 437.000000
[GPU, forward] da[0]         == 1.000000
[GPU, forward] da[nb_cell-1] == 1.000000
[GPU, forward] db[0]         == 72.000000
[GPU, forward] db[nb_cell-1] == 72.000000
[GPU, backward] da[0]         == 72.000000
[GPU, backward] da[nb_cell-1] == 72.000000
[GPU, backward] db[0]         == 0.000000
[GPU, backward] db[nb_cell-1] == 0.000000
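As a sanity check on these numbers: assuming the forward run seeds da = 1 (as printed) and the backward run starts from da = 0 and db = 1 (an assumption; the seeding code is in the linked test and is not reproduced here), both modes should report the same per-cell partial derivative of b with respect to a. Writing the forward tangents as ȧ, ḃ and the backward adjoints as ā, b̄:

```latex
% Forward mode: the tangent of b is the partial derivative times the tangent of a.
% Backward mode: the adjoint of a accumulates the partial derivative times the
% adjoint of b, and the adjoint of b is then reset to zero.
\begin{aligned}
\text{forward:}\quad  & \dot b = \frac{\partial b}{\partial a}\,\dot a,
  && \dot a = 1 \;\Rightarrow\; \dot b = \frac{\partial b}{\partial a} = 72,\\
\text{backward:}\quad & \bar a \leftarrow \bar a + \frac{\partial b}{\partial a}\,\bar b,\;\; \bar b \leftarrow 0,
  && \bar a = 0,\ \bar b = 1 \;\Rightarrow\; \bar a = 72,\ \bar b = 0.
\end{aligned}
```

Both modes give ∂b/∂a = 72, and db = 0 after the backward pass is consistent with Enzyme's reverse mode resetting the output adjoint once it has been propagated.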

However, if I run the same compilation with -O1, -O2 or -O3, Enzyme fails:

$> clang++ -O1 -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-14.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
clang-14: /.../gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/include/llvm/Support/Casting.h:90: static bool llvm::isa_impl_cl<To, From*>::doit(const From*) [with To = llvm::ConstantAsMetadata; From = llvm::Metadata]: Assertion `Val && "isa<> used on a null pointer"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14 -cc1 -triple x86_64-unknown-linux-gnu -target-sdk-version=11.2 -aux-triple nvptx64-nvidia-cuda -emit-obj --mrelax-relocations -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name test_jambon.cu -mrelocation-model static -mframe-pointer=none -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -tune-cpu generic -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -fcoverage-compilation-dir=directory/test_enzyme -resource-dir /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6 -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include/cuda_wrappers -include __clang_cuda_runtime_wrapper.h -D ENABLE_ENZYME -I /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -I /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/math_libs/11.2/targets/x86_64-linux/include -I/directory/hwloc/2.4.1-gnu831-hpc/include -I/directory/openmpi/4.0.5-gnu831-hpc/include -I/directory/intel/oneapi/mkl/2021.2.0/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -O1 -fdeprecated-macro -fdebug-compilation-dir=directory/test_enzyme -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -load /directory/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so -fcuda-include-gpubinary /tmp/test_jambon-1b521c.fatbin -cuid=7e7be506b6b6c538 -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/test_jambon-3c93a8.o -x cuda test_jambon.cu
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x000000000333bb6f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000000003338ebe SignalHandler(int) Signals.cpp:0:0
 #2 0x00001492f34f2b30 __restore_rt sigaction.c:0:0
 #3 0x00001492f1f1e84f raise (/lib64/libc.so.6+0x3784f)
 #4 0x00001492f1f08c45 abort (/lib64/libc.so.6+0x21c45)
 #5 0x00001492f1f08b19 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #6 0x00001492f1f16e36 .annobin___GI___assert_fail.end assert.c:0:0
 #7 0x00001492f1831923 (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x18f923)
 #8 0x00001492f1875f8b EnzymeLogic::CreateForwardDiff(llvm::Function*, DIFFE_TYPE, llvm::ArrayRef<DIFFE_TYPE>, TypeAnalysis&, bool, DerivativeMode, bool, unsigned int, llvm::Type*, FnTypeInfo const&, std::vector<bool, std::allocator<bool> >, AugmentedReturn const*, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x1d3f8b)
 #9 0x00001492f180d8c4 (anonymous namespace)::EnzymeBase::HandleAutoDiff(llvm::Instruction*, unsigned int, llvm::Value*, llvm::Type*, llvm::SmallVectorImpl<llvm::Value*>&, std::map<int, llvm::Type*, std::less<int>, std::allocator<std::pair<int const, llvm::Type*> > > const&, std::vector<DIFFE_TYPE, std::allocator<DIFFE_TYPE> > const&, llvm::Function*, DerivativeMode, (anonymous namespace)::EnzymeBase::Options&, bool) Enzyme.cpp:0:0
#10 0x00001492f180fd30 (anonymous namespace)::EnzymeBase::HandleAutoDiffArguments(llvm::CallInst*, DerivativeMode, bool) Enzyme.cpp:0:0
#11 0x00001492f1812a35 (anonymous namespace)::EnzymeBase::lowerEnzymeCalls(llvm::Function&, std::set<llvm::Function*, std::less<llvm::Function*>, std::allocator<llvm::Function*> >&) Enzyme.cpp:0:0
#12 0x00001492f18166c6 (anonymous namespace)::EnzymeBase::run(llvm::Module&) Enzyme.cpp:0:0
#13 0x00001492f182fd8e llvm::detail::PassModel<llvm::Module, EnzymeNewPM, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x18dd8e)
#14 0x0000000002afb0a9 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x2afb0a9)
#15 0x0000000003648736 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile> >&) (.constprop.902) BackendUtil.cpp:0:0
#16 0x000000000364a7b3 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x364a7b3)
#17 0x00000000042cc38d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x42cc38d)
#18 0x0000000003cdba38 clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3cdba38)
#19 0x00000000050981c9 clang::ParseAST(clang::Sema&, bool, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x50981c9)
#20 0x00000000042cc6e2 clang::CodeGenAction::ExecuteAction() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x42cc6e2)
#21 0x0000000003ca9231 clang::FrontendAction::Execute() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3ca9231)
#22 0x0000000003c3b35a clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3c3b35a)
#23 0x0000000003d6ef01 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3d6ef01)
#24 0x0000000000ed78c4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xed78c4)
#25 0x0000000000ed5079 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#26 0x0000000000e08bbc main (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xe08bbc)
#27 0x00001492f1f0a803 __libc_start_main (/lib64/libc.so.6+0x23803)
#28 0x0000000000ed395e _start (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xed395e)
clang-14: error: unable to execute command: Aborted (core dumped)
clang-14: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 14.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin
clang-14: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-14: note: diagnostic msg: /tmp/test_jambon-69a59d.cu
clang-14: note: diagnostic msg: /tmp/test_jambon-4dc09a/test_jambon-sm_61.cu
clang-14: note: diagnostic msg: /tmp/test_jambon-69a59d.sh
clang-14: note: diagnostic msg: 

********************
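To check exactly which optimization levels trigger this crash, the compile command above can be rerun once per level. Below is a minimal sketch, assuming bash and the same CUDAPATH and ENZYMEPATH variables as in the commands above; the function name try_opt_levels and the CXX variable (defaulting to clang++) are my own hypothetical additions, not part of Enzyme or clang:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Enzyme): rebuild test.cu at each requested
# optimization level with the same flags as above and report ok/failed.
# Assumes CUDAPATH and ENZYMEPATH are set; CXX defaults to clang++.
try_opt_levels() {
  local cxx="${CXX:-clang++}"
  local level
  for level in "$@"; do
    # One binary per level (a-O0.out, a-O1.out, ...) so builds don't overwrite
    # each other; compiler output is discarded to keep the summary short.
    if "$cxx" "$level" -DENABLE_ENZYME -I"${CUDAPATH}/include" test.cu \
        -fplugin="${ENZYMEPATH}/lib/ClangEnzyme-14.so" --cuda-gpu-arch=sm_61 \
        -lcudart -L"${CUDAPATH}/11.2/lib64" -o "a${level}.out" >/dev/null 2>&1; then
      printf '%s: ok\n' "$level"
    else
      printf '%s: failed\n' "$level"
    fi
  done
}
```

For example, `try_opt_levels -O0 -O1 -O2 -O3` would, given the behaviour described here, be expected to report only -O0 as ok; rerunning a failing level by hand shows the full backtrace.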

Does this mean my code cannot be compiled with an optimization level higher than -O0?

Or, as the documentation seems to explain (https://enzyme.mit.edu/getting_started/CUDAGuide/#cuda-example):

Note that this procedure (using ClangEnzyme as opposed to LLVMEnzyme manually) inserts Enzyme at a specific location in LLVM’s optimization pipeline. The default ordering should be reasonable; however, the precise ordering of optimization passes may [impact performance](https://proceedings.mlsys.org/paper/2020/file/4e732ced3463d06de0ca9a15b6153677-Paper.pdf). If there is a performance issue that you suspect may be due to optimization ordering, please [open an issue](https://github.com/EnzymeAD/Enzyme/issues/new).

Is there another way to run the compilation/differentiation step so that I can enable -O1, -O2 or -O3?

Thanks for your help,
