Closed
Description
Hi,
Thanks to the documentation and your help, I was able to set up a representative test on a CUDA GPU with Enzyme in both forward and backward mode: https://fwd.gymni.ch/TWC7tS
I use clang-14 + Enzyme-0.0.81 + CUDA-11.2 and the results look correct:
$> clang++ -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-14.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
$> ./a.out
[GPU, direct] a[0] == 12.000000
[GPU, direct] a[nb_cell-1] == 12.000000
[GPU, direct] b[0] == 437.000000
[GPU, direct] b[nb_cell-1] == 437.000000
[GPU, forward] da[0] == 1.000000
[GPU, forward] da[nb_cell-1] == 1.000000
[GPU, forward] db[0] == 72.000000
[GPU, forward] db[nb_cell-1] == 72.000000
[GPU, backward] da[0] == 72.000000
[GPU, backward] da[nb_cell-1] == 72.000000
[GPU, backward] db[0] == 0.000000
[GPU, backward] db[nb_cell-1] == 0.000000
But if I try the same compilation step with -O[123], Enzyme fails:
$> clang++ -O1 -DENABLE_ENZYME -I${CUDAPATH}/include test.cu -fplugin=${ENZYMEPATH}/lib/ClangEnzyme-14.so --cuda-gpu-arch=sm_61 -lcudart -L${CUDAPATH}/11.2/lib64
clang-14: /.../gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/include/llvm/Support/Casting.h:90: static bool llvm::isa_impl_cl<To, From*>::doit(const From*) [with To = llvm::ConstantAsMetadata; From = llvm::Metadata]: Assertion `Val && "isa<> used on a null pointer"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14 -cc1 -triple x86_64-unknown-linux-gnu -target-sdk-version=11.2 -aux-triple nvptx64-nvidia-cuda -emit-obj --mrelax-relocations -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name test_jambon.cu -mrelocation-model static -mframe-pointer=none -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -tune-cpu generic -mllvm -treat-scalable-fixed-error-as-warning -debugger-tuning=gdb -fcoverage-compilation-dir=directory/test_enzyme -resource-dir /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6 -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include/cuda_wrappers -include __clang_cuda_runtime_wrapper.h -D ENABLE_ENZYME -I /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -I /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/math_libs/11.2/targets/x86_64-linux/include -I/directory/hwloc/2.4.1-gnu831-hpc/include -I/directory/openmpi/4.0.5-gnu831-hpc/include -I/directory/intel/oneapi/mkl/2021.2.0/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include 
-internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /directory/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/lib/clang/14.0.6/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-redhat-linux/8/../../../../x86_64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /directory/nvidia-hpc-sdk/22.2-gnu831/Linux_x86_64/22.2/cuda/11.2/include -O1 -fdeprecated-macro -fdebug-compilation-dir=directory/test_enzyme -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -load /directory/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so -fcuda-include-gpubinary /tmp/test_jambon-1b521c.fatbin -cuid=7e7be506b6b6c538 -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/test_jambon-3c93a8.o -x cuda test_jambon.cu
1. <eof> parser at end of file
2. Optimizer
#0 0x000000000333bb6f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
#1 0x0000000003338ebe SignalHandler(int) Signals.cpp:0:0
#2 0x00001492f34f2b30 __restore_rt sigaction.c:0:0
#3 0x00001492f1f1e84f raise (/lib64/libc.so.6+0x3784f)
#4 0x00001492f1f08c45 abort (/lib64/libc.so.6+0x21c45)
#5 0x00001492f1f08b19 _nl_load_domain.cold.0 loadmsgcat.c:0:0
#6 0x00001492f1f16e36 .annobin___GI___assert_fail.end assert.c:0:0
#7 0x00001492f1831923 (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x18f923)
#8 0x00001492f1875f8b EnzymeLogic::CreateForwardDiff(llvm::Function*, DIFFE_TYPE, llvm::ArrayRef<DIFFE_TYPE>, TypeAnalysis&, bool, DerivativeMode, bool, unsigned int, llvm::Type*, FnTypeInfo const&, std::vector<bool, std::allocator<bool> >, AugmentedReturn const*, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x1d3f8b)
#9 0x00001492f180d8c4 (anonymous namespace)::EnzymeBase::HandleAutoDiff(llvm::Instruction*, unsigned int, llvm::Value*, llvm::Type*, llvm::SmallVectorImpl<llvm::Value*>&, std::map<int, llvm::Type*, std::less<int>, std::allocator<std::pair<int const, llvm::Type*> > > const&, std::vector<DIFFE_TYPE, std::allocator<DIFFE_TYPE> > const&, llvm::Function*, DerivativeMode, (anonymous namespace)::EnzymeBase::Options&, bool) Enzyme.cpp:0:0
#10 0x00001492f180fd30 (anonymous namespace)::EnzymeBase::HandleAutoDiffArguments(llvm::CallInst*, DerivativeMode, bool) Enzyme.cpp:0:0
#11 0x00001492f1812a35 (anonymous namespace)::EnzymeBase::lowerEnzymeCalls(llvm::Function&, std::set<llvm::Function*, std::less<llvm::Function*>, std::allocator<llvm::Function*> >&) Enzyme.cpp:0:0
#12 0x00001492f18166c6 (anonymous namespace)::EnzymeBase::run(llvm::Module&) Enzyme.cpp:0:0
#13 0x00001492f182fd8e llvm::detail::PassModel<llvm::Module, EnzymeNewPM, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-10.2.0/enzyme-0.0.81-pz4de3ykrazxwzcd3rlouco7s24xmmdu/lib/ClangEnzyme-14.so+0x18dd8e)
#14 0x0000000002afb0a9 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x2afb0a9)
#15 0x0000000003648736 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile> >&) (.constprop.902) BackendUtil.cpp:0:0
#16 0x000000000364a7b3 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x364a7b3)
#17 0x00000000042cc38d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x42cc38d)
#18 0x0000000003cdba38 clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3cdba38)
#19 0x00000000050981c9 clang::ParseAST(clang::Sema&, bool, bool) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x50981c9)
#20 0x00000000042cc6e2 clang::CodeGenAction::ExecuteAction() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x42cc6e2)
#21 0x0000000003ca9231 clang::FrontendAction::Execute() (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3ca9231)
#22 0x0000000003c3b35a clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3c3b35a)
#23 0x0000000003d6ef01 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0x3d6ef01)
#24 0x0000000000ed78c4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xed78c4)
#25 0x0000000000ed5079 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#26 0x0000000000e08bbc main (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xe08bbc)
#27 0x00001492f1f0a803 __libc_start_main (/lib64/libc.so.6+0x23803)
#28 0x0000000000ed395e _start (/scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin/clang-14+0xed395e)
clang-14: error: unable to execute command: Aborted (core dumped)
clang-14: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 14.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /scratchm/sonics/opt_2024/linux-rhel8-broadwell/gcc-8.3.1/llvm-14.0.6-mlbglx3o3n5rirgy2xfi4l6f66wjzqhq/bin
clang-14: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-14: note: diagnostic msg: /tmp/test_jambon-69a59d.cu
clang-14: note: diagnostic msg: /tmp/test_jambon-4dc09a/test_jambon-sm_61.cu
clang-14: note: diagnostic msg: /tmp/test_jambon-69a59d.sh
clang-14: note: diagnostic msg:
********************
Is it impossible to compile my code with an optimization level higher than -O0?
Or is this related to what the documentation describes (https://enzyme.mit.edu/getting_started/CUDAGuide/#cuda-example):
Note that this procedure (using ClangEnzyme as opposed to LLVMEnzyme manually) inserts Enzyme at a specific location in LLVM’s
optimization pipeline. The default ordering should be reasonable, however, the precise ordering of optimization passes may
[impact performance](https://proceedings.mlsys.org/paper/2020/file/4e732ced3463d06de0ca9a15b6153677-Paper.pdf) .
If there is a performance issue that you suspect may be due to optimization ordering, please
[open an issue](https://github.com/EnzymeAD/Enzyme/issues/new) .
Is there another way to run the compilation/differentiation phase so that the -O[123] options can be enabled?
Thanks for your help,