[Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features #16425

Merged
merged 9 commits into from
Mar 27, 2024

lhutton1
Contributor

Currently, target features are determined by a set of fixed checks on the target string. This works well for checking support of a small number of simple features, but it doesn't scale. Some problems include:

  • There are many non-trivial conditions under which a feature may or may not be available. It is easy to miss these with the current implementation.
  • The inclusion of some features in a target string can imply other features. For example, "+sve2" implies "+sve". This currently isn't taken into account.
  • The tests in tests/cpp/target/parsers/aprofile_test.c suggest that targets such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon" are supported target strings. The features will be correctly parsed in TVM, however, they are not valid in LLVM. Therefore, it's possible that TVM and LLVM have different understanding of the features available.

This commit uses the more robust LLVM target parser to determine support for the features in TVM. It leverages infrastructure previously added to TVM for obtaining a list of all supported features given an input target, and uses this to check the existence of the features we're interested in. It should be trivial to grow this list over time. As a result of this change, the problems mentioned above are solved.
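
As a rough illustration of the idea (not the code in this PR), feature support becomes a membership test on the feature list reported for the target, rather than string matching on the target string. The sketch below assumes the list has already been obtained and already contains implied features; `HasFeature`, `AProfileFlags` and `GetFlags` are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if `name` appears in the feature list for a target.
// In TVM the list would come from the LLVM target parser; here it is passed in.
bool HasFeature(const std::vector<std::string>& features, const std::string& name) {
  return std::find(features.begin(), features.end(), name) != features.end();
}

// Hypothetical flag set derived from the list. Supporting another feature
// means adding one field and one lookup.
struct AProfileFlags {
  bool has_asimd;
  bool has_sve;
  bool has_dotprod;
};

AProfileFlags GetFlags(const std::vector<std::string>& features) {
  return AProfileFlags{HasFeature(features, "neon"), HasFeature(features, "sve"),
                       HasFeature(features, "dotprod")};
}
```

With a list such as {"neon", "sve", "sve2"}, `GetFlags` would set `has_asimd` and `has_sve` and leave `has_dotprod` unset.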

In its current form, this commit drops support for target strings such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon". A scan of the codebase suggests this functionality is not in use (only in test cases). Should we feel the need to support them, or to provide a smoother migration for downstream users of TVM, we can add a translator to the parser to convert these into LLVM-compatible targets.

cc @Mousius @cbalint13 @ekalda @neildhickey

@cbalint13
Contributor

cbalint13 commented Jan 18, 2024

Hi @lhutton1 !

Thanks a lot for picking up the recently introduced LLVM reflection (well, kind of) for ARM targets too!

Yes, I agree. I was also concerned that all the mentioned target strings are not legitimate from LLVM's point of view.

Features can be added or removed with +feat or -feat; as far as I know, there is no such thing as +no<whatever> in LLVM.
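
To make the +feat / -feat convention concrete, here is a minimal sketch that splits an -mattr value into enabled and disabled feature names. `ParseMAttr` and `FeatureDelta` are hypothetical names; LLVM does this parsing itself, so this is for illustration only.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: features explicitly switched on or off.
struct FeatureDelta {
  std::vector<std::string> enabled;
  std::vector<std::string> disabled;
};

// Split a comma-separated attribute string such as "+sve,-neon".
FeatureDelta ParseMAttr(const std::string& mattr) {
  FeatureDelta delta;
  std::stringstream ss(mattr);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.size() < 2) continue;  // skip empty or malformed entries
    if (item[0] == '+') {
      delta.enabled.push_back(item.substr(1));
    } else if (item[0] == '-') {
      delta.disabled.push_back(item.substr(1));
    }
  }
  return delta;
}
```

Under this convention "+noneon" is simply a request to enable a feature literally named "noneon", which is why it cannot express "disable neon".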

I salute this PR, nice work @lhutton1 !

@lhutton1 lhutton1 force-pushed the improve-aprofile-target-parser branch from dfcdb3e to e272cfd Compare January 22, 2024 10:04
@lhutton1 lhutton1 force-pushed the improve-aprofile-target-parser branch from 037c930 to abeb0fa Compare January 29, 2024 10:31
@lhutton1
Contributor Author

lhutton1 commented Feb 2, 2024

also cc @kparzysz-quic

@lhutton1
Contributor Author

friendly ping on this

…itecture features

Currently, target features are determined by a set of fixed checks on
the target string. This works well for checking support of a small
number of simple features, but it doesn't scale. Some problems include:
- There are many non-trivial conditions under which a feature may or may
  not be available. It is easy to miss these with the current implementation.
- The inclusion of some features in a target string can imply other
  features. For example, "+sve" implies "+neon". This currently isn't
  taken into account.
- The tests in tests/cpp/target/parsers/aprofile_test.c suggest that
  targets such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon"
  are supported target strings. The features will be correctly parsed in
  TVM, however, they are not valid in LLVM. Therefore, it's possible
  that TVM and LLVM have different understanding of the features
  available.

This commit uses the more robust LLVM target parser to determine support
for the features in TVM. It leverages previous infrastructure added to
TVM for obtaining a list of all supported features given an input
target, and uses this to check the existence of certain features we're
interested in. It should be trivial to grow this list over time. As a
result of this change, the problems mentioned above are solved.

In the current form, this commit drops support for target strings such
as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon". A scan of the
codebase suggests this functionality is not in use (only in test cases).
Should we feel the need to support them, or to provide a smoother migration
for downstream users of TVM, we can add a translator to the parser to
convert these into LLVM-compatible targets.

Change-Id: Ic2bf3b68c8af74025ec388d304bd014624c0c585
`ci-gpu` - Made 'codegen' namespacing more specific
`ci-i386` - Required a more modern version of LLVM for the aprofile tests
`ci-hexagon` - Skipped the aprofile tests if LLVM had not been built with
               the correct targets

Change-Id: I792b5994fcea52c74b40e040630db1bbd96ca16c
Change-Id: If41e4fd32947a2acddfe9b0691a0c9ba3245d722
Notably, don't abort when encountering a CPU architecture that's not
recognised by LLVM. This can happen when compiling with an older version
of LLVM. Instead, output a warning.

Also add additional checks in the parser for cases when TVM is not
compiled with LLVM support and when LLVM is compiled without support
for the necessary architectures.

Change-Id: I646cb68cadd5462ee2bd694ba5c22ff7dad8f555
Change-Id: Ibfd0beb6dda00aa2a93cd0b47cf28f045e3fde5c
Change-Id: I0b88ecad2987297c428d0f0ca95db35d828c1672
Change-Id: I98a72a95e2b51f8a4b577dcef15f40e7c28719a2
@lhutton1 lhutton1 force-pushed the improve-aprofile-target-parser branch 2 times, most recently from 2e1ce9e to d3234bb Compare March 19, 2024 09:33
@lhutton1
Contributor Author

@tvm-bot rerun

Change-Id: Iac24bbe31251ebbafacb410abbb67f1e32c171d6
@lhutton1 lhutton1 force-pushed the improve-aprofile-target-parser branch from d3234bb to 92c68f2 Compare March 21, 2024 13:42
@lhutton1
Contributor Author

the duplication of the ci-arm CI jobs still appears to be occurring - trying a force push to see if that helps

Change-Id: Ic8391063d403cc7fe7e2e7d1b4b3c2d6d3bc3146
@lhutton1
Contributor Author

lhutton1 commented Mar 25, 2024

It seems CI was failing due to a memory leak observed when calling GetAllLLVMCpuFeatures() and GetAllLLVMTargetArches(). The following is a valgrind report for an executable that creates a new target tvm::Target("llvm -mtriple=aarch64-linux-gnu") with this PR's changes:

...
==356347== 8,040 (1,560 direct, 6,480 indirect) bytes in 1 blocks are definitely lost in loss record 42,621 of 42,667
==356347==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==356347==    by 0x1136B1B9: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==356347==    by 0xBC44347: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==356347==    by 0xBC3EF2D: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:398)
==356347==    by 0xBC3F0B2: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:413)
==356347==    by 0xBC41E09: tvm::codegen::LLVMTargetInfo::GetAllLLVMTargetArches() (llvm_instance.cc:843)
==356347==    by 0xBC3D5CA: tvm::codegen::LLVMTargetInfo::LLVMTargetInfo(tvm::codegen::LLVMInstance&, tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> const&) (llvm_instance.cc:220)
==356347==    by 0xAD6FCE8: tvm::target::parsers::aprofile::GetFeatures(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:101)
==356347==    by 0xAD70A5E: tvm::target::parsers::aprofile::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:137)
==356347==    by 0xAD71C25: tvm::target::parsers::cpu::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (cpu.cc:55)
==356347==    by 0xAEAB127: tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const (packed_func.h:1826)
==356347==    by 0xAEB3382: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) (packed_func.h:1252)
==356347== 
==356347== 8,040 (1,560 direct, 6,480 indirect) bytes in 1 blocks are definitely lost in loss record 42,622 of 42,667
==356347==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==356347==    by 0x1136B1B9: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==356347==    by 0xBC44347: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==356347==    by 0xBC3EF2D: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:398)
==356347==    by 0xBC3F0B2: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:413)
==356347==    by 0xBC4217B: tvm::codegen::LLVMTargetInfo::GetAllLLVMCpuFeatures() (llvm_instance.cc:868)
==356347==    by 0xAD6FD01: tvm::target::parsers::aprofile::GetFeatures(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:102)
==356347==    by 0xAD70A5E: tvm::target::parsers::aprofile::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (aprofile.cc:137)
==356347==    by 0xAD71C25: tvm::target::parsers::cpu::ParseTarget(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>) (cpu.cc:55)
==356347==    by 0xAEAB127: tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const (packed_func.h:1826)
==356347==    by 0xAEB3382: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>::AssignTypedLambda<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>)>(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> (*)(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) (packed_func.h:1252)
==356347==    by 0xAE8D453: CallPacked (packed_func.h:1256)
==356347==    by 0xAE8D453: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void> >(tvm::runtime::Map<tvm::runtime::String, tvm::runtime::ObjectRef, void, void>&&) const (packed_func.h:1784)
...

The problem seems to come from GetLLVMSubtargetInfo(...) which creates a target machine instance to get a pointer to an MCSubtargetInfo object. The reference to the created target machine is lost and the memory is not freed. I've tried to naively rework the code to remove the GetLLVMSubtargetInfo(...) function and therefore avoid the leak, but happy to hear any better ideas (cc @cbalint13, @kparzysz)
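
The ownership problem described here can be shown in isolation. The sketch below uses stand-in types (`FakeTargetMachine` and `FakeSubtargetInfo` are hypothetical, not LLVM classes) and assumes a factory that returns a raw pointer the caller is responsible for deleting. Holding that pointer in a std::unique_ptr while the subtarget info is read avoids the leak. This illustrates the pattern only and is not necessarily the fix applied in this PR.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for the subtarget info owned by a target machine (not an LLVM class).
struct FakeSubtargetInfo {
  std::vector<std::string> features;
};

// Stand-in for a target machine; counts live instances so leaks are observable.
class FakeTargetMachine {
 public:
  static int live_count;
  FakeTargetMachine() {
    info_.features = {"neon", "sve"};
    ++live_count;
  }
  ~FakeTargetMachine() { --live_count; }
  const FakeSubtargetInfo* GetSubtargetInfo() const { return &info_; }

 private:
  FakeSubtargetInfo info_;
};
int FakeTargetMachine::live_count = 0;

// Factory returning an owning raw pointer: the caller must delete the result.
FakeTargetMachine* CreateFakeTargetMachine() { return new FakeTargetMachine(); }

// Leaky pattern: the machine pointer is dropped, so it is never deleted.
const FakeSubtargetInfo* GetInfoLeaky() {
  return CreateFakeTargetMachine()->GetSubtargetInfo();
}

// Safer pattern: keep the owner alive while the info is read, then release it.
std::vector<std::string> GetFeaturesOwned() {
  std::unique_ptr<FakeTargetMachine> machine(CreateFakeTargetMachine());
  return machine->GetSubtargetInfo()->features;  // copied out before destruction
}
```

`GetFeaturesOwned` returns a copy because the subtarget info lives inside the machine and would dangle once the owner is destroyed.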

The reason this is surfacing only now (after previous successful CI runs) is that #16513 was merged, which means the changes in this PR are now run much more frequently in CI.

@cbalint13
Contributor

  • Is this happening with LLVM 18 too?
  • I'll look into this (across multiple LLVM versions); could you help me with a simple test case?

@lhutton1
Contributor Author

Thanks for the quick response @cbalint13! I haven't tried with LLVM 18 yet, only LLVM 17. Calling GetAllLLVMCpuFeatures() and GetAllLLVMTargetArches() should reproduce it, but I'll be able to come up with a more concrete example tomorrow.

@cbalint13
Contributor

I'll test it with LLVM 18 too. Just ping me when you have a concrete sample; until then I'll try invoking them as you said (hope to catch it).

@lhutton1
Contributor Author

lhutton1 commented Mar 26, 2024

Here is a reproducer:
mem_leak.cpp

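#include <tvm/runtime/registry.h>
#include <tvm/target/target.h>

int main() {
  // Look up the packed function that returns the CPU architecture list for a target.
  const auto* pf = tvm::runtime::Registry::Get("target.llvm_get_cpu_archlist");
  if (pf == nullptr) return 1;  // not registered, e.g. TVM built without LLVM
  // Calling it for a plain "llvm" target is enough to trigger the leak.
  (*pf)(tvm::Target("llvm"));
  return 0;
}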

Compile:

g++ -std=c++17 -O2 -fPIC -I{TVM_DIR}/include -I{TVM_DIR}/3rdparty/dmlc-core/include -I{TVM_DIR}/tvm/3rdparty/dlpack/include -DDMLC_USE_LOGGING_LIBRARY=\<tvm/runtime/logging.h\> -o mem_leak_exec mem_leak.cpp -L{TVM_BUILD_DIR} -ldl -ltvm -pthread

Run with valgrind:

LD_PRELOAD="{TVM_BUILD_DIR}/libtvm.so" valgrind --leak-check=full -v --track-origins=yes ./mem_leak_exec

Output:

...
==475237== 12,369 (1,560 direct, 10,809 indirect) bytes in 1 blocks are definitely lost in loss record 42,596 of 42,630
==475237==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==475237==    by 0x12244479: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-17.so.1)
==475237==    by 0xBC0131B: llvm::Target::createTargetMachine(llvm::StringRef, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) const (TargetRegistry.h:488)
==475237==    by 0xBBFBC05: tvm::codegen::CreateLLVMTargetMachine(llvm::Target const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llvm::TargetOptions const&, llvm::Reloc::Model const&, llvm::CodeModel::Model const&, llvm::CodeGenOpt::Level const&) (llvm_instance.cc:393)
==475237==    by 0xBBFBD8A: tvm::codegen::GetLLVMSubtargetInfo(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (llvm_instance.cc:408)
==475237==    by 0xBBFEAE1: tvm::codegen::LLVMTargetInfo::GetAllLLVMTargetArches() const (llvm_instance.cc:835)
==475237==    by 0xBBFA2BB: tvm::codegen::LLVMTargetInfo::LLVMTargetInfo(tvm::codegen::LLVMInstance&, tvm::Target const&) (llvm_instance.cc:218)
==475237==    by 0xBC0EE16: tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}::operator()(tvm::Target const) const (llvm_module.cc:695)
==475237==    by 0xBC188B0: tvm::runtime::TypedPackedFunc<tvm::runtime::Array<tvm::runtime::String, void> (tvm::Target const&)>::AssignTypedLambda<tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}>(tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const (packed_func.h:1826)
==475237==    by 0xBC233EE: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Array<tvm::runtime::String, void> (tvm::Target const&)>::AssignTypedLambda<tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}>(tvm::codegen::__mk_TVM8::{lambda(tvm::Target const&)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue) (packed_func.h:1252)
...

@cbalint13
Contributor

Thanks a lot for this; I'll start looking at it now.

@lhutton1 lhutton1 deleted the improve-aprofile-target-parser branch March 27, 2024 15:55
@Lunderberg
Contributor

This PR causes spurious error messages to be printed when loading libtvm.so, when the local version of LLVM available does not support one of the CPU architectures defined in tag.cc. For example, the following error messages are printed when LLVM 10.0 is used.

[image: screenshot of the error messages]

@cbalint13
Contributor

cbalint13 commented Mar 27, 2024

Hi @Lunderberg,

The messages look legitimate: LLVM 10 does not support those CPUs.

Possible enhancements here:

  • The message could be more explicit by adding "using LLVM version X.Y", e.g.:
    LLVM cpu architecture -mcpu=cortex-a78 is not valid in -mtriple=aarch64{...} using LLVM version X.Y
  • We could also issue "WARN" instead of "ERROR" level here, but the user should still be notified somehow.
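
A minimal sketch of the more explicit message proposed in the first bullet. `FormatInvalidCpuMessage` is a hypothetical helper, and the version numbers are passed in as plain integers; in real code they could presumably be taken from LLVM's `LLVM_VERSION_MAJOR` and `LLVM_VERSION_MINOR` macros.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build the diagnostic with the LLVM version appended.
std::string FormatInvalidCpuMessage(const std::string& cpu, const std::string& triple,
                                    int llvm_major, int llvm_minor) {
  std::ostringstream os;
  os << "LLVM cpu architecture -mcpu=" << cpu << " is not valid in -mtriple=" << triple
     << " using LLVM version " << llvm_major << "." << llvm_minor;
  return os.str();
}
```

Whether this is emitted at warning or error level is a separate choice made by the caller.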

This behaviour was intentional and was introduced in PR#15761.
With this #1571, TVM has the ability to look up LLVM's internal catalog of arches, CPUs and their exact features.


Later EDIT:

  • Prior to this PR (ARM related) and [PR#15761 & PR15685 (x86 related)], everything was hand-mapped and hand-coded.
  • Here is an example of how the decision was made before the PRs listed above: x86 cpu features map

We now have direct LLVM awareness, with no need for hardcoded mappings in static lists (unmaintainable, IMHO).

@Lunderberg
Contributor

Lunderberg commented Mar 27, 2024

The messages look reasonable, based on the available support, but I think they shouldn't be emitted unless the user is attempting to use the invalid target or is explicitly querying the target parameters. We should not be producing an error message when importing TVM.

Regarding changing from an error to a warning, even commenting out the LOG(ERROR) entirely does not fully remove the error messages, as some are produced internally by LLVM.

@Lunderberg
Contributor

I agree on the use of internal architectures being preferable to hard-coded lists. The tags were primarily introduced (IIRC) to handle cases where there is no internal architecture that could be queried, such as GPUs with no readily available table to look up, and with limited availability where the compilation shouldn't require queries to a local GPU.

@lhutton1
Contributor Author

Thanks for the discussion @Lunderberg @cbalint13. I agree that we shouldn't remove the error message completely. Just thinking out loud - the problem here seems to be that the targets registered in tag.cc are parsed when loading tvm, is it possible to defer parsing of these registered targets to when they are actually used by the user?
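To make the deferral idea concrete, here is a minimal Python sketch, not TVM's actual tag machinery. `LazyTagRegistry` and `toy_parse` are hypothetical names invented for illustration, and the "parser" only rejects one made-up CPU name. The point is that registering a tag stores the raw target string, and parsing (with any resulting error) happens only on first lookup.

```python
from typing import Callable, Dict


class LazyTagRegistry:
    """Hypothetical registry: stores raw target strings, parses on first use."""

    def __init__(self, parse: Callable[[str], dict]):
        self._parse = parse
        self._raw: Dict[str, str] = {}
        self._parsed: Dict[str, dict] = {}

    def register(self, name: str, target_str: str) -> None:
        # No parsing here, so nothing can fail (or log) at import time.
        self._raw[name] = target_str

    def get(self, name: str) -> dict:
        # Parse on first lookup and cache the result for later lookups.
        if name not in self._parsed:
            self._parsed[name] = self._parse(self._raw[name])
        return self._parsed[name]


def toy_parse(target_str: str) -> dict:
    """Stand-in for a real target parser; rejects one made-up CPU name."""
    options = dict(
        token.lstrip("-").split("=", 1) for token in target_str.split()[1:]
    )
    if options.get("mcpu") == "not-a-real-cpu":
        raise ValueError(f"unsupported -mcpu in {target_str!r}")
    return options


registry = LazyTagRegistry(toy_parse)
registry.register("good", "llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a78")
registry.register("bad", "llvm -mtriple=aarch64-linux-gnu -mcpu=not-a-real-cpu")
```

With this shape, registering the `bad` tag is silent; the error surfaces only if someone calls `registry.get("bad")`.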

@cbalint13
Contributor

cbalint13 commented Mar 27, 2024

@Lunderberg ,

We should not be producing an error message when importing TVM.

Can you give me a script line to reproduce this (besides having LLVM 10 present)?
That sounds odd; if true, I'll prepare a fix and test all iterations llvm=range(10,19).

@lhutton1 ,

Thanks for the discussion @Lunderberg @cbalint13. I agree that we shouldn't remove the error message completely. Just thinking out loud - the problem here seems to be that the targets registered in tag.cc are parsed when loading tvm, is it possible to defer parsing of these registered targets to when they are actually used by the user?

If this turns out to be true, we should take this check out of the LLVMTargetInfo constructor and find a better place for it.


I'll take care of this; if you don't mind, it will be a new PR.

@Lunderberg
Contributor

Can you give me a script line to reproduce this (besides having LLVM 10 present)? That sounds odd; if true, I'll prepare a fix and test all iterations llvm=range(10,19).

@cbalint13 If you add a new tag to tag.cc with the "mcpu" host attribute set to a non-existent type (or edit an existing tag), you can reproduce the error with a newer LLVM version.

I'll take care of this; if you don't mind, it will be a new PR.

Thank you, and making a new PR would be perfect!

quic-sanirudh added a commit that referenced this pull request Mar 30, 2024
This is just a minor fix where the recent [PR #16425](#16425) seems
to have missed this change for LLVM 18 and above, and so we're running
into a compilation failure.
@tqchen
Member

tqchen commented Apr 16, 2024

I am also getting additional errors like

python test.py 
[15:23:28] /home/tqchen/github/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[15:23:28] /home/tqchen/github/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.

This is on an LLVM that was built for ROCm (AMD platform). We should not emit an error message at static loading time if the ARM target is not used, and should only emit such a message when we attempt to use tags in aprofile.

One possible approach is to conditionally register the related tags based on the availability of the related function.

tqchen pushed a commit that referenced this pull request Apr 17, 2024
…port (#16897)

This commit aims to fix the issue described here:
#16425 (comment) by
conditionally registering the target tags based on the availability of
the LLVM AArch64 backend. It's possible to extract the targets LLVM
has been compiled for using `llvm-config --targets-built`.

Change-Id: I20b608aea9ea554b0c0388ee884621305d2d59b9
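As a rough illustration of the availability check described in this commit message, the sketch below queries `llvm-config --targets-built` and tests whether `AArch64` is in the list. This is not the code from the fix itself: the helper names (`has_backend`, `query_targets_built`) and the sample output string are assumptions made for this example.

```python
import subprocess


def has_backend(targets_built_output: str, backend: str = "AArch64") -> bool:
    """Return True if `backend` is listed in `llvm-config --targets-built` output.

    The output is a whitespace-separated list of backend names.
    """
    return backend in set(targets_built_output.split())


def query_targets_built(llvm_config: str = "llvm-config") -> str:
    """Run llvm-config and return its raw --targets-built output."""
    result = subprocess.run(
        [llvm_config, "--targets-built"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Illustrative output strings (assumed, not captured from a real build).
sample_with_arm = "AArch64 AMDGPU ARM X86"
sample_rocm_only = "AMDGPU X86"
```

A build script could call `has_backend(query_targets_built())` and register the Arm tags only when it returns `True`.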
@LeiWang1999
Contributor

It seems that this pull request may lead to a segmentation fault when TVM is built with LLVM 16.0.0

root@e01d939002c0:~/pr_workspace/main_tvm# python
Python 3.10.14 (main, Apr  6 2024, 18:45:05) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tvm
Segmentation fault (core dumped)

but it works with LLVM 10.0.0

@cbalint13
Contributor

It seems that this pull request may lead to a segmentation fault when TVM is built with LLVM 16.0.0

root@e01d939002c0:~/pr_workspace/main_tvm# python
Python 3.10.14 (main, Apr  6 2024, 18:45:05) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tvm
Segmentation fault (core dumped)

but it works with LLVM 10.0.0

@LeiWang1999 ,

I'll look at this.
Are you on llvm-16.0.0 / https://github.com/llvm/llvm-project/releases/tag/llvmorg-16.0.0 / 08d094a?

@LeiWang1999
Contributor

LeiWang1999 commented Oct 17, 2024

The build with LLVM 16.0.0 works on commit 1c734916696a015820667544e0e380f9244c99b9

@LeiWang1999
Contributor

LeiWang1999 commented Oct 17, 2024

hi @cbalint13 , with bisect debugging, I found that commit 726a1416497eeca7bfb7dcdbd799d00b33c39f79 is the first bad commit.

root@e01d939002c0:~/pr_workspace/main_tvm# git bisect good
726a1416497eeca7bfb7dcdbd799d00b33c39f79 is the first bad commit
commit 726a1416497eeca7bfb7dcdbd799d00b33c39f79
Author: Luke Hutton <luke.hutton@arm.com>
Date:   Wed Mar 27 15:53:46 2024 +0000

    [Target] Use LLVM target parser for determining Arm(R) A-Profile Architecture features (#16425)
    
    Currently, target features are determined by a set of fixed checks on
    the target string. This works well for checking support of a small
    number of simple features, but it doesn't scale. Some problems include:
    - There are many non-trivial conditions for which a feature may(not) be
      available. It is easy to miss these with the current implementation.
    - The inclusion of some features in a target string can imply other
      features. For example, "+sve" implies "+neon". This currently isn't
      taken into account.
    - The tests in tests/cpp/target/parsers/aprofile_test.c suggest that
      targets such as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon"
      are supported target strings. The features will be correctly parsed in
      TVM, however, they are not valid in LLVM. Therefore, it's possible
      that TVM and LLVM have different understanding of the features
      available.
    
    This commit uses the more robust LLVM target parser to determine support
    for the features in TVM. It leverages previous infrastructure added to
    TVM for obtaining a list of all supported features given an input
    target, and uses this to check the existance of certain features we're
    interested in. It should be trivial to grow this list over time. As a
    result of this change, the problems mentioned above are solved.
    
    In the current form, this commit drops support for target strings such
    as "llvm -mcpu=cortex-a+neon" and "llvm -mattr=+noneon". A scan of the
    codebase suggests this functionality is not in use (only in test cases).
    Should we feel the need to support them, or have a smoother migration
    for downstream users of TVM we can add a translator to the parser to
    convert these into LLVM compatible targets.

 python/tvm/target/codegen.py                       |   3 +-
 src/target/llvm/llvm_instance.cc                   |  95 ++++----
 src/target/llvm/llvm_instance.h                    |  13 +-
 src/target/llvm/llvm_module.cc                     |   7 +-
 src/target/parsers/aprofile.cc                     |  88 +++----
 tests/cpp/target/parsers/aprofile_test.cc          | 263 +++++++++++++--------
 .../relay/strategy/test_select_implementation.py   |  12 +-
 tests/python/target/test_llvm_features_info.py     |  24 +-
 8 files changed, 282 insertions(+), 223 deletions(-)

@cbalint13
Contributor

@LeiWang1999

  • Can you give a gdb backtrace (bt full would be very useful)?
  • Also, what are your TVM cmake options and build hash (to see what features might be absent/present)?

I cannot reproduce the crash on my side.

@LeiWang1999
Contributor

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#--------------------------------------------------------------------
#  Template custom cmake configuration for compiling
#
#  This file is used to override the build options in build.
#  If you want to change the configuration, please use the following
#  steps. Assume you are on the root directory. First copy this
#  file so that any local changes will be ignored by git
#
#  $ mkdir build
#  $ cp cmake/config.cmake build
#
#  Next modify the according entries, and then compile by
#
#  $ cd build
#  $ cmake ..
#
#  Then build in parallel with 8 threads
#
#  $ make -j8
#--------------------------------------------------------------------

#---------------------------------------------
# Backend runtimes.
#---------------------------------------------

# Whether enable CUDA during compile,
#
# Possible values:
# - ON: enable CUDA with cmake's auto search
# - OFF: disable CUDA
# - /path/to/cuda: use specific path to cuda toolkit
set(USE_CUDA OFF)

# Whether to enable NCCL support:
# - ON: enable NCCL with cmake's auto search
# - OFF: disable NCCL
# - /path/to/nccl: use specific path to nccl
set(USE_NCCL OFF)

# Whether enable ROCM runtime
#
# Possible values:
# - ON: enable ROCM with cmake's auto search
# - OFF: disable ROCM
# - /path/to/rocm: use specific path to rocm
set(USE_ROCM OFF)

# Whether to enable RCCL support:
# - ON: enable RCCL with cmake's auto search
# - OFF: disable RCCL
# - /path/to/rccl: use specific path to rccl
set(USE_RCCL OFF)

# Whether enable SDAccel runtime
set(USE_SDACCEL OFF)

# Whether enable Intel FPGA SDK for OpenCL (AOCL) runtime
set(USE_AOCL OFF)

# Whether enable OpenCL runtime
#
# Possible values:
# - ON: enable OpenCL with OpenCL wrapper to remove dependency during build
#       time and trigger dynamic search and loading of OpenCL in runtime
# - OFF: disable OpenCL
# - /path/to/opencl-sdk: use specific path to opencl-sdk
set(USE_OPENCL OFF)

# Whether to allow OPENCL cl_mem access to host
# cl_mem will be allocated with CL_MEM_ALLOC_HOST_PTR
# OpenCLWorkspace->GetHostPtr API returns the host accessible pointer
set(USE_OPENCL_ENABLE_HOST_PTR OFF)

# Whether enable Metal runtime
set(USE_METAL OFF)

# Whether enable Vulkan runtime
#
# Possible values:
# - ON: enable Vulkan with cmake's auto search
# - OFF: disable vulkan
# - /path/to/vulkan-sdk: use specific path to vulkan-sdk
set(USE_VULKAN OFF)

# Whether to use spirv-tools and SPIRV-Headers from Khronos github or gitlab.
#
# Possible values:
# - OFF: not to use
# - /path/to/install: path to your Khronos spirv-tools and SPIRV-Headers installation directory
#
set(USE_KHRONOS_SPIRV OFF)

# whether enable SPIRV_KHR_DOT_PRODUCT
set(USE_SPIRV_KHR_INTEGER_DOT_PRODUCT OFF)

# Whether enable OpenGL runtime
set(USE_OPENGL OFF)

# Whether enable MicroTVM runtime
set(USE_MICRO OFF)

# Whether enable RPC runtime
set(USE_RPC ON)

# Whether to build the C++ RPC server binary
set(USE_CPP_RPC OFF)

# Whether to build the C++ native runtime tool binary
set(USE_CPP_RTVM OFF)

# Whether to build the iOS RPC server application
set(USE_IOS_RPC OFF)

# Whether embed stackvm into the runtime
set(USE_STACKVM_RUNTIME OFF)

# Whether enable tiny embedded graph executor.
set(USE_GRAPH_EXECUTOR ON)

# Whether enable tiny graph executor with CUDA Graph
set(USE_GRAPH_EXECUTOR_CUDA_GRAPH OFF)

# Whether enable pipeline executor.
set(USE_PIPELINE_EXECUTOR OFF)

# Whether to enable the profiler for the graph executor and vm
set(USE_PROFILER ON)

# Whether enable microTVM standalone runtime
set(USE_MICRO_STANDALONE_RUNTIME OFF)

# Whether build with LLVM support
# Requires LLVM version >= 4.0
#
# Possible values:
# - ON: enable llvm with cmake's find search
# - OFF: disable llvm, note this will disable CPU codegen
#        which is needed for most cases
# - /path/to/llvm-config: enable specific LLVM when multiple llvm-dev is available.
set(USE_LLVM "/home/msra/cy/clang+llvm-13.0.0-x86_64-linux-gnu-ubuntu-20.04/bin/llvm-config --link-static")
set(HIDE_PRIVATE_SYMBOLS ON)

# Whether use MLIR to help analyze, requires USE_LLVM is enabled
# Possible values: ON/OFF
set(USE_MLIR OFF)

#---------------------------------------------
# Contrib libraries
#---------------------------------------------
# Whether to build with BYODT software emulated posit custom datatype
#
# Possible values:
# - ON: enable BYODT posit, requires setting UNIVERSAL_PATH
# - OFF: disable BYODT posit
#
# set(UNIVERSAL_PATH /path/to/stillwater-universal) for ON
set(USE_BYODT_POSIT OFF)

# Whether use BLAS, choices: openblas, atlas, apple
set(USE_BLAS none)

# Whether to use MKL
# Possible values:
# - ON: Enable MKL
# - /path/to/mkl: mkl root path
# - OFF: Disable MKL
# set(USE_MKL /opt/intel/mkl) for UNIX
# set(USE_MKL ../IntelSWTools/compilers_and_libraries_2018/windows/mkl) for WIN32
# set(USE_MKL <path to venv or site-packages directory>) if using `pip install mkl`
set(USE_MKL OFF)

# Whether use DNNL library, aka Intel OneDNN: https://oneapi-src.github.io/oneDNN
#
# Now matmul/dense/conv2d supported by -libs=dnnl,
# and more OP patterns supported in DNNL codegen(json runtime)
#
# choices:
# - ON: Enable DNNL in BYOC and -libs=dnnl, by default using json runtime in DNNL codegen
# - JSON: same as above.
# - C_SRC: use c source runtime in DNNL codegen
# - path/to/oneDNN:oneDNN root path
# - OFF: Disable DNNL
set(USE_DNNL OFF)

# Whether use Intel AMX instructions.
set(USE_AMX OFF)

# Whether use OpenMP thread pool, choices: gnu, intel
# Note: "gnu" uses gomp library, "intel" uses iomp5 library
set(USE_OPENMP none)

# Whether use contrib.random in runtime
set(USE_RANDOM ON)

# Whether use NNPack
set(USE_NNPACK OFF)

# Possible values:
# - ON: enable tflite with cmake's find search
# - OFF: disable tflite
# - /path/to/libtensorflow-lite.a: use specific path to tensorflow lite library
set(USE_TFLITE OFF)

# /path/to/tensorflow: tensorflow root path when use tflite library
set(USE_TENSORFLOW_PATH none)

# Required for full builds with TFLite. Not needed for runtime with TFLite.
# /path/to/flatbuffers: flatbuffers root path when using tflite library
set(USE_FLATBUFFERS_PATH none)

# Possible values:
# - OFF: disable tflite support for edgetpu
# - /path/to/edgetpu: use specific path to edgetpu library
set(USE_EDGETPU OFF)

# Possible values:
# - ON: enable cuDNN with cmake's auto search in CUDA directory
# - OFF: disable cuDNN
# - /path/to/cudnn: use specific path to cuDNN path
set(USE_CUDNN OFF)

# Whether use cuBLAS
set(USE_CUBLAS OFF)

# Whether use MIOpen
set(USE_MIOPEN OFF)

# Whether use MPS
set(USE_MPS OFF)

# Whether use rocBlas
set(USE_ROCBLAS OFF)

# Whether use contrib sort
set(USE_SORT ON)

# Whether to use Arm Compute Library (ACL) codegen
# We provide 2 separate flags since we cannot build the ACL runtime on x86.
# This is useful for cases where you want to cross-compile a relay graph
# on x86 then run on AArch.
#
# An example of how to use this can be found here: docs/deploy/arm_compute_lib.rst.
#
# USE_ARM_COMPUTE_LIB - Support for compiling a relay graph offloading supported
#                       operators to Arm Compute Library. OFF/ON
# USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR - Run Arm Compute Library annotated functions via the ACL
#                                     runtime. OFF/ON/"path/to/ACL"
set(USE_ARM_COMPUTE_LIB OFF)
set(USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR OFF)

# Whether to build with Arm Ethos-N support
# Possible values:
# - OFF: disable Arm Ethos-N support
# - path/to/arm-ethos-N-stack: use a specific version of the
#   Ethos-N driver stack
set(USE_ETHOSN OFF)
# If USE_ETHOSN is enabled, use ETHOSN_HW (ON) if Ethos-N hardware is available on this machine
# otherwise use ETHOSN_HW (OFF) to use the software test infrastructure
set(USE_ETHOSN_HW OFF)

# Whether to build with Arm(R) Ethos(TM)-U NPU codegen support
set(USE_ETHOSU OFF)

# Whether to build with CMSIS-NN external library support.
# See https://github.com/ARM-software/CMSIS_5
set(USE_CMSISNN OFF)

# Whether to build with TensorRT codegen or runtime
# Examples are available here: docs/deploy/tensorrt.rst.
#
# USE_TENSORRT_CODEGEN - Support for compiling a relay graph where supported operators are
#                        offloaded to TensorRT. OFF/ON
# USE_TENSORRT_RUNTIME - Support for running TensorRT compiled modules, requires presence of
#                        TensorRT library. OFF/ON/"path/to/TensorRT"
set(USE_TENSORRT_CODEGEN OFF)
set(USE_TENSORRT_RUNTIME OFF)

# Whether use VITIS-AI codegen
set(USE_VITIS_AI OFF)

# Build Verilator codegen and runtime
set(USE_VERILATOR OFF)

#Whether to use CLML codegen
set(USE_CLML OFF)
# USE_CLML_GRAPH_EXECUTOR - CLML SDK PATH or ON or OFF
set(USE_CLML_GRAPH_EXECUTOR OFF)

# Build ANTLR parser for Relay text format
# Possible values:
# - ON: enable ANTLR by searching default locations (cmake find_program for antlr4 and /usr/local for jar)
# - OFF: disable ANTLR
# - /path/to/antlr-*-complete.jar: path to specific ANTLR jar file
set(USE_ANTLR OFF)

# Whether use Relay debug mode
set(USE_RELAY_DEBUG OFF)

# Whether to build fast VTA simulator driver
set(USE_VTA_FSIM OFF)

# Whether to build cycle-accurate VTA simulator driver
set(USE_VTA_TSIM OFF)

# Whether to build VTA FPGA driver (device side only)
set(USE_VTA_FPGA OFF)

# Whether use Thrust
set(USE_THRUST OFF)

# Whether use cuRAND
set(USE_CURAND OFF)

# Whether to build the TensorFlow TVMDSOOp module
set(USE_TF_TVMDSOOP OFF)

# Whether to build the PyTorch custom class module
set(USE_PT_TVMDSOOP OFF)

# Whether to use STL's std::unordered_map or TVM's POD compatible Map
set(USE_FALLBACK_STL_MAP OFF)

# Whether to enable Hexagon support
set(USE_HEXAGON OFF)
set(USE_HEXAGON_SDK /path/to/sdk)

# Whether to build the minimal support android rpc server for Hexagon
set(USE_HEXAGON_RPC OFF)

# Hexagon architecture to target when compiling TVM itself (not the target for
# compiling _by_ TVM). This applies to components like the TVM runtime, but is
# also used to select correct include/library paths from the Hexagon SDK when
# building runtime for Android.
# Valid values are v65, v66, v68, v69, v73.
set(USE_HEXAGON_ARCH "v68")

# Whether to use QHL library
set(USE_HEXAGON_QHL OFF)

# Whether to use ONNX codegen
set(USE_TARGET_ONNX OFF)

# Whether enable BNNS runtime
set(USE_BNNS OFF)

# Whether to build static libtvm_runtime.a, the default is to build the dynamic
# version: libtvm_runtime.so.
#
# The static runtime library needs to be linked into executables with the linker
# option --whole-archive (or its equivalent). The reason is that the TVM registry
# mechanism relies on global constructors being executed at program startup.
# Global constructors alone are not sufficient for the linker to consider a
# library member to be used, and some of such library members (object files) may
# not be included in the final executable. This would make the corresponding
# runtime functions to be unavailable to the program.
set(BUILD_STATIC_RUNTIME OFF)

# Caches the build so that building is faster when switching between branches.
# If you switch branches, build and then encounter a linking error, you may
# need to regenerate the build tree through "cmake .." (the cache will
# still provide significant speedups).
# Possible values:
# - AUTO: search for path to ccache, disable if not found.
# - ON: enable ccache by searching for the path to ccache, report an error if not found
# - OFF: disable ccache
# - /path/to/ccache: use specific path to ccache
set(USE_CCACHE AUTO)

# Whether to use libbacktrace to supply linenumbers on stack traces.
# Possible values:
# - ON: Find libbacktrace from system paths. Report an error if not found.
# - OFF: Don't use libbacktrace.
# - /path/to/libbacktrace: Looking for the libbacktrace header and static lib from a user-provided path. Report error if not found.
# - COMPILE: Build and link to libbacktrace from 3rdparty/libbacktrace.
# - AUTO:
#   - Find libbacktrace from system paths.
#   - If not found, fallback to COMPILE on Linux or MacOS, fallback to OFF on Windows or other platforms.
set(USE_LIBBACKTRACE AUTO)

# Whether to install a signal handler to print a backtrace on segfault.
# Need to have USE_LIBBACKTRACE enabled.
set(BACKTRACE_ON_SEGFAULT OFF)

# Whether to enable PAPI support in profiling. PAPI provides access to hardware
# counters while profiling.
# Possible values:
# - ON: enable PAPI support. Will search PKG_CONFIG_PATH for a papi.pc
# - OFF: disable PAPI support.
# - /path/to/folder/containing/: Path to folder containing papi.pc.
set(USE_PAPI OFF)

# Whether to use GoogleTest for C++ unit tests. When enabled, the generated
# build file (e.g. Makefile) will have a target "cpptest".
# Possible values:
# - ON: enable GoogleTest. The package `GTest` will be required for cmake
#   to succeed.
# - OFF: disable GoogleTest.
# - AUTO: cmake will attempt to find the GTest package, if found GTest will
#   be enabled, otherwise it will be disabled.
# Note that cmake will use `find_package` to find GTest. Please use cmake's
# predefined variables to specify the path to the GTest package if needed.
set(USE_GTEST AUTO)

# Enable using CUTLASS as a BYOC backend
# Need to have USE_CUDA=ON
set(USE_CUTLASS OFF)

# Enable to show a summary of TVM options
set(SUMMARIZE OFF)

# Whether to use LibTorch as backend
# To enable pass the path to the root libtorch (or PyTorch) directory
# OFF or /path/to/torch/
set(USE_LIBTORCH OFF)

# Whether to use the Universal Modular Accelerator Interface
set(USE_UMA OFF)

# Set custom Alloc Alignment for device allocated memory ndarray points to
set(USE_KALLOC_ALIGNMENT 64)
# set(USE_LLVM /root/clang+llvm-10.0.1-x86_64-linux-gnu-ubuntu-18.04/bin/llvm-config)
set(USE_LLVM /root/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/llvm-config)
set(USE_CUDA /usr/local/cuda)

My config is quite simple: it just enables CUDA and LLVM.

It's weird, as I can reproduce this issue on both my nvidia-4090 and amd-mi250.

My reproduce script is:

git checkout 726a1416497eeca7bfb7dcdbd799d00b33c39f79
git submodule update --init --recursive
mkdir -p build
cd build
cp ../cmake/config.cmake .
echo "set(USE_LLVM /root/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/llvm-config)" >> config.cmake
echo "set(USE_CUDA /usr/local/cuda)" >> config.cmake
cmake ..
make -j
cd ..
python -c "import tvm"

@LeiWang1999
Contributor

Thanks @cbalint13 , BT FULL Trace: gdb.txt

@cbalint13
Contributor

Thanks @cbalint13 , BT FULL Trace: gdb.txt

Still not able to reproduce :-(

$ rpm -q clang llvm16-devel;  git rev-parse HEAD
clang-19.1.0-1.fc42.x86_64
llvm16-devel-16.0.6-9.fc41.x86_64
726a1416497eeca7bfb7dcdbd799d00b33c39f79

$ python -c 'import tvm; print(tvm.target.codegen.llvm_version_major())'
16

$ readelf -a /usr/lib64/libtvm.so | grep NEED
  [ 4] .gnu.version_r    VERNEED          00000000001ae73c  001ae73c
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libLLVM-16.so]
 0x0000000000000001 (NEEDED)             Shared library: [libopenblas.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
 0x000000006ffffffe (VERNEED)            0x1ae73c
 0x000000006fffffff (VERNEEDNUM)         6

  • Can you re-upload bt full with set print frame-arguments all, to see why llvm::RegisterTargetMachine is the offender?
  • If I still can't infer the cause, a last request would be to provide a Docker recipe for me to reproduce the bug.

@LeiWang1999
Contributor

LeiWang1999 commented Oct 18, 2024

@cbalint13 , after I switched LLVM to 16.0.6, the bug disappeared. Interesting, thanks.

16.0.1 also works for me.
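For anyone wanting to guard against the one release reported as crashing here, a small Python sketch follows. It parses a version string such as the one printed by `llvm-config --version` and flags it if it matches a list of reported-bad versions. The `REPORTED_BAD` set reflects only the observations in this thread (16.0.0 segfaults; 16.0.1 and 16.0.6 work), and the function names are made up for this example.

```python
import re

# Versions reported as crashing on `import tvm` in this thread only.
REPORTED_BAD = {(16, 0, 0)}


def parse_llvm_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' from the start of a version string.

    Trailing suffixes (for example 'git') are ignored.
    """
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised LLVM version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_reported_bad(text: str) -> bool:
    """Return True if the version matches one reported as bad above."""
    return parse_llvm_version(text) in REPORTED_BAD
```

Note that `tvm.target.codegen.llvm_version_major()` only reports the major version, so a patch-level check like this needs the full version string from the LLVM installation.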

@cbalint13
Contributor

@LeiWang1999

I only had 16.0.6 (precompiled) at hand for the current tests, thinking that the issue might be on the TVM side.
I recommend using a more recent release, or the latest 19.x if possible; LLVM upstream changes at a steep pace.

Thank you for your patience and help!

6 participants