
[Comgr] Embedded libc++ headers conflict with system libstdc++ on RHEL/manylinux #2445

@lamb-j

Description


Summary

Comgr's COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC action injects embedded libc++ headers at -idirafter (lowest priority), assuming any system libstdc++ that's present will work on its own and the embedded headers are only a fallback. On RHEL / manylinux (and any environment with gcc-toolset's libstdc++ installed at clang's default search path), this assumption breaks for HIP --offload-device-only compilation.

Symptom

In an affected environment, any HIP source compiled through the comgr API that includes a C++ standard library header (e.g. <tuple>, <exception>, <stdexcept>, <array>) fails with:

include/c++/v1/stddef.h:39:15: fatal error: 'stddef.h' file not found
   39 | #include_next <stddef.h>

Reproduced concretely in compile_hip_with_libcxx_test running inside ghcr.io/rocm/therock_build_manylinux_x86_64. PR #2444 fixes the test by adding -nostdinc++, but the underlying behavior still affects external users of the comgr API who are unaware they need that flag.

Root cause chain

  1. Source #include <tuple>.
  2. Clang resolves <tuple> to system gcc libstdc++ (e.g. /usr/lib/gcc/x86_64-redhat-linux/8/.../include/c++/8/tuple) because the embedded libc++ is injected at -idirafter.
  3. gcc's <tuple> -> <array> -> <stdexcept> -> <exception> -> <bits/exception_ptr.h> -> <bits/cxxabi_init_exception.h> -> #include <stddef.h>.
  4. That <stddef.h> resolves to libc++'s include/c++/v1/stddef.h (VFS-mapped to clang's resource dir).
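  5. libc++'s stddef.h does #include_next <stddef.h>, which resumes the search only in the directories after the one containing the current header. Under HIP --offload-device-only with -nogpuinc, no directory after it on the device include path provides a stddef.h, so the lookup fails.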
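The lookup failure in steps 4 and 5 can be illustrated with a toy model of the header search. This is a minimal sketch, not clang's implementation: the directory names, the file sets, and the helpers (SearchList, includeAngle, includeNext, exampleDeviceSearchList) are all hypothetical, and it assumes, as the report describes, that the embedded libc++ directory is the only one providing stddef.h. It encodes just two rules: #include <h> takes the first directory in the list that has h, and #include_next <h> restarts the search after the directory holding the current header.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy model of an angle-bracket include search list (hypothetical; real
// clang search lists have many more entries and more rules).
struct SearchList {
  std::vector<std::string> dirs;                      // highest priority first
  std::map<std::string, std::set<std::string>> files; // dir -> headers it holds
};

// Index of the first directory at or after `start` that contains `header`,
// or std::nullopt if no such directory exists.
std::optional<std::size_t> findFrom(const SearchList &sl,
                                    const std::string &header,
                                    std::size_t start) {
  for (std::size_t i = start; i < sl.dirs.size(); ++i) {
    auto it = sl.files.find(sl.dirs[i]);
    if (it != sl.files.end() && it->second.count(header) != 0)
      return i;
  }
  return std::nullopt;
}

// #include <header>: search the whole list from the top.
std::optional<std::size_t> includeAngle(const SearchList &sl,
                                        const std::string &header) {
  return findFrom(sl, header, 0);
}

// #include_next <header> issued from a file found in dirs[currentDir]:
// only the directories after currentDir are searched.
std::optional<std::size_t> includeNext(const SearchList &sl,
                                       const std::string &header,
                                       std::size_t currentDir) {
  assert(currentDir < sl.dirs.size() && "current header must be on the list");
  return findFrom(sl, header, currentDir + 1);
}

// Assumed device search list: system libstdc++ first, embedded libc++ last
// (where -idirafter puts it), and no other directory providing stddef.h.
SearchList exampleDeviceSearchList() {
  const std::string libstdcxx = "/usr/include/c++/8";  // hypothetical path
  const std::string embedded = "$LLVM/include/c++/v1"; // hypothetical path
  SearchList sl;
  sl.dirs = {libstdcxx, embedded};
  sl.files[libstdcxx] = {"tuple", "array", "stdexcept", "exception"};
  sl.files[embedded] = {"tuple", "array", "stdexcept", "exception", "stddef.h"};
  return sl;
}
```

With exampleDeviceSearchList(), includeAngle(sl, "tuple") picks index 0 (system libstdc++), includeAngle(sl, "stddef.h") picks index 1 (embedded libc++), and includeNext(sl, "stddef.h", 1) returns std::nullopt, which is the model's counterpart of the fatal error above.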

Where it lives

amd/comgr/src/comgr-compiler.cpp:1204-1216:

// Auto-inject embedded libc++ headers as a fallback include path.
// Using -idirafter places them AFTER all other include paths, so:
//   - System libstdc++ or libc++ headers take priority when available
//   - User-provided -I paths take priority
//   - Embedded headers only kick in when no other C++ headers are found
// This ensures backward compatibility while providing headers on systems
// without C++ development headers (e.g., driver-only installs).
if (HasEmbeddedHeaders && getLanguage() == AMD_COMGR_LANGUAGE_HIP) {
  SmallString<256> LibcxxPath(env::getLLVMPath());
  sys::path::append(LibcxxPath, "include", "c++", "v1");
  Argv.push_back("-idirafter");
  Argv.push_back(Saver.save(StringRef(LibcxxPath)).data());
}

The comment explicitly assumes that "System libstdc++ or libc++ headers take priority when available" is the right default. That assumption is wrong for HIP device-only mode on systems where the system libstdc++ headers transitively include <stddef.h>: that include resolves to libc++'s stddef.h, whose #include_next then finds nothing.

Possible fixes

  1. Auto-inject -nostdinc++ when -nogpuinc is set and embedded headers are active. Targeted: only HIP compilations that pass -nogpuinc would change behavior. Users who want system libstdc++ would have to explicitly drop -nogpuinc or pass their own -I paths.

  2. Change -idirafter -> -isystem for embedded libc++. Embedded headers would then take priority over system C++ headers (user -I paths are still searched first). Risk: users who actually depend on system libstdc++ (and don't use -nogpuinc) would silently switch to embedded.

  3. Document the requirement -- tell users they must pass -nostdinc++ themselves when using -nogpuinc. Lowest blast radius, also least helpful.
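Option 1 could look roughly like the sketch below. It is written against plain std::vector<std::string> rather than comgr's actual Argv/StringSaver types, the path is joined with "/" instead of sys::path::append, and addEmbeddedLibcxxArgs is a hypothetical name. The injected -nostdinc++ is the same flag PR #2444 adds by hand in the test.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the argument injection in
// comgr-compiler.cpp. The real code works on a SmallVector<const char *>
// with a StringSaver; plain std::string is used here for clarity.
std::vector<std::string> addEmbeddedLibcxxArgs(std::vector<std::string> argv,
                                               bool hasEmbeddedHeaders,
                                               bool isHip,
                                               const std::string &llvmPath) {
  if (!hasEmbeddedHeaders || !isHip)
    return argv;
  assert(!llvmPath.empty() && "embedded headers need an LLVM path");

  // Exact-match scan for a flag already present in argv.
  auto has = [&argv](const std::string &flag) {
    return std::find(argv.begin(), argv.end(), flag) != argv.end();
  };

  // Option 1: with -nogpuinc, drop the system C++ headers so that <tuple>
  // and friends resolve to the embedded libc++ instead of libstdc++.
  if (has("-nogpuinc") && !has("-nostdinc++"))
    argv.push_back("-nostdinc++");

  // Unchanged from today: embedded libc++ as the lowest-priority path.
  argv.push_back("-idirafter");
  argv.push_back(llvmPath + "/include/c++/v1");
  return argv;
}
```

Checking for an existing -nostdinc++ keeps the flag from being duplicated when a caller (like the fixed test) already passes it; compilations without -nogpuinc get exactly the arguments they get today.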

Reproducer

docker run --rm -v /path/to/llvm-project:/llvm:ro -v /tmp/repro:/repro -w /repro \
  ghcr.io/rocm/therock_build_manylinux_x86_64:latest bash -c '
    cmake -G Ninja -S /llvm/llvm -B build -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_ENABLE_PROJECTS="clang;lld" \
      -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86;SPIRV"
    ninja -C build llvm-test-depends clang-test-depends
    cmake -G Ninja -S /llvm/amd/device-libs -B build-device-libs \
      -DCMAKE_PREFIX_PATH=/repro/build -DLLVM_DIR=/repro/build/lib/cmake/llvm
    ninja -C build-device-libs
    cmake -G Ninja -S /llvm/amd/comgr -B build-comgr \
      -DCMAKE_PREFIX_PATH="/repro/build;/repro/build-device-libs" \
      -DLLVM_DIR=/repro/build/lib/cmake/llvm \
      -DBUILD_TESTING=ON
    ninja -C build-comgr compile_hip_with_libcxx_test
    cd build-comgr && LD_LIBRARY_PATH=. AMD_COMGR_REDIRECT_LOGS=stderr \
      ./test/compile_hip_with_libcxx_test
  '

(AMD_COMGR_REDIRECT_LOGS=stderr is essential -- without it, the failure surfaces only as a generic AMD_COMGR_STATUS_ERROR with no detail. This is itself worth surfacing in test infrastructure.)
