[OpenMP] Change build of OpenMP device runtime to be a separate runtime #136729

Open: wants to merge 1 commit into main

9 changes: 8 additions & 1 deletion offload/CMakeLists.txt
@@ -113,6 +113,14 @@ else()
   set(CMAKE_CXX_EXTENSIONS NO)
 endif()
 
+# Emit a warning for people who haven't updated their build.
+if(NOT "openmp" IN_LIST RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES AND
+   NOT "openmp" IN_LIST RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES)
+  message(WARNING "Building the offloading runtime with no device library. See "
+                  "https://openmp.llvm.org/SupportAndFAQ.html#q-how-to-build-an-openmp-gpu-offload-capable-compiler "
+                  "for more information.")
+endif()
+
 # Set the path of all resulting libraries to a unified location so that it can
 # be used for testing.
 set(LIBOMPTARGET_LIBRARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
@@ -373,7 +381,6 @@ set(LIBOMPTARGET_LLVM_LIBRARY_INTDIR "${LIBOMPTARGET_INTDIR}" CACHE STRING
 
 # Build offloading plugins and device RTLs if they are available.
 add_subdirectory(plugins-nextgen)
-add_subdirectory(DeviceRTL)
 add_subdirectory(tools)
 
 # Build target agnostic offloading library.
3 changes: 3 additions & 0 deletions offload/cmake/caches/AMDGPUBot.cmake
@@ -19,3 +19,6 @@ set(LLVM_LIT_ARGS "-v --show-unsupported --timeout 100 --show-xfail -j 32" CACHE
 
 set(CLANG_DEFAULT_LINKER "lld" CACHE STRING "")
 set(CLANG_DEFAULT_RTLIB "compiler-rt" STRING "")
+
+set(LLVM_RUNTIME_TARGETS default;amdgcn-amd-amdhsa CACHE STRING "")
+set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "openmp" CACHE STRING "")
4 changes: 2 additions & 2 deletions offload/cmake/caches/Offload.cmake
@@ -5,5 +5,5 @@ set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")
 set(LLVM_RUNTIME_TARGETS default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda CACHE STRING "")
 set(RUNTIMES_nvptx64-nvidia-cuda_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/NVPTX.cmake" CACHE STRING "")
 set(RUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/AMDGPU.cmake" CACHE STRING "")
-set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
-set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
66 changes: 40 additions & 26 deletions openmp/CMakeLists.txt
@@ -88,6 +88,14 @@ else()
   set(CMAKE_CXX_EXTENSIONS NO)
 endif()
 
+# Targeting the GPU directly requires a few flags to make CMake happy.
+if("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -nogpulib")
+elseif("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(CMAKE_REQUIRED_FLAGS
+      "${CMAKE_REQUIRED_FLAGS} -flto -c -Wno-unused-command-line-argument")
+endif()
+
 # Check and set up common compiler flags.
 include(config-ix)
 include(HandleOpenMPOptions)
@@ -122,35 +130,41 @@ else()
   get_clang_resource_dir(LIBOMP_HEADERS_INSTALL_PATH SUBDIR include)
 endif()
 
-# Build host runtime library, after LIBOMPTARGET variables are set since they are needed
-# to enable time profiling support in the OpenMP runtime.
-add_subdirectory(runtime)
-
-set(ENABLE_OMPT_TOOLS ON)
-# Currently tools are not tested well on Windows or MacOS X.
-if (APPLE OR WIN32)
-  set(ENABLE_OMPT_TOOLS OFF)
-endif()
+# Use the current compiler target to determine the appropriate runtime to build.
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn|^nvptx" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn|^nvptx")
+  add_subdirectory(device)
Comment on lines +134 to +136

Member:
[serious] What happens with host offloading? They also need device-like functions such as omp_get_device_num(). The device-side implementation and host-side implementation are different. This also matters when, e.g., offloading to a remote cluster (non-GPU) node via MPI.

I don't think we should (or can) assume that the triple determines whether it is executing on the host or device.

Contributor Author:
Host offloading uses `libomp.so`. The way I think about it is that this `ompdevice` is basically `libomp` for GPUs.

Member:
The device-side omp_get_device_num() (defined in libomptarget.so, not libomp.so) only returns omp_get_initial_device(), which is wrong for any kind of offloading.

After trying out what actually happens, I found that it executes the Fortran wrapper (in libomp.so). It also incorrectly assumes it is always executing on the host. That looks like a bug.

+else()
+  # Build host runtime library, after LIBOMPTARGET variables are set since they
+  # are needed to enable time profiling support in the OpenMP runtime.
+  add_subdirectory(runtime)
+
+  set(ENABLE_OMPT_TOOLS ON)
+  # Currently tools are not tested well on Windows or MacOS X.
+  if (APPLE OR WIN32)
+    set(ENABLE_OMPT_TOOLS OFF)
+  endif()
 
-option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
-       ${ENABLE_OMPT_TOOLS})
-if (OPENMP_ENABLE_OMPT_TOOLS)
-  add_subdirectory(tools)
-endif()
+  option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
+         ${ENABLE_OMPT_TOOLS})
+  if (OPENMP_ENABLE_OMPT_TOOLS)
+    add_subdirectory(tools)
+  endif()
 
-# Propagate OMPT support to offload
-if(NOT ${OPENMP_STANDALONE_BUILD})
-  set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
-  set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
-endif()
+  # Propagate OMPT support to offload
+  if(NOT ${OPENMP_STANDALONE_BUILD})
+    set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
+    set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+  endif()
 
-option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
+  option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
 
-# Build libompd.so
-add_subdirectory(libompd)
+  # Build libompd.so
+  add_subdirectory(libompd)
 
-# Build documentation
-add_subdirectory(docs)
+  # Build documentation
+  add_subdirectory(docs)
 
-# Now that we have seen all testsuites, create the check-openmp target.
-construct_check_openmp_target()
+  # Now that we have seen all testsuites, create the check-openmp target.
+  construct_check_openmp_target()
+endif()
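To make the dispatch rule in the new `openmp/CMakeLists.txt` concrete (and to illustrate the reviewer's point that it keys purely on the triple), here is a small standalone script. It is a sketch, not part of the PR: it hard-codes a few example triples and applies the same `^amdgcn|^nvptx` regular expression, whereas the real build tests both `LLVM_DEFAULT_TARGET_TRIPLE` and `CMAKE_CXX_COMPILER_TARGET`. The file name `classify_triple.cmake` is hypothetical.

```cmake
# classify_triple.cmake (hypothetical helper, not part of the PR).
# Run with: cmake -P classify_triple.cmake
cmake_minimum_required(VERSION 3.20)

foreach(triple IN ITEMS
        amdgcn-amd-amdhsa
        nvptx64-nvidia-cuda
        x86_64-unknown-linux-gnu
        aarch64-unknown-linux-gnu)
  # Same regular expression the new openmp/CMakeLists.txt applies to
  # LLVM_DEFAULT_TARGET_TRIPLE and CMAKE_CXX_COMPILER_TARGET.
  if("${triple}" MATCHES "^amdgcn|^nvptx")
    message(STATUS "${triple}: add_subdirectory(device)")
  else()
    message(STATUS "${triple}: add_subdirectory(runtime) plus the host-only subdirectories")
  endif()
endforeach()
```

A CPU triple used for host offloading (for example `x86_64-unknown-linux-gnu`) always falls into the `else()` branch, so such builds keep producing `libomp.so`, which is what the author's reply in the review thread relies on.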
99 changes: 99 additions & 0 deletions openmp/device/CMakeLists.txt
@@ -0,0 +1,99 @@
+# Ensure the compiler is a valid clang when building the GPU target.
+set(req_ver "${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}")
+if(LLVM_VERSION_MAJOR AND NOT (CMAKE_CXX_COMPILER_ID MATCHES "[Cc]lang" AND
+   "${CMAKE_CXX_COMPILER_VERSION}" VERSION_EQUAL "${req_ver}"))
+  message(FATAL_ERROR "Cannot build GPU device runtime. CMake compiler "
+                      "'${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}'"
+                      " is not 'Clang ${req_ver}'.")
+endif()
+
+set(src_files
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Allocator.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Configuration.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Debug.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Kernel.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/LibC.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Mapping.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Misc.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Parallelism.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Profiling.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Reduction.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/State.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Synchronization.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Tasking.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/DeviceUtils.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
+)
+
+list(APPEND compile_options -flto)
+list(APPEND compile_options -fvisibility=hidden)
+list(APPEND compile_options -nogpulib)
+list(APPEND compile_options -nostdlibinc)
+list(APPEND compile_options -fno-rtti)
+list(APPEND compile_options -fno-exceptions)
+list(APPEND compile_options -fconvergent-functions)
+list(APPEND compile_options -Wno-unknown-cuda-version)
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  list(APPEND compile_options --target=${LLVM_DEFAULT_TARGET_TRIPLE})
+endif()
+
+# We disable the slp vectorizer during the runtime optimization to avoid
+# vectorized accesses to the shared state. Generally, those are "good" but
+# the optimizer pipeline (esp. Attributor) does not fully support vectorized
+# instructions yet and we end up missing out on way more important constant
+# propagation. That said, we will run the vectorizer again after the runtime
+# has been linked into the user program.
+list(APPEND compile_options "SHELL:-mllvm -vectorize-slp=false")
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(target_name "amdgpu")
+  list(APPEND compile_options "SHELL:-Xclang -mcode-object-version=none")
+elseif("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^nvptx" OR
+       "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(target_name "nvptx")
+  list(APPEND compile_options --cuda-feature=+ptx63)
+endif()
+
+# Trick to combine these into a bitcode file via the linker's LTO pass.
+add_executable(libompdevice ${src_files})
+set_target_properties(libompdevice PROPERTIES
+  RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+  LINKER_LANGUAGE CXX
+  BUILD_RPATH ""
+  INSTALL_RPATH ""
+  RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
+
+# If the user built with the GPU C library enabled we will use that instead.
+if(LIBOMPTARGET_GPU_LIBC_SUPPORT)
+  target_compile_definitions(libompdevice PRIVATE OMPTARGET_HAS_LIBC)
+endif()
+target_compile_definitions(libompdevice PRIVATE SHARED_SCRATCHPAD_SIZE=512)
+
+target_include_directories(libompdevice PRIVATE
+  ${CMAKE_CURRENT_SOURCE_DIR}/include
+  ${CMAKE_CURRENT_SOURCE_DIR}/../../libc
+  ${CMAKE_CURRENT_SOURCE_DIR}/../../offload/include)
+target_compile_options(libompdevice PRIVATE ${compile_options})
+target_link_options(libompdevice PRIVATE
+  "-flto" "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  target_link_options(libompdevice PRIVATE "--target=${LLVM_DEFAULT_TARGET_TRIPLE}")
+endif()
+install(TARGETS libompdevice
+  PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
+  DESTINATION ${OPENMP_INSTALL_LIBDIR})
+
+add_library(ompdevice.all_objs OBJECT IMPORTED)
+set_property(TARGET ompdevice.all_objs APPEND PROPERTY IMPORTED_OBJECTS
+  ${CMAKE_CURRENT_BINARY_DIR}/libomptarget-${target_name}.bc)
+
+# Archive all the object files generated above into a static library
+add_library(ompdevice STATIC)
+add_dependencies(ompdevice libompdevice)
+set_target_properties(ompdevice PROPERTIES
+  ARCHIVE_OUTPUT_DIRECTORY "${OPENMP_INSTALL_LIBDIR}"
+  ARCHIVE_OUTPUT_NAME ompdevice
+  LINKER_LANGUAGE CXX
+)
+target_link_libraries(ompdevice PRIVATE ompdevice.all_objs)
+install(TARGETS ompdevice ARCHIVE DESTINATION "${OPENMP_INSTALL_LIBDIR}")
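The device build names its bitcode after the GPU family rather than the full triple. The script below is a hypothetical sketch that reproduces only that mapping (the `target_name` selection feeding `RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc`), so the expected artifact names can be checked without a GPU toolchain. The function name `device_bitcode_name` and its `FATAL_ERROR` branch are illustrative additions; the PR's file has no such guard and leaves `target_name` unset for other triples, relying on the parent directory to enter `device/` only for amdgcn or nvptx.

```cmake
# device_bitcode_name.cmake (hypothetical helper, not part of the PR).
# Run with: cmake -P device_bitcode_name.cmake
cmake_minimum_required(VERSION 3.20)

# Mirrors the target_name selection in openmp/device/CMakeLists.txt and
# returns the bitcode file name used for RUNTIME_OUTPUT_NAME.
function(device_bitcode_name triple out_var)
  if("${triple}" MATCHES "^amdgcn")
    set(target_name "amdgpu")
  elseif("${triple}" MATCHES "^nvptx")
    set(target_name "nvptx")
  else()
    # Illustrative guard only; the PR has no such branch.
    message(FATAL_ERROR "'${triple}' is not an amdgcn or nvptx triple")
  endif()
  set(${out_var} "libomptarget-${target_name}.bc" PARENT_SCOPE)
endfunction()

foreach(triple IN ITEMS amdgcn-amd-amdhsa nvptx64-nvidia-cuda)
  device_bitcode_name("${triple}" bc_name)
  message(STATUS "${triple} -> ${bc_name}")
endforeach()
```

Running it reports `libomptarget-amdgpu.bc` and `libomptarget-nvptx.bc`, the same files that `ompdevice.all_objs` imports and that the `ompdevice` static library then archives.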
16 files renamed without changes.
7 changes: 7 additions & 0 deletions openmp/docs/SupportAndFAQ.rst
@@ -78,6 +78,13 @@ Clang will be built with all backends enabled. When building with
 ``LLVM_ENABLE_RUNTIMES="openmp"`` OpenMP should not be enabled in
 ``LLVM_ENABLE_PROJECTS`` because it is enabled by default.
 
+Support for the device library comes from a separate build of the OpenMP library
+that targets the GPU architecture. Building it requires enabling the runtime
+targets or, for a standalone build, setting the target manually (e.g. with
+``CMAKE_CXX_COMPILER_TARGET``). For the runtimes build, list the GPU triples in
+``LLVM_RUNTIME_TARGETS`` and then enable the OpenMP runtime for each GPU target
+with ``RUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES``. Refer to the
+``offload/cmake/caches/Offload.cmake`` cache file for the specific invocation.
+
 For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
 For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
 
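As a sketch of the configuration the new FAQ text describes, the initial-cache file below enables the OpenMP device runtime for both GPU triples. It is modeled on the `offload/cmake/caches/Offload.cmake` and `AMDGPUBot.cmake` changes in this PR but trimmed to OpenMP only. The file name `gpu-openmp.cmake` and the `LLVM_ENABLE_PROJECTS`/`LLVM_ENABLE_RUNTIMES` values are assumptions, not something the PR prescribes.

```cmake
# gpu-openmp.cmake (hypothetical initial cache, not part of the PR).
# Example use: cmake -G Ninja -S llvm -B build -C gpu-openmp.cmake
set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")       # assumption
set(LLVM_ENABLE_RUNTIMES "openmp;offload" CACHE STRING "")  # host runtimes (assumption)
set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")

# "default" is the host; each GPU triple gets its own runtimes build.
set(LLVM_RUNTIME_TARGETS "default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda" CACHE STRING "")

# Enable only the OpenMP runtime (built from openmp/device) for each GPU triple.
set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "openmp" CACHE STRING "")
set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "openmp" CACHE STRING "")
```

Keeping the per-triple lists to `openmp` alone mirrors `AMDGPUBot.cmake`; `Offload.cmake` instead adds `openmp` alongside `compiler-rt;libc;libcxx;libcxxabi` for builds that also want the GPU C and C++ libraries.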