Commit 15fe22e

lgritz and scott-wilson authored and committed
Rudimentary CUDA support infrastructure (AcademySoftwareFoundation#4293)
Build-time check for the CUDA toolkit as an optional dependency. This can be explicitly disabled with OIIO_USE_CUDA=OFF (though currently it defaults to off; you have to turn it on at build time because this is all still experimental). When this is enabled at build time, the C++ preprocessor symbol OIIO_USE_CUDA will be defined within the OIIO codebase.

The OIIO global `attribute("gpu:device", "CUDA")` can be used to enable CUDA functionality, which at this point doesn't do anything other than initialize the CUDA device and find out information about it. Several global `getattribute("cuda:*")` queries let you find out some information about any device found. I'm trying to keep a lot of the approaches fairly generic, so that the interface won't need to change much to allow OpenCL or Metal or whatever else comes along, though for now I'm trying to flesh out the implementation mostly with CUDA.

oiiotool adds a new `--cuda` option, reserved for enabling CUDA functionality. Currently the only way this is used is to print information about the CUDA device in the `oiiotool --help` messages. An example of that is:

```
$ oiiotool --cuda --help
oiiotool -- simple image processing operations
...
OIIO 2.6.2.0spi | Linux/x86_64
    Build compiler: gcc 9.3 | C++17/201703
    HW features enabled at build: sse2,sse3,ssse3,sse41,sse42
    CUDA 11.6.0 support enabled at build time
...
Running on 32 cores 187.3GB sse2,sse3,ssse3,sse41,sse42,avx,avx2,avx512f,
    avx512dq,avx512cd,avx512bw,avx512vl,fma,f16c,popcnt,rdrand
Compute hardware available:
    CUDA on Quadro RTX 6000, driver 12020, runtime 11060, compat 705, memory 23.6 GB
```

Note that at this point, we aren't actually *using* CUDA for anything. This is just setting up some basics for future expansion. It's all hidden in the pvt namespace for internal use only, and any of the functions or nomenclature may change as we continue to add functionality. I want to iterate on it a bit before it's in any way exposed in public headers or interfaces.

I'm a big fan of introducing even big strategic changes in the form of a series of MVP, easy-to-review, discrete steps. This step's only goals are: (a) allowing the build system to find and link against the CUDA Toolkit; (b) allowing runtime initialization and querying of a few basic facts about the device found; (c) starting to noodle around just a bit with function and attribute interfaces to gain a little experience that will guide what we eventually want.

LONG TERM goals of this initiative include:

* A full TextureSystem work-alike that can be used from CUDA (and possibly other compute device API) based renderers and with OSL's OptiX back end, having an analogous (ideally identical) interface and as close as possible to feature parity with the CPU TextureSystem.
* Augment ImageBuf with the ability to store its images in GPU (or unified) memory, and for IBA functions to be able to operate on those GPU-side buffers with GPU compute kernels, somewhat analogously to how libraries like PyTorch provide optional GPU support for tensor operations. (The hope is that this improves image processing performance, though it's possible that it'll turn out that typical OIIO image processing workflows are so I/O dominated that there's not a lot of potential upside. I think we'll have to try before we really find out if it's going to be worth it.)
* Allow image reading/writing plugins (or other parts of OIIO functionality) to leverage a GPU if available, and if there is something productive they can do with it.

---------

Signed-off-by: Larry Gritz <lg@larrygritz.com>
Signed-off-by: Scott Wilson <scott@propersquid.com>
1 parent c87a290 commit 15fe22e

File tree

10 files changed: +446 −6 lines changed


CMakeLists.txt

Lines changed: 2 additions & 0 deletions
```diff
@@ -177,6 +177,8 @@ include (pythonutils)
 # Dependency finding utilities and all dependency-related options
 include (externalpackages)
 
+include (cuda_macros)
+
 # Include all our testing apparatus and utils, but not if it's a subproject
 if (PROJECT_IS_TOP_LEVEL)
     include (testing)
```

src/build-scripts/ci-startup.bash

Lines changed: 2 additions & 1 deletion
```diff
@@ -72,7 +72,8 @@ fi
 export PAR_MAKEFLAGS=-j${PARALLEL}
 export CMAKE_BUILD_PARALLEL_LEVEL=${CMAKE_BUILD_PARALLEL_LEVEL:=${PARALLEL}}
 export CTEST_PARALLEL_LEVEL=${CTEST_PARALLEL_LEVEL:=${PARALLEL}}
-
+export OIIO_USE_CUDA=1
+export CUDAToolkit_ROOT=/usr/local/cuda
 
 mkdir -p build dist
```

src/cmake/compiler.cmake

Lines changed: 6 additions & 1 deletion
```diff
@@ -37,6 +37,12 @@ message (STATUS "Building with C++${CMAKE_CXX_STANDARD}, downstream minimum C++$
 if (CMAKE_CXX_STANDARD VERSION_LESS CMAKE_CXX_MINIMUM)
     message (FATAL_ERROR "C++${CMAKE_CXX_STANDARD} is not supported, minimum is C++${CMAKE_CXX_MINIMUM}")
 endif ()
+# Remember the -std flag we need; it will be used later for custom Cuda builds
+set (CSTD_FLAGS "")
+if (CMAKE_COMPILER_IS_GNUCC OR CMAKE_COMPILER_IS_CLANG OR CMAKE_COMPILER_IS_INTEL)
+    set (CSTD_FLAGS "-std=c++${CMAKE_CXX_STANDARD}")
+endif ()
+
 
 ###########################################################################
 # Figure out which compiler we're using
@@ -219,7 +225,6 @@ if (CMAKE_COMPILER_IS_GNUCC OR CMAKE_COMPILER_IS_CLANG)
     add_compile_options ("-fno-math-errno")
 endif ()
 
-
 # We will use this for ccache and timing
 set (MY_RULE_LAUNCH "")
```

src/cmake/cuda_macros.cmake

Lines changed: 36 additions & 0 deletions
```diff
@@ -0,0 +1,36 @@
+# Copyright Contributors to the OpenImageIO project.
+# SPDX-License-Identifier: Apache-2.0
+# https://github.com/AcademySoftwareFoundation/OpenImageIO
+
+
+set_option (OIIO_USE_CUDA "Include Cuda support if found" OFF)
+set_cache (CUDA_TARGET_ARCH "sm_60" "CUDA GPU architecture (e.g. sm_60)")
+set_cache (CUDAToolkit_ROOT "" "Path to CUDA toolkit")
+
+if (OIIO_USE_CUDA)
+    if (OIIO_USE_CUDA AND CMAKE_VERSION VERSION_LESS 3.18)
+        message (WARNING "CMake >= 3.18 is required to correctly find the CUDA dependency")
+    endif ()
+    set (CUDA_PROPAGATE_HOST_FLAGS ON)
+    set (CUDA_VERBOSE_BUILD ${VERBOSE})
+    checked_find_package(CUDAToolkit
+                         VERSION_MIN 9.0
+                         RECOMMEND_MIN 11.0
+                         RECOMMEND_MIN_REASON
+                             "We don't actively test CUDA older than 11"
+                        )
+    list (APPEND CUDA_NVCC_FLAGS ${CSTD_FLAGS} -expt-relaxed-constexpr)
+    if (CUDAToolkit_FOUND)
+        add_compile_definitions (OIIO_USE_CUDA=1)
+    endif ()
+endif ()
+
+
+# Add necessary ingredients to make `target` include and link against Cuda.
+function (oiio_cuda_target target)
+    if (CUDAToolkit_FOUND)
+        target_link_libraries (${target} PRIVATE
+                               CUDA::cudart_static
+                              )
+    endif ()
+endfunction()
```
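The `oiio_cuda_target` function above is how individual build targets opt into CUDA linkage; it is deliberately a no-op when the toolkit wasn't found, so callers don't need their own guards. A minimal sketch of downstream usage, where `my_tool` is a hypothetical target (not part of this commit):

```cmake
# Sketch only: assumes cuda_macros.cmake has already been included by the
# top-level CMakeLists.txt, which defines oiio_cuda_target and (if
# OIIO_USE_CUDA is ON) runs checked_find_package(CUDAToolkit).
add_executable (my_tool my_tool.cpp)

# Does nothing unless CUDAToolkit_FOUND; otherwise privately links
# CUDA::cudart_static into the target.
oiio_cuda_target (my_tool)
```

This is the same pattern the commit itself applies to the OpenImageIO library target in src/libOpenImageIO/CMakeLists.txt.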

src/doc/oiiotool.rst

Lines changed: 7 additions & 0 deletions
```diff
@@ -929,6 +929,13 @@ output each one to a different file, with names `sub0001.tif`,
     default (also if n=0) is to use as many threads as there are cores
     present in the hardware.
 
+.. option:: --gpu <n>
+
+    EXPERIMENTAL: Enable a GPU or other compute acceleration device, if
+    available.
+
+    This was added in OIIO 3.0.
+
 .. option:: --cache <size>
 
     Causes images to be read through an ImageCache and set the underlying
```

src/include/imageio_pvt.h

Lines changed: 48 additions & 0 deletions
```diff
@@ -59,6 +59,7 @@ OIIO_API const std::vector<std::string>&
 font_list();
 
 
+
 // For internal use - use error() below for a nicer interface.
 void
 append_error(string_view message);
@@ -230,6 +231,53 @@ OIIO_API bool
 print_stats(std::ostream& out, string_view indent, const ImageBuf& input,
             const ImageSpec& spec, ROI roi, std::string& err);
 
+
+enum class ComputeDevice : int {
+    CPU  = 0,
+    CUDA = 1,
+    // Might expand later...
+};
+
+// Which compute device is currently active, and should be used by any
+// OIIO facilities that know how to use it.
+OIIO_API ComputeDevice
+compute_device();
+
+#if 0
+/// Return true if CUDA is available to OpenImageIO at this time -- support
+/// enabled at build time, and has already been turned on with enable_cuda()
+/// or with OIIO::attribute("cuda", 1), and hardware is present and was
+/// successfully initialized.
+OIIO_API bool
+openimageio_cuda();
+#endif
+
+// Set an attribute related to OIIO's use of GPUs/compute devices. This is a
+// strictly internal function. User code should just call OIIO::attribute()
+// and GPU-related attributes will be directed here automatically.
+OIIO_API bool
+gpu_attribute(string_view name, TypeDesc type, const void* val);
+
+// Retrieve an attribute related to OIIO's use of GPUs/compute devices. This
+// is a strictly internal function. User code should just call
+// OIIO::getattribute() and GPU-related attributes will be directed here
+// automatically.
+OIIO_API bool
+gpu_getattribute(string_view name, TypeDesc type, void* val);
+
+
+/// Allocate compute device memory
+OIIO_API void*
+device_malloc(size_t size);
+
+/// Allocate unified compute device memory -- visible on both CPU & GPU
+OIIO_API void*
+device_unified_malloc(size_t size);
+
+/// Free compute device memory
+OIIO_API void
+device_free(void* mem);
+
 } // namespace pvt
 
 OIIO_NAMESPACE_END
```
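The commit adds declarations for `device_malloc`, `device_unified_malloc`, and `device_free` but the implementation file (oiio_gpu.cpp) is not shown in this view. A plausible sketch of their shape, assuming (this is only a guess, not the actual OIIO code) that builds without CUDA fall back to ordinary heap allocation:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch of the pvt allocator trio. When OIIO_USE_CUDA is
// defined, cudaMalloc gives device-only memory and cudaMallocManaged gives
// unified memory visible to both CPU and GPU; otherwise everything devolves
// to the normal heap so callers work unchanged on CPU-only builds.
void* device_malloc(std::size_t size)
{
#ifdef OIIO_USE_CUDA
    void* mem = nullptr;
    cudaMalloc(&mem, size);         // device-only memory
    return mem;
#else
    return std::malloc(size);       // CPU fallback
#endif
}

void* device_unified_malloc(std::size_t size)
{
#ifdef OIIO_USE_CUDA
    void* mem = nullptr;
    cudaMallocManaged(&mem, size);  // visible on both CPU & GPU
    return mem;
#else
    return std::malloc(size);       // CPU fallback
#endif
}

void device_free(void* mem)
{
#ifdef OIIO_USE_CUDA
    cudaFree(mem);
#else
    std::free(mem);
#endif
}
```

A single free function handling both allocation kinds mirrors `cudaFree`, which accepts pointers from either `cudaMalloc` or `cudaMallocManaged`.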

src/libOpenImageIO/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
```diff
@@ -66,6 +66,7 @@ set (libOpenImageIO_srcs
     maketexture.cpp
     bluenoise.cpp
     printinfo.cpp
+    oiio_gpu.cpp
     ../libtexture/texturesys.cpp
     ../libtexture/texture3d.cpp
     ../libtexture/environment.cpp
@@ -175,6 +176,8 @@ if (MINGW)
     target_link_libraries (OpenImageIO PRIVATE ws2_32)
 endif()
 
+oiio_cuda_target (OpenImageIO)
+
 file (GLOB iba_sources "imagebufalgo_*.cpp")
 if (MSVC)
     # In some MSVC setups, the IBA functions with huge template expansions
```

src/libOpenImageIO/imageio.cpp

Lines changed: 12 additions & 0 deletions
```diff
@@ -386,6 +386,12 @@ attribute(string_view name, TypeDesc type, const void* val)
         default_thread_pool()->resize(ot - 1);
         return true;
     }
+    if (Strutil::starts_with(name, "gpu:")
+        || Strutil::starts_with(name, "cuda:")) {
+        return pvt::gpu_attribute(name, type, val);
+    }
+
+    // Things below here need to be guarded by the attrib_mutex
     spin_lock lock(attrib_mutex);
     if (name == "read_chunk" && type == TypeInt) {
         oiio_read_chunk = *(const int*)val;
@@ -485,6 +491,12 @@ getattribute(string_view name, TypeDesc type, void* val)
         *(ustring*)val = ustring(OIIO_VERSION_STRING);
         return true;
     }
+    if (Strutil::starts_with(name, "gpu:")
+        || Strutil::starts_with(name, "cuda:")) {
+        return pvt::gpu_getattribute(name, type, val);
+    }
+
+    // Things below here need to be guarded by the attrib_mutex
     spin_lock lock(attrib_mutex);
     if (name == "read_chunk" && type == TypeInt) {
         *(int*)val = oiio_read_chunk;
```
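The routing added to `attribute()`/`getattribute()` above is simple: any attribute whose name carries a `gpu:` or `cuda:` prefix is diverted to the GPU handlers before the mutex-guarded generic attribute table is consulted. A standalone sketch of that dispatch idea, using stand-in names rather than OIIO's actual functions:

```cpp
#include <cstring>
#include <string>

// Minimal prefix test, standing in for OIIO's Strutil::starts_with.
static bool starts_with(const std::string& s, const char* prefix)
{
    return s.compare(0, std::strlen(prefix), prefix) == 0;
}

// Stand-in for pvt::gpu_attribute; the real one stores/acts on the value.
static bool gpu_attribute_stub(const std::string& /*name*/)
{
    return true;
}

// Stand-in for the routing step inside OIIO::attribute(): GPU-related
// names are handled (and returned) before any global lock is taken.
static bool set_attribute(const std::string& name)
{
    if (starts_with(name, "gpu:") || starts_with(name, "cuda:"))
        return gpu_attribute_stub(name);
    // ... otherwise fall through to the mutex-guarded generic attributes ...
    return false;
}
```

Handling the GPU prefixes before acquiring `attrib_mutex` keeps the GPU state machinery free to use its own synchronization.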
