Commit 15fe22e

lgritz and scott-wilson authored and committed
Rudimentary CUDA support infrastructure (AcademySoftwareFoundation#4293)
Build-time check for the CUDA toolkit as an optional dependency. This can be explicitly disabled with OIIO_USE_CUDA=OFF (though currently it defaults to off; you have to turn it on at build time because this is all still experimental). When this is enabled at build time, the C++ preprocessor symbol OIIO_USE_CUDA will be defined within the OIIO codebase.

The OIIO global `attribute("gpu:device", "CUDA")` can be used to enable CUDA functionality, which at this point doesn't do anything other than initialize the CUDA device and find out information about it. Several global `getattribute("cuda:*")` queries let you find out some information about any device found. I'm trying to keep a lot of the approaches fairly generic, so that the interface won't need to change much to allow OpenCL or Metal or whatever else comes along, though for now I'm trying to flesh out the implementation mostly with CUDA.

oiiotool adds a new `--cuda` option, reserved for enabling CUDA functionality. Currently the only way this is used is to print information about the CUDA device in the `oiiotool --help` messages. An example of that is:

```
$ oiiotool --cuda --help
oiiotool -- simple image processing operations
...
OIIO 2.6.2.0spi | Linux/x86_64
    Build compiler: gcc 9.3 | C++17/201703
    HW features enabled at build: sse2,sse3,ssse3,sse41,sse42
    CUDA 11.6.0 support enabled at build time
...
Running on 32 cores 187.3GB sse2,sse3,ssse3,sse41,sse42,avx,avx2,avx512f,
    avx512dq,avx512cd,avx512bw,avx512vl,fma,f16c,popcnt,rdrand
Compute hardware available:
    CUDA on Quadro RTX 6000, driver 12020, runtime 11060, compat 705, memory 23.6 GB
```

Note that at this point, we aren't actually *using* CUDA for anything. This is just setting up some basics for future expansion. It's all hidden in the pvt namespace for internal use only, and any of the functions or nomenclature may change as we continue to add functionality. I want to iterate on it a bit before it's in any way exposed in public headers or interfaces.

I'm a big fan of introducing even big strategic changes in the form of a series of MVP, easy-to-review, discrete steps. This step's only goals are: (a) allowing the build system to find and link against the CUDA Toolkit; (b) allowing runtime initialization and querying of a few basic facts about the device found; (c) starting to noodle around just a bit with function and attribute interfaces to gain a little experience that will guide what we eventually want.

LONG TERM goals of this initiative include:

* A full TextureSystem work-alike that can be used from CUDA (and possibly other compute device API) based renderers and with OSL's OptiX back end, having an analogous (ideally identical) interface and as close as possible to feature parity with the CPU TextureSystem.
* Augment ImageBuf with the ability to store its images in GPU (or unified) memory, and for IBA functions to be able to operate on those GPU-side buffers with GPU compute kernels, somewhat analogously to how libraries like PyTorch provide optional GPU support for tensor operations. (The hope is that this improves image processing performance, though it's possible that it'll turn out that typical OIIO image processing workflows are so I/O dominated that there's not a lot of potential upside. I think we'll have to try before we really find out if it's going to be worth it.)
* Allow image reading/writing plugins (or other parts of OIIO functionality) to leverage a GPU if available, and if there is something productive they can do with it.

---------

Signed-off-by: Larry Gritz <lg@larrygritz.com>
Signed-off-by: Scott Wilson <scott@propersquid.com>
1 parent c87a290 commit 15fe22e

File tree

10 files changed: +446 −6 lines changed


CMakeLists.txt

Lines changed: 2 additions & 0 deletions
```diff
@@ -177,6 +177,8 @@ include (pythonutils)
 # Dependency finding utilities and all dependency-related options
 include (externalpackages)
 
+include (cuda_macros)
+
 # Include all our testing apparatus and utils, but not if it's a subproject
 if (PROJECT_IS_TOP_LEVEL)
     include (testing)
```

src/build-scripts/ci-startup.bash

Lines changed: 2 additions & 1 deletion
```diff
@@ -72,7 +72,8 @@ fi
 export PAR_MAKEFLAGS=-j${PARALLEL}
 export CMAKE_BUILD_PARALLEL_LEVEL=${CMAKE_BUILD_PARALLEL_LEVEL:=${PARALLEL}}
 export CTEST_PARALLEL_LEVEL=${CTEST_PARALLEL_LEVEL:=${PARALLEL}}
-
+export OIIO_USE_CUDA=1
+export CUDAToolkit_ROOT=/usr/local/cuda
 
 mkdir -p build dist
```

src/cmake/compiler.cmake

Lines changed: 6 additions & 1 deletion
```diff
@@ -37,6 +37,12 @@ message (STATUS "Building with C++${CMAKE_CXX_STANDARD}, downstream minimum C++$
 if (CMAKE_CXX_STANDARD VERSION_LESS CMAKE_CXX_MINIMUM)
     message (FATAL_ERROR "C++${CMAKE_CXX_STANDARD} is not supported, minimum is C++${CMAKE_CXX_MINIMUM}")
 endif ()
+# Remember the -std flag we need; it will be used later for custom Cuda builds
+set (CSTD_FLAGS "")
+if (CMAKE_COMPILER_IS_GNUCC OR CMAKE_COMPILER_IS_CLANG OR CMAKE_COMPILER_IS_INTEL)
+    set (CSTD_FLAGS "-std=c++${CMAKE_CXX_STANDARD}")
+endif ()
+
 
 ###########################################################################
 # Figure out which compiler we're using
@@ -219,7 +225,6 @@ if (CMAKE_COMPILER_IS_GNUCC OR CMAKE_COMPILER_IS_CLANG)
     add_compile_options ("-fno-math-errno")
 endif ()
 
-
 # We will use this for ccache and timing
 set (MY_RULE_LAUNCH "")
```

src/cmake/cuda_macros.cmake

Lines changed: 36 additions & 0 deletions
```diff
@@ -0,0 +1,36 @@
+# Copyright Contributors to the OpenImageIO project.
+# SPDX-License-Identifier: Apache-2.0
+# https://github.com/AcademySoftwareFoundation/OpenImageIO
+
+
+set_option (OIIO_USE_CUDA "Include Cuda support if found" OFF)
+set_cache (CUDA_TARGET_ARCH "sm_60" "CUDA GPU architecture (e.g. sm_60)")
+set_cache (CUDAToolkit_ROOT "" "Path to CUDA toolkit")
+
+if (OIIO_USE_CUDA)
+    if (OIIO_USE_CUDA AND CMAKE_VERSION VERSION_LESS 3.18)
+        message (WARNING "CMake >= 3.18 is required to correctly find the CUDA dependency")
+    endif ()
+    set (CUDA_PROPAGATE_HOST_FLAGS ON)
+    set (CUDA_VERBOSE_BUILD ${VERBOSE})
+    checked_find_package(CUDAToolkit
+                         VERSION_MIN 9.0
+                         RECOMMEND_MIN 11.0
+                         RECOMMEND_MIN_REASON
+                             "We don't actively test CUDA older than 11"
+                        )
+    list (APPEND CUDA_NVCC_FLAGS ${CSTD_FLAGS} -expt-relaxed-constexpr)
+    if (CUDAToolkit_FOUND)
+        add_compile_definitions (OIIO_USE_CUDA=1)
+    endif ()
+endif ()
+
+
+# Add necessary ingredients to make `target` include and link against Cuda.
+function (oiio_cuda_target target)
+    if (CUDAToolkit_FOUND)
+        target_link_libraries (${target} PRIVATE
+                               CUDA::cudart_static
+                              )
+    endif ()
+endfunction()
```
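The `oiio_cuda_target` function above is how individual build targets opt into CUDA linkage; it is deliberately a no-op when the toolkit wasn't found, so callers don't need their own guards. A minimal sketch of downstream usage, where `my_tool` is a hypothetical target (not part of this commit):

```cmake
# Sketch only: assumes cuda_macros.cmake has already been included by the
# top-level CMakeLists.txt, which defines oiio_cuda_target and (if
# OIIO_USE_CUDA is ON) runs checked_find_package(CUDAToolkit).
add_executable (my_tool my_tool.cpp)

# Does nothing unless CUDAToolkit_FOUND; otherwise privately links
# CUDA::cudart_static into the target.
oiio_cuda_target (my_tool)
```

This is the same pattern the commit itself applies to the OpenImageIO library target in src/libOpenImageIO/CMakeLists.txt.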

src/doc/oiiotool.rst

Lines changed: 7 additions & 0 deletions
```diff
@@ -929,6 +929,13 @@ output each one to a different file, with names `sub0001.tif`,
     default (also if n=0) is to use as many threads as there are cores
     present in the hardware.
 
+.. option:: --gpu <n>
+
+    EXPERIMENTAL: Enable a GPU or other compute acceleration device, if
+    available.
+
+    This was added in OIIO 3.0.
+
 .. option:: --cache <size>
 
     Causes images to be read through an ImageCache and set the underlying
```

src/include/imageio_pvt.h

Lines changed: 48 additions & 0 deletions
```diff
@@ -59,6 +59,7 @@ OIIO_API const std::vector<std::string>&
 font_list();
 
 
+
 // For internal use - use error() below for a nicer interface.
 void
 append_error(string_view message);
@@ -230,6 +231,53 @@ OIIO_API bool
 print_stats(std::ostream& out, string_view indent, const ImageBuf& input,
             const ImageSpec& spec, ROI roi, std::string& err);
 
+
+enum class ComputeDevice : int {
+    CPU  = 0,
+    CUDA = 1,
+    // Might expand later...
+};
+
+// Which compute device is currently active, and should be used by any
+// OIIO facilities that know how to use it.
+OIIO_API ComputeDevice
+compute_device();
+
+#if 0
+/// Return true if CUDA is available to OpenImageIO at this time -- support
+/// enabled at build time, and has already been turned on with enable_cuda()
+/// or with OIIO::attribute("cuda", 1), and hardware is present and was
+/// successfully initialized.
+OIIO_API bool
+openimageio_cuda();
+#endif
+
+// Set an attribute related to OIIO's use of GPUs/compute devices. This is a
+// strictly internal function. User code should just call OIIO::attribute()
+// and GPU-related attributes will be directed here automatically.
+OIIO_API bool
+gpu_attribute(string_view name, TypeDesc type, const void* val);
+
+// Retrieve an attribute related to OIIO's use of GPUs/compute devices. This
+// is a strictly internal function. User code should just call
+// OIIO::getattribute() and GPU-related attributes will be directed here
+// automatically.
+OIIO_API bool
+gpu_getattribute(string_view name, TypeDesc type, void* val);
+
+
+/// Allocate compute device memory
+OIIO_API void*
+device_malloc(size_t size);
+
+/// Allocate unified compute device memory -- visible on both CPU & GPU
+OIIO_API void*
+device_unified_malloc(size_t size);
+
+/// Free compute device memory
+OIIO_API void
+device_free(void* mem);
+
 } // namespace pvt
 
 OIIO_NAMESPACE_END
```
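The commit adds declarations for `device_malloc`, `device_unified_malloc`, and `device_free` but the implementation file (oiio_gpu.cpp) is not shown in this view. A plausible sketch of their shape, assuming (this is only a guess, not the actual OIIO code) that builds without CUDA fall back to ordinary heap allocation:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch of the pvt allocator trio. When OIIO_USE_CUDA is
// defined, cudaMalloc gives device-only memory and cudaMallocManaged gives
// unified memory visible to both CPU and GPU; otherwise everything devolves
// to the normal heap so callers work unchanged on CPU-only builds.
void* device_malloc(std::size_t size)
{
#ifdef OIIO_USE_CUDA
    void* mem = nullptr;
    cudaMalloc(&mem, size);         // device-only memory
    return mem;
#else
    return std::malloc(size);       // CPU fallback
#endif
}

void* device_unified_malloc(std::size_t size)
{
#ifdef OIIO_USE_CUDA
    void* mem = nullptr;
    cudaMallocManaged(&mem, size);  // visible on both CPU & GPU
    return mem;
#else
    return std::malloc(size);       // CPU fallback
#endif
}

void device_free(void* mem)
{
#ifdef OIIO_USE_CUDA
    cudaFree(mem);
#else
    std::free(mem);
#endif
}
```

A single free function handling both allocation kinds mirrors `cudaFree`, which accepts pointers from either `cudaMalloc` or `cudaMallocManaged`.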

src/libOpenImageIO/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
```diff
@@ -66,6 +66,7 @@ set (libOpenImageIO_srcs
     maketexture.cpp
     bluenoise.cpp
     printinfo.cpp
+    oiio_gpu.cpp
     ../libtexture/texturesys.cpp
     ../libtexture/texture3d.cpp
     ../libtexture/environment.cpp
@@ -175,6 +176,8 @@ if (MINGW)
     target_link_libraries (OpenImageIO PRIVATE ws2_32)
 endif()
 
+oiio_cuda_target (OpenImageIO)
+
 file (GLOB iba_sources "imagebufalgo_*.cpp")
 if (MSVC)
     # In some MSVC setups, the IBA functions with huge template expansions
```

src/libOpenImageIO/imageio.cpp

Lines changed: 12 additions & 0 deletions
```diff
@@ -386,6 +386,12 @@ attribute(string_view name, TypeDesc type, const void* val)
         default_thread_pool()->resize(ot - 1);
         return true;
     }
+    if (Strutil::starts_with(name, "gpu:")
+        || Strutil::starts_with(name, "cuda:")) {
+        return pvt::gpu_attribute(name, type, val);
+    }
+
+    // Things below here need to be guarded by the attrib_mutex
     spin_lock lock(attrib_mutex);
     if (name == "read_chunk" && type == TypeInt) {
         oiio_read_chunk = *(const int*)val;
@@ -485,6 +491,12 @@ getattribute(string_view name, TypeDesc type, void* val)
         *(ustring*)val = ustring(OIIO_VERSION_STRING);
         return true;
     }
+    if (Strutil::starts_with(name, "gpu:")
+        || Strutil::starts_with(name, "cuda:")) {
+        return pvt::gpu_getattribute(name, type, val);
+    }
+
+    // Things below here need to be guarded by the attrib_mutex
     spin_lock lock(attrib_mutex);
     if (name == "read_chunk" && type == TypeInt) {
         *(int*)val = oiio_read_chunk;
```
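The routing added to `attribute()`/`getattribute()` above is simple: any attribute whose name carries a `gpu:` or `cuda:` prefix is diverted to the GPU handlers before the mutex-guarded generic attribute table is consulted. A standalone sketch of that dispatch idea, using stand-in names rather than OIIO's actual functions:

```cpp
#include <cstring>
#include <string>

// Minimal prefix test, standing in for OIIO's Strutil::starts_with.
static bool starts_with(const std::string& s, const char* prefix)
{
    return s.compare(0, std::strlen(prefix), prefix) == 0;
}

// Stand-in for pvt::gpu_attribute; the real one stores/acts on the value.
static bool gpu_attribute_stub(const std::string& /*name*/)
{
    return true;
}

// Stand-in for the routing step inside OIIO::attribute(): GPU-related
// names are handled (and returned) before any global lock is taken.
static bool set_attribute(const std::string& name)
{
    if (starts_with(name, "gpu:") || starts_with(name, "cuda:"))
        return gpu_attribute_stub(name);
    // ... otherwise fall through to the mutex-guarded generic attributes ...
    return false;
}
```

Handling the GPU prefixes before acquiring `attrib_mutex` keeps the GPU state machinery free to use its own synchronization.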
