feat(iba): IBA::perpixel_op (#4299)

lgritz · web-flow · commit 979e5f99f7c0 · 2024-07-03T08:24:42.000-07:00
Inspired by a question by Vlad Erium, I have added a simpler way for C++
users of OIIO to construct IBA-like functions for simple unary and
binary operations on ImageBufs where each pixel is independent and based
only on the corresponding pixel of the input(s).

The user only needs to supply the contents of the inner loop, i.e. the
work for a single pixel, and that code only needs to handle float values.
All format conversion, sizing and allocation of the destination buffer,
looping over pixels, and multithreading are automatic.
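
To illustrate that division of labor, here is a minimal standalone sketch of a driver that owns the allocation and the pixel loop while the caller writes only the per-pixel body. It uses only the standard library. `FloatImage`, `apply_perpixel`, and `add_pixel` are hypothetical names for illustration and are not part of OIIO. Raw pointers plus a channel count stand in for `span<float>`, and the sketch is single-threaded and float-only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an image: interleaved float channels.
struct FloatImage {
    int width = 0, height = 0, nchannels = 0;
    std::vector<float> pixels;  // width * height * nchannels values

    FloatImage(int w, int h, int nc, float fill = 0.0f)
        : width(w), height(h), nchannels(nc),
          pixels(std::size_t(w) * std::size_t(h) * std::size_t(nc), fill)
    {
    }
};

// Hypothetical driver: checks the inputs, allocates `dst`, and loops over
// pixels. `op` only ever sees the channels of one pixel at a time.
template<typename Op>
bool apply_perpixel(FloatImage& dst, const FloatImage& a, const FloatImage& b,
                    Op op)
{
    if (a.width != b.width || a.height != b.height
        || a.nchannels != b.nchannels)
        return false;
    dst = FloatImage(a.width, a.height, a.nchannels);
    const std::size_t nc      = std::size_t(a.nchannels);
    const std::size_t npixels = std::size_t(a.width) * std::size_t(a.height);
    for (std::size_t p = 0; p < npixels; ++p) {
        if (!op(dst.pixels.data() + p * nc, a.pixels.data() + p * nc,
                b.pixels.data() + p * nc, nc))
            return false;  // the per-pixel body reported an error
    }
    return true;
}

// The only part the caller writes: the work for one pixel.
inline bool add_pixel(float* r, const float* a, const float* b, std::size_t nc)
{
    for (std::size_t c = 0; c < nc; ++c)
        r[c] = a[c] + b[c];
    return true;
}
```

Calling `apply_perpixel(R, A, B, add_pixel)` on two same-sized images fills `R` with the channel-wise sum. A different operation replaces only `add_pixel`.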

If the actual buffers in question are not float-based, conversions will
happen automatically, at about a 2x slowdown compared to everything
being in float all along, which seems reasonable for the extreme
simplicity, especially for use cases where the buffers are fairly likely
to be float anyway.

What you pass is a function or lambda that takes spans for the output
and input pixel values. Here's an example that adds two images channel
by channel, producing a sum image:

    // Assume ImageBuf A, B are the inputs, ImageBuf R is the output
    R = ImageBufAlgo::perpixel_op(A, B,
            [](span<float> r, cspan<float> a, cspan<float> b) {
                for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
                    r[c] = a[c] + b[c];
                return true;
            });

This is exactly equivalent to calling

    R = ImageBufAlgo::add(A, B);

and for float IB's, it's just as fast.

To make the not-float case fast and not require the DISPATCH macro
magic, I needed to change the ImageBuf::Iterator just a bit to add
store() and load() method templates to the iterators, and add a field
that holds the buffer type. That might make a slight ABI tweak, so I am
thinking that I will make this for the upcoming OIIO 3.0, and not
backport to the release branch.
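
As a rough illustration of that idea (a runtime type tag plus per-pixel conversion, instead of templating the whole operation on every pixel type), here is a standalone sketch. It is not the OIIO implementation. `PixelType`, `store_as_float`, and `load_from_float` are hypothetical names, only two pixel types are handled, and the 8-bit scaling (divide by 255 on read, clamp and round on write) is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical runtime tag, playing the role of the iterator's new
// "buffer type" field.
enum class PixelType : unsigned char { UInt8, Float };

// Read the channels of one pixel into a float scratch array, converting
// from the buffer's native type if needed (the "store" direction).
inline void store_as_float(PixelType type, const void* pixel, float* dest,
                           std::size_t nc)
{
    if (type == PixelType::Float) {
        const float* p = static_cast<const float*>(pixel);
        std::copy(p, p + nc, dest);
    } else {
        const std::uint8_t* p = static_cast<const std::uint8_t*>(pixel);
        for (std::size_t c = 0; c < nc; ++c)
            dest[c] = p[c] / 255.0f;  // assumed 0..255 -> 0..1 scaling
    }
}

// Write float channel values back into one pixel of the buffer, converting
// to the buffer's native type if needed (the "load" direction).
inline void load_from_float(PixelType type, void* pixel, const float* src,
                            std::size_t nc)
{
    if (type == PixelType::Float) {
        std::copy(src, src + nc, static_cast<float*>(pixel));
    } else {
        std::uint8_t* p = static_cast<std::uint8_t*>(pixel);
        for (std::size_t c = 0; c < nc; ++c) {
            float v = std::min(1.0f, std::max(0.0f, src[c]));  // clamp
            p[c]    = static_cast<std::uint8_t>(std::lround(v * 255.0f));
        }
    }
}
```

With helpers of this shape, the per-pixel body is compiled once for float. The extra cost for non-float buffers is the pair of conversions around each call.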

I think this is ready to introduce at this time, but I'm also studying
whether more varieties of this approach are needed, whether the
non-float case can be sped up even more, and whether some of the
existing IBA functions should switch to using this internally (good
candidates would be those that are almost always performed on float
buffers, but for which the heavy template expansion of the DISPATCH
approach to handling the full type zoo currently makes them very bloated
and expensive to compile, for very little real-world gain).

We should probably consider this to be experimental for a little while,
just in case the function signature for this changes as I think about it
more or add functionality.

---------

Signed-off-by: Larry Gritz <lg@larrygritz.com>
diff --git a/src/doc/imagebuf.rst b/src/doc/imagebuf.rst
@@ -204,7 +204,7 @@ Deep data in an ImageBuf
 Error Handling
 ==============
 
-.. doxygenfunction:: OIIO::ImageBuf::errorf
+.. doxygenfunction:: OIIO::ImageBuf::errorfmt
 .. doxygenfunction:: OIIO::ImageBuf::has_error
 .. doxygenfunction:: OIIO::ImageBuf::geterror
 
@@ -239,6 +239,82 @@ Miscellaneous
 
 
 
+Writing your own image processing functions
+===========================================
+
+In this section, we will discuss how to write functions that operate
+pixel by pixel on an ImageBuf. There are several different approaches
+to this, with different trade-offs in terms of speed, flexibility, and
+simplicity of implementation.
+
+Simple pixel-by-pixel access with `ImageBufAlgo::perpixel_op()`
+---------------------------------------------------------------
+
+Pros:
+
+* You only need to supply the inner loop body, the part that does the work
+  for a single pixel.
+* You can assume that all pixel data are float values.
+
+Cons/Limitations:
+
+* The operation must be one where each output pixel depends only on the
+  corresponding pixel of the input images.
+* Currently, the operation must be unary (one input image to produce one
+  output image), or binary (two input images, one output image). At this time,
+  there are not options to operate on a single image in-place, or to have more
+  than two input images, but this may be extended in the future.
+* Operating on `float`-based images is "full speed," but if the input images
+  are not `float`, the automatic conversions will add some expense. In
+  practice, we find working on non-float images to be about half the speed of
+  float images, but this may be acceptable in exchange for the simplicity of
+  this approach, especially for operations where you expect inputs to be float
+  typically.
+
+.. doxygenfunction:: perpixel_op(const ImageBuf &src, bool (*op)(span<float>, cspan<float>), int prepflags = ImageBufAlgo::IBAprep_DEFAULT, int nthreads = 0)
+
+.. doxygenfunction:: perpixel_op(const ImageBuf &srcA, const ImageBuf &srcB, bool (*op)(span<float>, cspan<float>, cspan<float>), int prepflags = ImageBufAlgo::IBAprep_DEFAULT, int nthreads = 0)
+
+Examples:
+
+.. code-block:: cpp
+
+    // Assume ImageBuf A, B are the inputs, ImageBuf R is the output
+
+    /////////////////////////////////////////////////////////////////
+    // Approach 1: using a standalone function to add two images
+    bool my_add (span<float> r, cspan<float> a, cspan<float> b) {
+        for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
+            r[c] = a[c] + b[c];
+        return true;
+    }
+
+    R = ImageBufAlgo::perpixel_op(A, B, my_add);
+
+    /////////////////////////////////////////////////////////////////
+    // Approach 2: using a "functor" class to add two images
+    struct Adder {
+        bool operator() (span<float> r, cspan<float> a, cspan<float> b) {
+            for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
+                r[c] = a[c] + b[c];
+            return true;
+        }
+    };
+
+    Adder adder;
+    R = ImageBufAlgo::perpixel_op(A, B, adder);
+    
+    /////////////////////////////////////////////////////////////////
+    // Approach 3: using a lambda to add two images
+    R = ImageBufAlgo::perpixel_op(A, B,
+            [](span<float> r, cspan<float> a, cspan<float> b) {
+                for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
+                    r[c] = a[c] + b[c];
+                return true;
+            });
+
+
+
 Iterators -- the fast way of accessing individual pixels
 ========================================================
 
diff --git a/src/doc/imageioapi.rst b/src/doc/imageioapi.rst
@@ -286,6 +286,8 @@ just exist in the OIIO namespace as general utilities. (See
 
 .. doxygenfunction:: get_extension_map
 
+|
+
  .. _sec-startupshutdown:
 
 Startup and Shutdown
diff --git a/src/include/OpenImageIO/imagebuf.h b/src/include/OpenImageIO/imagebuf.h
@@ -1316,6 +1316,16 @@ class OIIO_API ImageBuf {
         // Clear the error flag
         void clear_error() { m_readerror = false; }
 
+        // Store into `span<T> dest` the channel values of the pixel the
+        // iterator points to.
+        template<typename T = float> void store(span<T> dest) const
+        {
+            OIIO_DASSERT(dest.size() >= oiio_span_size_type(m_nchannels));
+            convert_pixel_values(TypeDesc::BASETYPE(m_pixeltype), m_proxydata,
+                                 TypeDescFromC<T>::value(), dest.data(),
+                                 m_nchannels);
+        }
+
     protected:
         friend class ImageBuf;
         friend class ImageBufImpl;
@@ -1338,6 +1348,7 @@ class OIIO_API ImageBuf {
         char* m_proxydata = nullptr;
         WrapMode m_wrap   = WrapBlack;
         bool m_readerror  = false;
+        unsigned char m_pixeltype;
 
         // Helper called by ctrs -- set up some locally cached values
         // that are copied or derived from the ImageBuf.
@@ -1500,6 +1511,17 @@ class OIIO_API ImageBuf {
 
         void* rawptr() const { return m_proxydata; }
 
+        // Load values from `span<T> src` into the pixel the iterator refers
+        // to, doing any conversions necessary.
+        template<typename T = float> void load(cspan<T> src)
+        {
+            OIIO_DASSERT(src.size() >= oiio_span_size_type(m_nchannels));
+            ensure_writable();
+            convert_pixel_values(TypeDescFromC<T>::value(), src.data(),
+                                 TypeDesc::BASETYPE(m_pixeltype), m_proxydata,
+                                 m_nchannels);
+        }
+
         /// Set the number of deep data samples at this pixel. (Only use
         /// this if deep_alloc() has not yet been called on the buffer.)
         void set_deep_samples(int n)
diff --git a/src/include/OpenImageIO/imagebufalgo_util.h b/src/include/OpenImageIO/imagebufalgo_util.h
@@ -90,6 +90,102 @@ parallel_image(ROI roi, std::function<void(ROI)> f)
 
 
 
+/// Common preparation for IBA functions (or work-alikes): Given an ROI (which
+/// may or may not be the default ROI::All()), destination image (which may or
+/// may not yet be allocated), and optional input images (presented as a span
+/// of pointers to ImageBufs), adjust `roi` if necessary and allocate pixels
+/// for `dst` if necessary.  If `dst` is already initialized, it will keep its
+/// "full" (aka display) window, otherwise its full/display window will be set
+/// to the union of inputs' full/display windows.  If `dst` is uninitialized
+/// and `force_spec` is not nullptr, use `*force_spec` as `dst`'s new spec
+/// rather than using the first input image.  Also, if any inputs are
+/// specified but not initialized or are broken, it's an error, so return
+/// false. If all is ok, return true.
+///
+/// The `options` list contains optional ParamValue's that control the
+/// behavior, including what input configurations are considered errors, and
+/// policies for how an uninitialized output is constructed from knowledge of
+/// the input images.  The following options are recognized:
+///
+///   - "require_alpha" : int (default: 0)
+///
+///     If nonzero, require all inputs and output to have an alpha channel.
+///
+///   - "require_z" : int (default: 0)
+///
+///     If nonzero, require all inputs and output to have a z channel.
+///
+///   - "require_same_nchannels" : int (default: 0)
+///
+///     If nonzero, require all inputs and output to have the same number of
+///     channels.
+///
+///   - "copy_roi_full" : int (default: 1)
+///
+///     Copy the src's roi_full. This is the default behavior. Set to 0 to
+///     disable copying roi_full from src to dst.
+///
+///   - "support_volume" : int (default: 1)
+///
+///     Support volumetric (3D) images. This is the default behavior. Set to 0
+///     to disable support for 3D images.
+///
+///   - "copy_metadata" : string (default: "true")
+///
+///     If set to "true-like" value, copy most "safe" metadata from the first
+///     input image to the destination image. If set to "all", copy all
+///     metadata from the first input image to the destination image, even
+///     dubious things. If set to a "false-like" value, do not copy any
+///     metadata from the input images to the destination image.
+///
+///   - "clamp_mutual_nchannels" : int (default: 0)
+///
+///     If nonzero, clamp roi.chend to the minimum number of channels of any
+///     of the input images.
+///
+///   - "support_deep" : string (default: "false")
+///
+///     If "false-like" (the default), deep images (having multiple depth
+///     values per pixel) are not supported. If set to a true-like value
+///     (e.g., "1", "on", "true", "yes"), deep images are allowed, but not
+///     required, and if any input or output image is deep, they all must be
+///     deep. If set to "mixed", any mixture of deep and non-deep images may
+///     be supplied. If set to "required", all input and output images must be
+///     deep.
+///
+///   - "dst_float_pixels" : int (default: 0)
+///
+///     If nonzero and dst is uninitialized, then initialize it to float
+///     regardless of the pixel types of the input images.
+///
+///   - "minimize_nchannels" : int (default: 0)
+///
+///     If nonzero and dst is uninitialized and the multiple input images do
+///     not all have the same number of channels, initialize `dst` to have the
+///     smallest number of channels of any input. (If 0, the default, an
+///     uninitialized `dst` will be given the maximum of the number of
+///     channels of all input images.)
+///
+///   - "require_matching_channels" : int (default: 0)
+///
+///     If nonzero, require all input images to have the same channel *names*,
+///     in the same order.
+///
+///   - "merge_metadata" : int (default: 0)
+///
+///     If nonzero, merge all inputs' metadata into the `dst` image's
+///     metadata.
+///
+///   - "fill_zero_alloc" : int (default: 0)
+///
+///     If nonzero and `dst` is uninitialized, fill `dst` with 0 values if we
+///     allocate space for it.
+///
+bool
+IBAprep(ROI& roi, ImageBuf& dst, cspan<const ImageBuf*> srcs = {},
+        KWArgs options = {}, ImageSpec* force_spec = nullptr);
+
+
 /// Common preparation for IBA functions: Given an ROI (which may or may not
 /// be the default ROI::All()), destination image (which may or may not yet
 /// be allocated), and optional input images, adjust roi if necessary and
@@ -506,6 +602,67 @@ inline TypeDesc type_merge (TypeDesc a, TypeDesc b, TypeDesc c)
     IBA_FIX_PERCHAN_LEN (av, len, 0.0f, av.size() ? av.back() : 0.0f);
 
 
+
+/// Simple image per-pixel unary operation: Given a source image `src`, return
+/// an image of the same dimensions (and same data type, unless `options`
+/// includes the "dst_float_pixels" hint turned on, which will result in a
+/// float pixel result image) where each pixel is the result of running the
+/// caller-supplied function `op` on the corresponding pixel values of `src`.
+/// The `op` function should take two `span<float>` arguments, the first
+/// referencing a destination pixel, and the second being a reference to the
+/// corresponding source pixel. The `op` function should return `true` if the
+/// operation was successful, or `false` if there was an error.
+///
+/// The `perpixel_op` function is thread-safe and will parallelize the
+/// operation across multiple threads if `nthreads` is not equal to 1
+/// (following the usual ImageBufAlgo `nthreads` rules), and also takes care
+/// of all the pixel loops and conversions to and from `float` values.
+///
+/// The `options` keyword/value list contains additional controls. It supports
+/// all hints described by `IBAPrep()` as well as the following:
+///
+///   - "nthreads" : int (default: 0)
+///
+///     Controls the number of threads (0 signalling to use all available
+///     threads in the pool.
+///
+/// An example (using the binary op version) of how to implement a simple
+/// pixel-by-pixel `add()` operation that is the equivalent of
+/// `ImageBufAlgo::add()`:
+///
+/// ```
+///    // Assume ImageBuf A, B are the inputs, ImageBuf R is the output
+///    R = ImageBufAlgo::perpixel_op(A, B,
+///            [](span<float> r, cspan<float> a, cspan<float> b) {
+///                for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
+///                    r[c] = a[c] + b[c];
+///                return true;
+///            });
+/// ```
+///
+/// Caveats:
+/// * The operation must be one that can be applied independently to each
+///   pixel.
+/// * If the input image is not `float`-valued pixels, there may be some
+///   inefficiency due to the need to convert the pixels to `float` and back,
+///   since there is no type templating and thus no opportunity to supply a
+///   version of the operation that allows specialization to any other pixel
+///   data types
+//
+OIIO_NODISCARD OIIO_API
+ImageBuf
+perpixel_op(const ImageBuf& src, bool(*op)(span<float>, cspan<float>),
+            KWArgs options = {});
+
+/// A version of perpixel_op that performs a binary operation, taking two
+/// source images and a 3-argument `op` function that receives a destination
+/// and two source pixels.
+OIIO_NODISCARD OIIO_API
+ImageBuf
+perpixel_op(const ImageBuf& srcA, const ImageBuf& srcB,
+            bool(*op)(span<float>, cspan<float>, cspan<float>),
+            KWArgs options = {});
+
 }  // end namespace ImageBufAlgo
 
 // clang-format on
diff --git a/src/libOpenImageIO/CMakeLists.txt b/src/libOpenImageIO/CMakeLists.txt
@@ -254,7 +254,9 @@ if (OIIO_BUILD_TESTS AND BUILD_TESTING)
     add_test (unit_imagecache ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/imagecache_test)
 
     fancy_add_executable (NAME imagebufalgo_test SRC imagebufalgo_test.cpp
-                          LINK_LIBRARIES OpenImageIO ${OpenCV_LIBRARIES}
+                          LINK_LIBRARIES OpenImageIO
+                                         ${OpenCV_LIBRARIES}
+                                         ${OPENIMAGEIO_IMATH_TARGETS}
                           FOLDER "Unit Tests" NO_INSTALL)
     add_test (unit_imagebufalgo ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/imagebufalgo_test)
 
diff --git a/src/libOpenImageIO/imagebuf.cpp b/src/libOpenImageIO/imagebuf.cpp
@@ -3163,6 +3163,7 @@ ImageBuf::IteratorBase::init_ib(WrapMode wrap, bool write)
     m_y            = 1 << 31;
     m_z            = 1 << 31;
     m_wrap         = (wrap == WrapDefault ? WrapBlack : wrap);
+    m_pixeltype    = spec.format.basetype;
 }
 
 
diff --git a/src/libOpenImageIO/imagebufalgo.cpp b/src/libOpenImageIO/imagebufalgo.cpp
diff --git a/src/libOpenImageIO/imagebufalgo_test.cpp b/src/libOpenImageIO/imagebufalgo_test.cpp
diff --git a/src/libutil/benchmark.cpp b/src/libutil/benchmark.cpp
