-
Notifications
You must be signed in to change notification settings - Fork 653
Add Highway SIMD acceleration to ImageBufAlgo [add, sub, mul, div, mad, resample] #4994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library: • add/sub • mul/div • mad • resample Adds CMake and build system support, new implementation helpers, and developer documentation. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
|
I suspect you used LLM for some of this? Which is fine, but I think you should document in the PR description (commit comment) which tool you used and for what parts. |
|
|
||
| # Google Highway SIMD acceleration for selected ImageBufAlgo ops. This is an | ||
| # optional optimization dependency: if enabled but not found, it will be | ||
| # compiled out. | ||
| # | ||
| # Back-compat: honor -DUSE_HWY=OFF by mapping it to OIIO_USE_HWY if the latter | ||
| # was not explicitly provided. | ||
| if (DEFINED USE_HWY AND NOT DEFINED OIIO_USE_HWY) | ||
| set (OIIO_USE_HWY ${USE_HWY} CACHE BOOL | ||
| "Enable Google Highway SIMD optimizations (if Highway is available)" FORCE) | ||
| else () | ||
| option (OIIO_USE_HWY "Enable Google Highway SIMD optimizations (if Highway is available)" ON) | ||
| endif () | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessarily complex. Maybe just replace this whole thing with
| # Google Highway SIMD acceleration for selected ImageBufAlgo ops. This is an | |
| # optional optimization dependency: if enabled but not found, it will be | |
| # compiled out. | |
| # | |
| # Back-compat: honor -DUSE_HWY=OFF by mapping it to OIIO_USE_HWY if the latter | |
| # was not explicitly provided. | |
| if (DEFINED USE_HWY AND NOT DEFINED OIIO_USE_HWY) | |
| set (OIIO_USE_HWY ${USE_HWY} CACHE BOOL | |
| "Enable Google Highway SIMD optimizations (if Highway is available)" FORCE) | |
| else () | |
| option (OIIO_USE_HWY "Enable Google Highway SIMD optimizations (if Highway is available)" ON) | |
| endif () | |
| option (OIIO_USE_HWY "Enable experimental Google Highway SIMD optimizations (if Highway is available)" OFF) |
Also note that I changed the default to OFF, and described it as experimental, which I think is better at this point, in case there is an unforeseen issue.
|
|
||
|
|
||
| # Google Highway for SIMD (optional optimization) | ||
| checked_find_package (hwy ENABLE ${OIIO_USE_HWY}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curiously, the checked_find_package accepts an ENABLE parameter but doesn't use it! (That's my fault, I guess.) I made another suggestion about the top level enabling variable, so I would recommend changing this to
| checked_find_package (hwy ENABLE ${OIIO_USE_HWY}) | |
| if (OIIO_USE_HWY) | |
| checked_find_package (hwy) | |
| endif () |
| #endif | ||
|
|
||
| #ifdef _MSC_VER | ||
| # include <malloc.h> // for alloca |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense to me. We already use alloca extensively without trouble, why did this need to be added now?
| #elif defined(_MSC_VER) | ||
| # define OIIO_ALLOCA(type, size) (assert(size < (1<<20)), (size) != 0 ? ((type*)_alloca((size) * sizeof(type))) : nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I think this definition is identical to the #else clause below, so the MS case would already be handled properly, right?
| #if defined(_WIN32) | ||
| # include <malloc.h> // for alloca | ||
| #endif | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? It always worked before
|
|
||
|
|
||
|
|
||
| #if defined(OIIO_USE_HWY) && OIIO_USE_HWY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the cmake side, you see to always define OIIO_USE_HWY. So these can just be
| #if defined(OIIO_USE_HWY) && OIIO_USE_HWY | |
| #if OIIO_USE_HWY |
Or else, if you want cmake to only add the definition at all if enabled, then #ifdef is what should be in the cpp.
| template<class Rtype, class Atype, class Btype> | ||
| static bool | ||
| add_impl_hwy(ImageBuf& R, const ImageBuf& A, const ImageBuf& B, ROI roi, | ||
| int nthreads) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done a line-by-line comparison, but it seems to me that the only difference between add_impl_hwy, sub_impl_hwy, and mul_impl_hwy is likely going to be
[](auto d, auto a, auto b) { return hn::Add(a, b); }
versus that one lambda changing for Sub and Mul.
I would love for even the initial commit to reduce this whole thing to a shared hwy_binary_perpixel_op() template that takes the lambda housing the op kernel as a templated parameter.
| // Process pixel by pixel (scalar fallback for strided channels) | ||
| for (int x = roi.xbegin; x < roi.xend; ++x) { | ||
| Rtype* r_ptr = ChannelPtr<Rtype>(Rv, x, y, roi.chbegin); | ||
| const Atype* a_ptr = ChannelPtr<Atype>(Av, x, y, | ||
| roi.chbegin); | ||
| const Btype* b_ptr = ChannelPtr<Btype>(Bv, x, y, | ||
| roi.chbegin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should benchmark the strided case and see how it compares to the contiguous case and the full scalar fallback that we've always had. If there is no big speed gain, I would be in favor of eliminating this whole clause and let non-contiguous strides use the old scalar path, then there is much less template expansion for hwy in the cases where there is not a large gain to be had. Note that this means that the "to hwy or not to hwy" test would need to test contiguity in addition to just localpixels().
Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library: • add/sub
• mul/div
• mad
• resample
Adds CMake and build system support, new implementation helpers, and developer documentation.
Checklist:
behavior.
testsuite.
PR, by pushing the changes to my fork and seeing that the automated CI
passed there. (Exceptions: If most tests pass and you can't figure out why
the remaining ones fail, it's ok to submit the PR and ask for help. Or if
any failures seem entirely unrelated to your change; sometimes things break
on the GitHub runners.)
fixed any problems reported by the clang-format CI test.
corresponding Python bindings. If altering ImageBufAlgo functions, I also
exposed the new functionality as oiiotool options.