@anthony-linaro
Contributor

@anthony-linaro anthony-linaro commented Nov 7, 2024

This adds proper support for building for Windows ARM64 devices using MSVC.

It does actually work right now, but only by complete accident, and with no SIMD enabled - this enables everything properly.

As part of this, I updated the version of sse2neon used to one compatible with MSVC - the particular hash I used is the exact one that Blender uses, which is known working (there were some issues with the base 1.7.0 release).
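For context, sse2neon is a single-header library that re-implements the SSE `_mm_*` intrinsics on top of ARM NEON, which is what lets SSE code paths compile for ARM64. The sketch below shows only the general include pattern and is not OCIO's actual code: the macro `EXAMPLE_HAVE_SSE` and the function `add4` are hypothetical, and it assumes `sse2neon.h` is on the include path when targeting ARM64, falling back to scalar code if it is not found.

```cpp
// Sketch only: the macro and function names here are hypothetical, not OCIO's.
// x86: use the native SSE2 header. ARM64: use sse2neon, which re-implements
// the _mm_* intrinsics with NEON. Otherwise: fall back to scalar code.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>
  #define EXAMPLE_HAVE_SSE 1
#elif defined(__aarch64__) || defined(_M_ARM64)
  #if defined(__has_include)
    #if __has_include("sse2neon.h")
      #include "sse2neon.h"  // assumed to be on the include path
      #define EXAMPLE_HAVE_SSE 1
    #endif
  #endif
#endif

// Adds two arrays of four floats element-wise.
inline void add4(const float* a, const float* b, float* out)
{
#if defined(EXAMPLE_HAVE_SSE)
    // Same source on both architectures; sse2neon maps these calls to NEON.
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The point of the pattern is that the SIMD body is written once against the SSE API; only the header selection depends on the target.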

Commands used:

    mkdir build
    cd build
    cmake -G"Ninja" ..
    cmake --build .
    ctest

All tests pass. This is using VS2022; I wouldn't really use VS2019 for ARM64 platforms, if I'm honest.

I didn't test any of the GPU things, although GLUT may work, as I added ARM64 support to FreeGLUT a few years ago.

Addresses #1859 (so probably of interest to @num3ric also)

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 7, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@num3ric
Contributor

num3ric commented Nov 7, 2024

Love it, thanks for the work! Hopefully the CLA & failing tests can be resolved.

     FetchContent_Declare(sse2neon
         GIT_REPOSITORY https://github.com/DLTcollab/sse2neon.git
    -    GIT_TAG v1.6.0
    +    GIT_TAG 227cc413fb2d50b2a10073087be96b59d5364aea
Contributor

Any reason you couldn't pick an official version number here?

Contributor Author

Long story short, we uncovered some issues with 1.7.0 in Blender on various platforms and needed to update to an unversioned commit, as they only tag sse2neon releases once a year. The issues may or may not affect OCIO, but this is at least a known-working, tested version that Blender uses.

@anthony-linaro
Contributor Author

I'll have a dig into those tests and try to figure it out. I don't have those platforms to hand, so there will be some guesswork involved.

For the CLA, I'm trying to find out who our internal person responsible for EasyCLA is, so they can approve it under our corporate account. I'm sure I'll find out soon.

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
@anthony-linaro
Contributor Author

Have fixed a small build-file issue; I am hoping this may resolve the Mac issues (sse2neon accidentally got disabled in some configurations).

I have also fixed some additional issues I encountered while debugging (namely, it wasn't actually using sse2neon properly outside of the CMake test, oops).

I have been building with Python off, since there seems to be some sort of Access Violation Exception, but I'm not sure what causes it. For now, the rest of the test suite passes, so it should be fine.

@anthony-linaro
Contributor Author

CLA now done!

@num3ric
Contributor

num3ric commented Nov 15, 2024

Have you tried (cross-)compiling -A ARM64 from an x64-based PC? Unfortunately #1859 isn't fully addressed here, since compiling this PR still fails for me unless I add:

    if(MSVC)
        set(CMAKE_SYSTEM_PROCESSOR ${MSVC_CXX_ARCHITECTURE_ID})
    endif()

Without it, OCIO_ARCH_X86 is set to 1 and CpuInfo.cpp fails to compile, as described in the issue.

PR looks good otherwise though!
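Why the workaround is needed: per CMake's documentation, `CMAKE_SYSTEM_PROCESSOR` equals the host processor when CMake does not consider the build a cross-compile, which is the case when only `-A ARM64` is passed to the Visual Studio generator. The compiler's predefined macros, by contrast, describe the target being generated for. A minimal C++ sketch of classifying the target that way; `TargetArch`, `targetArch` and `isX86Target` are hypothetical names, not OCIO's.

```cpp
// Sketch only: hypothetical names. Classifies the *target* architecture from
// compiler-predefined macros, which follow the code being generated, so the
// result stays correct when cross-compiling x64 -> ARM64.
enum class TargetArch { X86_64, X86_32, ARM64, Other };

constexpr TargetArch targetArch()
{
#if defined(_M_ARM64) || defined(__aarch64__)
    return TargetArch::ARM64;
#elif defined(_M_X64) || defined(__x86_64__)
    return TargetArch::X86_64;
#elif defined(_M_IX86) || defined(__i386__)
    return TargetArch::X86_32;
#else
    return TargetArch::Other;
#endif
}

// True only when x86-specific code paths (e.g. CPUID queries) make sense.
constexpr bool isX86Target()
{
    return targetArch() == TargetArch::X86_64 || targetArch() == TargetArch::X86_32;
}
```

Because both functions are `constexpr`, the result can also be checked at compile time with `static_assert`.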

@anthony-linaro
Contributor Author

I had not, mainly on account of not having a suitable x64 machine to hand 😄

I'll get one set up for development and give it a go.

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
@anthony-linaro
Contributor Author

anthony-linaro commented Nov 20, 2024

Commit pushed; it should allow cross-compilation. Tested on a Windows x64 machine with the command line:

    cmake -A arm64 .. -DOCIO_BUILD_PYTHON=OFF

The binaries produced appear to run correctly on my ARM64 machine.

@anthony-linaro
Contributor Author

Thanks for the approval, @num3ric - is there someone else who needs to review this for it to go in?

@doug-walker
Collaborator

@anthony-linaro, thanks for the contribution. Regarding reviews, the project requires a minimum of two reviewers.

This PR makes significant changes to the build system which are not fully tested by our CI, so we definitely need more people to test. This includes validating that the various Mac builds were not affected.

Are you looking for this to go in OCIO 2.4.x or could it wait until 2.5 next fall?

@anthony-linaro
Contributor Author

anthony-linaro commented Nov 25, 2024

Thanks for the reply @doug-walker - ideally sooner rather than later, so 2.4.x - this would mean Windows ARM64 would be properly compatible with the VFX Reference Platform 2025, rather than having to wait another year (well, once my USD PR is merged, too).

VFX Reference Platform compatibility is something that has been requested by a few external partners, and it would probably be a disappointing answer to give them if I told them they need to wait another year or maintain an out-of-tree patch.

@num3ric
Contributor

num3ric commented Nov 26, 2024

+1 on prioritization, to get rid of patches on our next engine release.

Also confirming on my end that this PR successfully resolves #1859.

@lazka lazka mentioned this pull request Dec 8, 2024
Collaborator

@cozdas cozdas left a comment

Thanks for extending the Windows support for Arm64.

When I compile this branch with the Visual Studio generator (as opposed to the Ninja generator), I get compiler errors in CPUinfo.cpp for the _xgetbv and __cpuid x86 intrinsics, which are guarded only by the _MSC_VER macro and are thus active for ARM targets as well. Can you please make sure that this compiles with the native Windows generator as well?
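The errors described above come from guarding x86-only intrinsics on the compiler (`_MSC_VER`) rather than on the target architecture. A sketch of a guard that checks both, using a hypothetical `cpuHasSSE2()` helper that is not OCIO's CPUInfo API: it queries CPUID leaf 1, tests EDX bit 26 (SSE2), and simply reports `false` on non-x86 targets such as ARM64, where the x86 CPUID instruction is unavailable.

```cpp
// Sketch only: cpuHasSSE2() is a hypothetical helper, not OCIO's CPUInfo API.
// Guard on the target architecture, not just the compiler: _MSC_VER alone is
// also defined when MSVC targets ARM64, where these intrinsics do not exist.
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>
  #define EXAMPLE_MSVC_X86 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  #include <cpuid.h>
  #define EXAMPLE_GNU_X86 1
#endif

// Reports whether the CPU advertises SSE2 (CPUID leaf 1, EDX bit 26).
inline bool cpuHasSSE2()
{
#if defined(EXAMPLE_MSVC_X86)
    int info[4] = {0, 0, 0, 0};
    __cpuid(info, 1);                       // info[3] holds EDX
    return (info[3] & (1 << 26)) != 0;
#elif defined(EXAMPLE_GNU_X86)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                       // leaf 1 not supported
    return (edx & (1u << 26)) != 0;
#else
    return false;                           // non-x86 target (e.g. ARM64): no x86 CPUID
#endif
}
```

The same shape applies to `_xgetbv`: wrap it in the x86-only branch so the ARM64 build never sees the call.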

@doug-walker
Collaborator

@anthony-linaro, @num3ric, as a reminder, our 2.4.1 release is going out on Wednesday. I know you indicated you wanted this PR included, but unfortunately we don't feel that it is ready to merge.

@cozdas
Collaborator

cozdas commented Dec 10, 2024

BTW @anthony-linaro, I tested the branch with Windows 11 running in a Parallels VM on a MacBook Pro M2, and I'm not super experienced with Arm-based Windows development. Let me know if you think my test environment is not ideal or if I'm overlooking something.

@Mushroom

Mushroom commented Dec 10, 2024

<snip, sorry, wrong account>

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
@anthony-linaro
Contributor Author

anthony-linaro commented Dec 10, 2024

Doug - That's disappointing to hear. With a more expeditious review I could have worked through the issues, but understandably, circumstances can prevent this. Will there be another 2.4.x release?

Cozdas - I can't repro this; which version of VS are you using? I used these build instructions with the VS generator:

    cmake .. -DOCIO_BUILD_PYTHON=OFF
    cmake --build . --config Release

I also tried without the second line, opening the produced .sln file in VS2022, choosing "Release", and building like that, still with no issues.

@num3ric
Contributor

num3ric commented Dec 10, 2024

@cozdas I did have the same compilation errors in visual studio before 04a723a. Are you sure you're on the latest version of this PR?

@cozdas
Collaborator

cozdas commented Dec 10, 2024

Since it worked for you, I updated the toolchain, Visual Studio, and CMake to their latest versions and re-tried, and it compiled on my end too. Basic core tests work too.

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
@anthony-linaro
Contributor Author

Do we know who else needs to review this for it to get merged?

Collaborator

@doug-walker doug-walker left a comment

Thanks for the work on this. I think the best approach at this point is to merge it into the main branch, which will get more people testing it.

If no problems are encountered over the next few months, we will include it in OCIO 2.4.2.

@doug-walker doug-walker merged commit c09951e into AcademySoftwareFoundation:main Dec 19, 2024
24 checks passed
doug-walker added a commit to autodesk-forks/OpenColorIO that referenced this pull request Mar 15, 2025
* Add support for Windows ARM64

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix improper compiler flag check

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix sse2neon issues on Windows ARM64

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix cross-compilation on Windows for X64 -> ARM64

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix comment to match with corresponding if directive

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Check for MSVC before setting MSVC-style flag

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix comment to resolve ambiguity

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

---------

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit c09951e)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>
doug-walker added a commit that referenced this pull request Mar 19, 2025
* Add support for Windows ARM64 (#2089)

* Add support for Windows ARM64

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix improper compiler flag check

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix sse2neon issues on Windows ARM64

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix cross-compilation on Windows for X64 -> ARM64

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix comment to match with corresponding if directive

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Check for MSVC before setting MSVC-style flag

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

* Fix comment to resolve ambiguity

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>

---------

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit c09951e)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Fix issue with ocio_depts handling spaces in file paths (#2109)

Signed-off-by: Taegyun Ha <taegyun.ha@disguise.one>
(cherry picked from commit c5c85b0)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Issue #2116 : Fixes Metal backend's generated shaders with float/int constant Array Performance (#2117)

* Issue #2116 : Improves Metal backend performance. Moves the constant float/int declarations to constant space so they don't get initialized per thread. This improved color correction performance on M4 Max by 3-4 times.

Signed-off-by: Morteza <smostajabodaveh@apple.com>

* Tiny refactoring to improve code maintainability

Signed-off-by: Morteza <smostajabodaveh@apple.com>

---------

Signed-off-by: Morteza <smostajabodaveh@apple.com>
(cherry picked from commit d807b38)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Adsk Contrib - Issue #2111 Absolute paths not working through proxy (#2112)

* Ticket #2111
- Do not use config proxy for absolute paths while computing file hash or loading LUT data.
- Added the unit test provided in the ticket.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* - Changing the logic so that for abs paths we first try the configProxy and if that fails fall back to file system. For relative paths, we don't fall back to file system though, proxy is expected to handle those.
- Removed the unnecessary closeLutStream() function. We're using unique pointers, that means RAII is in place. The whole idea behind RAII is we don't need to worry about the cleanup or the type of the object wrapped by the RAII handler (unique_ptr in this case).
- Cleaned up some unnecessary conversions, type shuffling and copies around the code I touched.
- Cleaned up some unsafe type casts which are prone to dereferencing null pointers.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* - Ah! make_unique is a c++14 feature and we support C++11. I wonder why windows build is configured to use c++14+ while other platforms use C++11. Replacing make_unique with the new syntax to make the other platforms happy too.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* - Minor cleanup
- Added a test for an absolute path to a nonexistent file.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

---------

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit af69f39)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Change recommended Imath version to 3.1.12. This should fix Issue #1764. (#2120)

Signed-off-by: Mark Titchener <mark.titchener@foundry.com>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit 7237eaa)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Integrating matrix multiplication fix from OSL (#2121)

See AcademySoftwareFoundation/OpenShadingLanguage#1513 for more details.

Signed-off-by: Jerry Gamache <jerry.gamache@autodesk.com>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit fed973f)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Add missing setConfigIOProxy call to the Python API (#2128)

* Add missing setConfigIOProxy call to the Python API

Signed-off-by: Rémi Achard <remiachard@gmail.com>

* Restore a clean cache for other unit tests

Signed-off-by: Rémi Achard <remiachard@gmail.com>

---------

Signed-off-by: Rémi Achard <remiachard@gmail.com>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit 30db204)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* ACES 2.0 Output Transform performance optimisation (#2127)

* ACES 2.0 Output Transform performance optimisation (#2119)

* Extend ocioperf to take config file parameter on CLI

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Extend ocioconvert to take config on command line

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Extract tonescale_fwd function

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Extract inverse tonescale function

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Combine c and Z variables in J calculation exponent
replace 100.0 entries when referring to the scale of J
Extract calculation of nonlinear compression into functions

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Split RGB<->JMh function into two parts to expose opponent intermediate values

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Use function to compute matrix multiply for LMS calculations

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove unused member variable from JMhParams structure

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Combine chromatic adaptation weights into LMS matrix (and inverse) - CHANGES PIXEL OUTPUT

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Use matrix form for transforming cone responses to Aab

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Normalise the F_L parameter

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove ra and ba related variables to avoid them being out of sync with opponent calculation

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Make A<->J conversion function generic

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Deduplicate Y<->J conversions

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Factor JMh scaling parameters into Aab matrices

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* factor out references to PI, 360 and 180 constants
Avoid looking up cusp twice during inverse
Whilst searching for the cusp we have already constrained the search so we do not need to clamp

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Add functions to explain some of the calculations

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Further clarify when 100 means reference luminance
Migrate rescaling into tonescale s_2 parameter
Rename model_gamma to reflect it is actually the inverse

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* migrate init steps performed within other init functions to the top level to avoid repeat init of precomputed values.

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* extract some of the fixed values that only depend on the hue to reduce recomputation during inverse gamut mapping

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Avoid double lookup for reachMaxM value by resolving once the hue is known.

also reduces size of object on stack by not passing the whole table.

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Push wrapping of hues to the boundary,
mark up conversion points from external inputs etc

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Store gamma values as reciprocals
move more magic constants into const variables
factor some of the complex expressions into function (temporarily makes things slower)

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Add some missing includes to headers

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* minor cleanup to use std::array instead of plain array for test samples

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Inline reach boundary finding
restructure find_gamut_boundary_intersection to highlight common patterns.

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Extract gamut mapper compression function
rework get_focus_gain to directly compute the slope_gain
Share calculation of analytical threshold

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Rework gamut mapper to compress absolute M then only recalculate J

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Precalculate maximum search range for cusp lookup

next steps would be to factor hue into separate table to improve cache hits followed by redistribution to more uniform hues which should narrow search range

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Experiment with reusing slope calculations in gamut mapper
presmooth cusp values

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Add a collection of TODO's

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Restore function mapping table index to hue

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Minor tweaks to tonescale inverse clamp

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove duplicate table whilst calculating upper hull gamma

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Add some additional sample points for the upper hull gamma finder

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Slight tidy up of gamma fitting code

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Experiment with alternate smin implementation

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove unused function and tidy up comments

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Extract hue search into separate function

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Extract hues into separate table, merge gamma values into their place (gamma values now sampled on cusp hue intervals). Removes extra texture from GPU path.

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Simplify upper hull gamma hue lookup to avoid unneeded lerping as we are sampling the table entries directly

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Split out tonescale function, minor tweaks to Aab->JMh

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Build tables more uniformly, needs some clean up and lots of testing

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Speed up reach corner finding by switching to testing against the Achromatic rather than J limit

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Speed up hull gamma finding by computing values which depend only on the test points and not the gamma values themselves

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Adjust GPU hue lookup to take advantage of more uniform distribution

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Fix GLSL compatibility with hue lookup
Remove compiler warnings for unused parameters

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Attempt to simplify table generation code

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Explicitly allow GCC to perform additional optimisations - Needs some discussion

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Add extra entries to reach table to avoid needing to clamp to range during pixel processing

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* GPU move reach Max M sampling to avoid looking it up multiple times per pixel

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove smoothing from GPU path, it is baked into the cusp

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Fix bug with reach lookup

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Try only wrap hues on input to the shaders

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* rework GPU gamut compressor to follow the same algorithm as CPU. Not 100% the same; GPU still recalculates some values

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Rework solve_J_intersect to have fewer div instructions

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Adjust GPU code to better align with CPU code's structure, some additional precomputation is now applied during shader generation

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Precompute more scaling factors into matrices and nonlinear functions

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Experiment with unsigned integers for array access

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Bypass one J-> A conversion by saving the Aab computed earlier

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Test intrinsics for compression Norm calculation

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Attempt to calculate sin/cos only once per pixel.
Some minor micro optimisations.
Further alignment of GPU with CPU code,
Tests values need evaluating
Some GPU results are different - TBD

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove unused parameters

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Try tree vectoriser for gcc

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Add Vectorise option for MSVC

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove unused function

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* constexpr std::max is only available from C++14; for now, avoid the call to it

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Try to fix intrinsic-based errors on some build configurations

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Another C++ 14 usage fix

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Remove check for CLANG left over from testing

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

---------

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>

* Update ACES2 CPU non-SIMD path (#2122)

* - Commenting out the ACES2 SIMD implementation for now to focus on validity of the scalar math. For SIMD we need to implement run-time switching logic too.
- Slight improvements to the unit tests so that we print out the computed error metric as well as the actual and expected values. Helps to see the magnitude of the error.
- FixedFunctionOpCPU and BuiltinTransform tests now produce error lines with the same structure & syntax, including the computed error.
- Updated the expected values for ACES2 tests with the values the new optimized code produces; this makes all of the CPU tests pass now.
- For ACES2 ops and builtin transforms, the error threshold is increased to 1e-4
- Added a few temporary code snippets that dump the currently produced results, making it easier to update the golden values if needed again.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* - Fixing Linux build

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* Making Linux build happy is never easy.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

---------

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* Address GPU unit test failures (#2123)

* - Weights for cos(3h) and sin(h) in chroma_compress_norm() look wrong. Fixing the weights makes the GPU tests pass now (except for the inverse output transform, which seems to have a separate issue).
- If the new weights are correct, I'll need to update the CPU test target values too.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* - Updating the expected values in the CPU tests

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* - The remaining GPU test failures were caused by a simple typo where we were passing h instead of J to ocio_tonescale_inv() function. With the fix all the unit tests are happy now.

- Since we decided not to include any SIMD implementation in this version, I removed the conditional code paths and left the current SSE & AVX implementations commented out for future guidance.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

---------

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* Remove unused code for old gamut table calculations (#2124)

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* Minor code cleanup

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* Adding negative A trap on Aab_to_JMh_Shader() per code review

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* Adding copysign to tonescale to make it aligned with the CPU implementation.

It's possible that on GPU we may never receive negative J due to prior guarding, but for now aligning with the CPU to be on the safer side.

Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>

* Add built-in transform round-trip test

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Loosen tolerance for other machines

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Add GPU round-trip tests

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Loosen tolerances for other GPUs

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

---------

Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
Signed-off-by: Doug Walker <doug.walker@autodesk.com>
Co-authored-by: Kevin Wheatley <kevin.wheatley@framestore.com>
Co-authored-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit 1931542)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Increment library version to 2.4.2

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Propose NaN fix for the ACES2 inverse output transforms (#2132)

* Propose Aab_to_RGB NaN fix

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Fix for test on ARM

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Fix for tests on Linux/Windows

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Fix for GPU test on Linux

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* NaN fix for gamma and double log fixed functions

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

* Remove commented-out code

Signed-off-by: Doug Walker <doug.walker@autodesk.com>

---------

Signed-off-by: Doug Walker <doug.walker@autodesk.com>
(cherry picked from commit 0546612)
Signed-off-by: Doug Walker <doug.walker@autodesk.com>

---------

Signed-off-by: Anthony Roberts <anthony.roberts@linaro.org>
Signed-off-by: Doug Walker <doug.walker@autodesk.com>
Signed-off-by: Taegyun Ha <taegyun.ha@disguise.one>
Signed-off-by: Morteza <smostajabodaveh@apple.com>
Signed-off-by: cuneyt.ozdas <cuneyt.ozdas@autodesk.com>
Signed-off-by: Mark Titchener <mark.titchener@foundry.com>
Signed-off-by: Jerry Gamache <jerry.gamache@autodesk.com>
Signed-off-by: Rémi Achard <remiachard@gmail.com>
Signed-off-by: Kevin Wheatley <kevin.wheatley@framestore.com>
Co-authored-by: Anthony Roberts <anthony.roberts@linaro.org>
Co-authored-by: Taegyun Ha <110908525+DevTGHa@users.noreply.github.com>
Co-authored-by: Morteza Mostajab <92918486+Morteeza@users.noreply.github.com>
Co-authored-by: Cuneyt Ozdas <cuneyt.ozdas@autodesk.com>
Co-authored-by: Mark Titchener <mark.titchener@foundry.com>
Co-authored-by: JGamache-autodesk <56274617+JGamache-autodesk@users.noreply.github.com>
Co-authored-by: Rémi Achard <remiachard@gmail.com>
Co-authored-by: Kevin Wheatley <kevin.wheatley@framestore.com>
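Two of the commit messages in this log mention C++14-only features (`std::make_unique`, and `std::max` only being `constexpr` from C++14) in a code base that still supports C++11. The commits don't show the exact replacements used; below is one common C++11-compatible pattern, sketched with hypothetical helper names `makeUnique` and `constMax`.

```cpp
#include <memory>
#include <utility>

// Hypothetical C++11 stand-in for std::make_unique (C++14).
// Covers single objects only; std::make_unique also has array overloads.
template <typename T, typename... Args>
std::unique_ptr<T> makeUnique(Args&&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// std::max is only constexpr from C++14. A C++11 constexpr function body
// must be a single return statement, hence the conditional operator.
// Returns by value to avoid handing out references to temporaries.
template <typename T>
constexpr T constMax(T a, T b)
{
    return (a < b) ? b : a;
}
```

Returning by value keeps `constMax` safe for small arithmetic types; for large types a reference-returning version like `std::max` may be preferable.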
7 participants