Yuv color gamut conversion and transformYuv420 using Arm Neon. #124

cmacdonald-arm · 2024-04-29T16:06:04Z

These two patches add an Arm Neon version of the YUV color gamut conversion, as well as of the transformYuv420 function.

DichenZhang1 · 2024-04-29T21:03:24Z

lib/src/aarch64/gainmapmath_neon.cpp

+      w += 8;
+    } while (w < image->width / 2);
+    y0_ptr += image->luma_stride * 2;
+    y1_ptr += image->luma_stride * 2;


I think this may go out of bounds if the image height is odd. Could you double-check?

Hi Dichen,

This function reads the same data as the base implementation: if there is an odd number of rows, the final row is not processed, just as in the base implementation.

The standard implementation also reads two rows of Y data here:
https://github.com/google/libultrahdr/blob/main/lib/src/gainmapmath.cpp#L513

Also, if there is an odd number of rows, the final row is not processed because of integer truncation here:
https://github.com/google/libultrahdr/blob/main/lib/src/gainmapmath.cpp#L508
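To make the truncation argument concrete, here is a minimal sketch of a row-pair loop of the kind discussed above. `LumaRowsVisited` is a hypothetical helper written for illustration, not code from libultrahdr. Because the loop bound is `height / 2`, the second row index never passes the last valid row, and for an odd height the final row is simply skipped.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: lists the luma rows that a
// row-pair loop touches. Each iteration handles rows 2*i and 2*i + 1, which
// together correspond to one row of 4:2:0 chroma.
std::vector<size_t> LumaRowsVisited(size_t height) {
  std::vector<size_t> rows;
  // height / 2 truncates, so 2*i + 1 is always < height; for an odd height
  // the last luma row is never reached (and never read).
  for (size_t i = 0; i < height / 2; ++i) {
    rows.push_back(2 * i);      // row read through the first row pointer
    rows.push_back(2 * i + 1);  // row read through the second row pointer
  }
  return rows;
}
```

For example, `LumaRowsVisited(5)` yields rows 0 to 3, leaving row 4 untouched.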

This change conflicts with https://github.com/google/libultrahdr/pull/126/commits
@cmacdonald-arm, a few observations:

The folder structure is slightly different: I am using lib/src/dsp/. Would lib/src/dsp/arm/ be better? (It would be similar to libvpx.)

Enabled the AOSP build (Android.bp as well).

An option to disable all intrinsics via a macro at configure time.

Support for all resolutions; widths may not always be aligned to 16, for example 1080x1920.
Please let me know if you have any suggestions.

Hi Ram,

Personally, I have no issue with creating a dsp/arm directory. I named the directory aarch64 to signify that the code is 64-bit only. Also, if some redesign is taking place to ease the integration of Neon implementations, that could help greatly.

Regarding your final point: images whose widths are not multiples of 16 should be fine as long as the buffer containing the image is padded, correct? Currently the ALIGNM macro is used to allow a multiple of 16 scan lines to be passed to the JPEG encoder; is this likely to change?

ALIGNM is used in API-0: https://github.com/google/libultrahdr/blob/main/lib/src/jpegr.cpp#L225
and API-1: https://github.com/google/libultrahdr/blob/main/lib/src/jpegr.cpp#L225

Alignment to 16 is an acceptable assumption and is not likely to change, so the intrinsics should be fine. I have updated the folder structure to lib/src/dsp/arm/*.cpp.
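A small worked example of the alignment assumption, assuming ALIGNM rounds its first argument up to the next multiple of its second. `AlignUp` and `VectorIterations` are hypothetical names, not the library's macro. A 1080-pixel-wide row pads to 1088, so a loop consuming 16 pixels per step needs 68 iterations and no scalar tail; the last iteration touches 8 padding pixels, which is only safe if the buffer really is padded.

```cpp
#include <cstddef>

// Hypothetical stand-in for ALIGNM: round x up to the next multiple of m (m > 0).
constexpr size_t AlignUp(size_t x, size_t m) { return ((x + m - 1) / m) * m; }

// Iterations needed by a loop that consumes 16 pixels per step over a row
// padded to a multiple of 16; no scalar tail loop is required.
constexpr size_t VectorIterations(size_t width) { return AlignUp(width, 16) / 16; }

static_assert(AlignUp(1080, 16) == 1088, "1080 pads to 1088");
static_assert(AlignUp(1920, 16) == 1920, "1920 is already aligned");
static_assert(VectorIterations(1080) == 68, "68 steps of 16 cover 1088 pixels");
```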

Hi Ram,

Would the preferred approach here be to first merge your PR, with the updates to the build system and file structure/names, and then rebase these changes and rename the aarch64 Neon files?

As pull request #126 contains build changes for different platforms (Android and host), perhaps it can be merged first. This change can then be rebased on top of main.
We have introduced a macro, UHDR_ENABLE_INTRINSICS, to enable/disable intrinsics if required. I think it also needs to be incorporated into your change; currently aarch64 is the only gating option.
There is a gainmapmath_neon.h; can it be merged into gainmapmath.h, as with editor_helper.h? This avoids adding more header files; otherwise, more will appear when x86 intrinsics are introduced.
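A minimal sketch of how UHDR_ENABLE_INTRINSICS gating and a single shared header could fit together, assuming the build passes UHDR_ENABLE_INTRINSICS as a preprocessor definition when intrinsics are enabled (how the macro is actually wired up is not shown here). `AddOffset` and its helpers are hypothetical stand-ins for a real kernel. `__aarch64__` is the compiler's predefined AArch64 macro, and the Neon calls are standard `arm_neon.h` intrinsics. The scalar path is always compiled, so the same declarations can live in one header for every target.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(UHDR_ENABLE_INTRINSICS) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Scalar reference (hypothetical kernel): saturating add of `offset` to each byte.
inline void AddOffsetScalar(uint8_t* dst, const uint8_t* src, size_t n, uint8_t offset) {
  for (size_t i = 0; i < n; ++i) {
    unsigned sum = src[i] + offset;
    dst[i] = static_cast<uint8_t>(sum > 255 ? 255 : sum);
  }
}

#if defined(UHDR_ENABLE_INTRINSICS) && defined(__aarch64__)
// Neon path: compiled only when intrinsics are enabled and the target is AArch64.
inline void AddOffsetNeon(uint8_t* dst, const uint8_t* src, size_t n, uint8_t offset) {
  const uint8x16_t off = vdupq_n_u8(offset);  // broadcast once, reused every iteration
  size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    vst1q_u8(dst + i, vqaddq_u8(vld1q_u8(src + i), off));
  }
  AddOffsetScalar(dst + i, src + i, n - i, offset);  // remaining tail
}
#endif

// Single entry point declared in the shared header; the build config picks the path.
inline void AddOffset(uint8_t* dst, const uint8_t* src, size_t n, uint8_t offset) {
#if defined(UHDR_ENABLE_INTRINSICS) && defined(__aarch64__)
  AddOffsetNeon(dst, src, n, offset);
#else
  AddOffsetScalar(dst, src, n, offset);
#endif
}
```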

@DichenZhang1 & @ram-mohan, are there any more changes you would like made to this PR?

The change looks good to me. Thank you.

Add color gamut conversion function using Arm Neon and associated tests.

This implementation is only enabled on (and compatible with) AArch64 systems. An important difference between the base C implementation and the Neon version is that the Neon version is "generic": there is a single Neon function that takes the conversion coefficients as an input vector, because of how the function is called in a loop. This reduces the number of times the coefficients need to be loaded. As of this commit, the function is only used in the unit tests; it will be used in subsequent patches.

Change-Id: I0380c31db4ecbb40d7a19375865b2e18ced64b56
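A scalar sketch of the "generic" interface described in this commit message: one conversion function that takes the coefficient matrix as an argument and is called in a loop with the same coefficients. The names `Yuv8`, `ConvertYuvGamut` and `ConvertRow`, and the row-major 3x3 layout acting on (Y, U-128, V-128), are assumptions for illustration; this is floating-point C++, not the Neon code from the patch.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical 8-bit YUV sample; U and V carry the usual +128 offset.
struct Yuv8 {
  uint8_t y, u, v;
};

static uint8_t ClampToByte(float x) {
  return static_cast<uint8_t>(std::clamp(std::lround(x), 0L, 255L));
}

// Generic conversion: the row-major 3x3 matrix acting on (Y, U-128, V-128) is
// an argument instead of being baked into one function per gamut pair.
Yuv8 ConvertYuvGamut(Yuv8 in, const std::array<float, 9>& c) {
  const float y = in.y;
  const float u = in.u - 128.0f;
  const float v = in.v - 128.0f;
  return {ClampToByte(c[0] * y + c[1] * u + c[2] * v),
          ClampToByte(c[3] * y + c[4] * u + c[5] * v + 128.0f),
          ClampToByte(c[6] * y + c[7] * u + c[8] * v + 128.0f)};
}

// The caller loops over pixels and reuses the same coefficient array for each
// one, which is what lets a vector version keep it in registers.
void ConvertRow(Yuv8* px, size_t n, const std::array<float, 9>& coeffs) {
  for (size_t i = 0; i < n; ++i) px[i] = ConvertYuvGamut(px[i], coeffs);
}
```

Supporting another gamut pair then only needs another coefficient table, not another function.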

Add AArch64 Neon version of transformYuv420 and functional test.

This version uses the fixed-point Arm Neon version of the color gamut conversion function; as such, its results can differ by one from those of the base floating-point implementation.

Change-Id: I9446acdd85223d7072fd972f80b4945321ffc6a4
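To illustrate why fixed point can introduce an off-by-one difference, here is a sketch comparing a float computation of one output row with a Q14 fixed-point version of the same row. The coefficient values, the Q14 format and the function names are assumptions for illustration, not the patch's actual constants. Quantizing each coefficient to a multiple of 1/16384 perturbs the result by less than 0.01 here, which only matters when the exact value sits near a rounding boundary, so the two results differ by at most one.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Illustrative coefficients for the luma row, Y' = c0*Y + c1*(U-128) + c2*(V-128).
// These values are assumptions for the example, not the patch's constants.
constexpr std::array<float, 3> kLumaRow = {1.0f, 0.101579f, 0.196076f};
constexpr int kShift = 14;  // assumed Q14 fixed-point format

// Floating-point reference: multiply-add in float, round to nearest at the end.
int LumaFloat(int y, int u, int v) {
  const float r = kLumaRow[0] * y + kLumaRow[1] * (u - 128) + kLumaRow[2] * (v - 128);
  return std::clamp(static_cast<int>(std::lround(r)), 0, 255);
}

// Fixed-point version: coefficients pre-scaled by 2^14 and rounded once, so the
// per-pixel work is an integer multiply-add followed by a rounding shift.
int LumaFixed(int y, int u, int v) {
  static const std::array<int32_t, 3> k = {
      static_cast<int32_t>(std::lround(kLumaRow[0] * (1 << kShift))),
      static_cast<int32_t>(std::lround(kLumaRow[1] * (1 << kShift))),
      static_cast<int32_t>(std::lround(kLumaRow[2] * (1 << kShift)))};
  int32_t acc = k[0] * y + k[1] * (u - 128) + k[2] * (v - 128);
  // Add half an LSB, then shift: round half up (arithmetic shift, C++20).
  acc = (acc + (1 << (kShift - 1))) >> kShift;
  return std::clamp(static_cast<int>(acc), 0, 255);
}
```

Sweeping all U and V values for a range of Y values shows the difference never exceeds one; whether a particular input differs at all depends on the coefficients.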

jwright-arm · 2024-05-29T10:45:58Z

@ram-mohan, @DichenZhang1 Do we need to do anything else here before this can be merged?

ram-mohan · 2024-05-30T05:28:58Z

@jwright-arm The changes look good to me. I will check the perf numbers with and without the changes once. Nothing is required from your side. Thank you.

ram-mohan · 2024-05-30T20:48:52Z

@cmacdonald-arm @jwright-arm I have profiled from my side. The numbers look really good. Thank you for this change. A couple of observations: with Neon enabled, the change is not bit-exact. Is this expected? Further, I am seeing a 16x improvement for the convertYuvNeon method. This seems on the large side, because the Neon module processes 8 pixels per iteration, compared with 1 per iteration for the non-vector implementation. I am unable to explain the additional 2x gain.

cmacdonald-arm · 2024-05-31T07:29:58Z

@ram-mohan This change is not expected to be bit-exact; it can produce off-by-one errors. This is due to the use of fixed-point rather than floating-point coefficients. In terms of the uplift, 16 Y values and 8 U and V values are processed per loop iteration; also, because fixed point is used, conversions from integer to floating point and back are no longer needed.
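An integer-only sketch of a transformYuv420-style loop over a planar 4:2:0 image, illustrating the point about avoiding integer/float conversions: with Q14 coefficients, every step is an integer multiply-add and a rounding shift. It is scalar C++ processing one chroma sample per iteration, not the Neon code. `Yuv420Image`, `TransformYuv420Fixed`, the Q14 format and the assumption that the U and V rows have no Y term are all illustrative. The chroma contribution to luma is computed once and shared by the four luma samples of each 2x2 block.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical 8-bit planar 4:2:0 image with tightly packed planes.
struct Yuv420Image {
  size_t width, height;          // luma dimensions (assumed even)
  std::vector<uint8_t> y, u, v;  // u and v are (width/2) x (height/2)
};

// Q14 fixed-point 3x3 matrix acting on (Y, U-128, V-128). Assumes the Y
// coefficients of the U and V rows (k[3], k[6]) are zero, so new chroma
// depends only on old chroma and is computed once per 2x2 block.
using FixedCoeffs = std::array<int32_t, 9>;
constexpr int kShift = 14;

static uint8_t RoundShiftClamp(int32_t acc) {
  acc = (acc + (1 << (kShift - 1))) >> kShift;  // round half up
  return static_cast<uint8_t>(std::clamp<int32_t>(acc, 0, 255));
}

void TransformYuv420Fixed(Yuv420Image& img, const FixedCoeffs& k) {
  const size_t cw = img.width / 2, ch = img.height / 2;
  for (size_t cy = 0; cy < ch; ++cy) {
    uint8_t* y0 = &img.y[(2 * cy) * img.width];  // even luma row
    uint8_t* y1 = y0 + img.width;                // odd luma row
    for (size_t cx = 0; cx < cw; ++cx) {
      const int32_t u = img.u[cy * cw + cx] - 128;
      const int32_t v = img.v[cy * cw + cx] - 128;
      // Chroma contribution to luma, shared by the four luma samples.
      const int32_t uv_to_y = k[1] * u + k[2] * v;
      for (uint8_t* row : {y0, y1}) {
        for (size_t dx = 0; dx < 2; ++dx) {
          uint8_t& luma = row[2 * cx + dx];
          luma = RoundShiftClamp(k[0] * luma + uv_to_y);
        }
      }
      img.u[cy * cw + cx] = RoundShiftClamp(k[4] * u + k[5] * v + (128 << kShift));
      img.v[cy * cw + cx] = RoundShiftClamp(k[7] * u + k[8] * v + (128 << kShift));
    }
  }
}
```

A vector version would apply the same integer arithmetic across several chroma samples (and their luma) per iteration.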

ram-mohan · 2024-06-04T13:30:47Z

Thank you. @DichenZhang1, this can be merged as well.

DichenZhang1 · 2024-06-04T21:55:08Z

Thank you! @cmacdonald-arm @ram-mohan

DichenZhang1 reviewed Apr 29, 2024


cmacdonald-arm added 2 commits May 8, 2024 17:55

cmacdonald-arm force-pushed the color-gamut-neon branch from 079101d to d900dc0 on May 14, 2024 07:27

DichenZhang1 approved these changes Jun 4, 2024


DichenZhang1 merged commit c7e843c into google:main Jun 4, 2024
11 checks passed

lhpqaq mentioned this pull request on Dec 2, 2024, in "Yuv color gamut conversion and transformYuv420 using Risc-V Vector #334" (open).


Conversation

cmacdonald-arm commented Apr 29, 2024 (edited)
DichenZhang1 Apr 29, 2024
cmacdonald-arm Apr 30, 2024 (edited)
ram-mohan Apr 30, 2024
cmacdonald-arm May 1, 2024 (edited)
ram-mohan May 2, 2024
cmacdonald-arm May 3, 2024
ram-mohan May 5, 2024
cmacdonald-arm May 17, 2024
ram-mohan May 19, 2024