Optimize for RISC-V Vector Extension #136

camel-cdr · 2025-10-07T21:54:03Z

This PR adds a custom odiff implementation using the RISC-V vector extension (RVV).

Since Zig doesn't expose SIMD intrinsics and the Zig Vector API is very limited (and can't express essential features of RVV), I implemented the code in C using RVV intrinsics.
Assembly would've also been possible, but since Zig ships its own C compiler, I thought C would be more maintainable.

Comparing two random images, I measured a 5.9x speedup for the diff computation on the BananaPI BPI-F3 (ignoring file IO and image decoding).

The tests all pass, even with a lower error rate than the current implementation:

Click to expand

$ zig build -Dcpu=generic_rv64+rva20u64+v -Ddynamic=true test-all # RISC-V with RVV
test-all
└─ test-integration
   └─ run test stderr
=== Color Delta Comparison Test ===
Format: Original(float) | SIMD(float) | Diff | Error%
-----------------------------------------------------------
slight RGB difference    :   0.505300 |   0.505300 | 0.000000 |  0.000%
black vs white           : 32857.133157 | 32857.132813 | 0.000345 |  0.000%
red vs green             : 16214.709176 | 16214.708984 | 0.000192 |  0.000%
blue vs yellow           : 35214.748955 | 35214.750000 | 0.001045 |  0.000%
semi-transparent slight difference:   0.127318 |   0.127318 | 0.000000 |  0.000%
transparent vs opaque black: 32857.133157 | 32857.132813 | 0.000345 |  0.000%
gray slight difference   :   2.021200 |   2.021200 | 0.000000 |  0.000%
pinkish slight difference:   1.784546 |   1.784546 | 0.000000 |  0.000%
minimal blue difference  :   0.160096 |   0.160096 | 0.000000 |  0.000%
dark red vs dark green   : 4085.533182 | 4085.533203 | 0.000021 |  0.000%
-----------------------------------------------------------
Max Error: 0.000%  |  Average Error: 0.000%

Specific test - Gray pixels:
Original: 0.505300010106008
SIMD: 0.5053000450134277
Difference: 0.000000034907419732554956
identical transparent: orig=0.000000, fixed=0.000000, diff=0.000000
identical opaque black: orig=0.000000, fixed=0.000000, diff=0.000000
identical white: orig=0.000000, fixed=0.000000, diff=0.000000
very small difference: orig=0.493481, fixed=0.493481, diff=0.000000
very small difference near white: orig=0.493481, fixed=0.493481, diff=0.000000

$ zig build -Dcpu=generic_rv64+rva20u64 -Ddynamic=true test-all # RISC-V without RVV
test-all
└─ test-integration
   └─ run test stderr
=== Color Delta Comparison Test ===
Format: Original(float) | SIMD(float) | Diff | Error%
-----------------------------------------------------------
slight RGB difference    :   0.505300 |   0.504395 | 0.000905 |  0.179%
black vs white           : 32857.133157 | 32813.811768 | 43.321390 |  0.132%
red vs green             : 16214.709176 | 16200.840332 | 13.868844 |  0.086%
blue vs yellow           : 35214.748955 | 35182.234131 | 32.514825 |  0.092%
semi-transparent slight difference:   0.127318 |   0.126953 | 0.000365 |  0.286%
transparent vs opaque black: 32857.133157 | 32813.811768 | 43.321390 |  0.132%
gray slight difference   :   2.021200 |   2.018311 | 0.002889 |  0.143%
pinkish slight difference:   1.784546 |   1.782715 | 0.001831 |  0.103%
minimal blue difference  :   0.160096 |   0.159912 | 0.000184 |  0.115%
dark red vs dark green   : 4085.533182 | 4082.038574 | 3.494608 |  0.086%
-----------------------------------------------------------
Max Error: 0.286%  |  Average Error: 0.135%

Specific test - Gray pixels:
Original: 0.505300010106008
SIMD: 0.50439453125
Difference: 0.0009054788560080018
identical transparent: orig=0.000000, fixed=0.000000, diff=0.000000
identical opaque black: orig=0.000000, fixed=0.000000, diff=0.000000
identical white: orig=0.000000, fixed=0.000000, diff=0.000000
very small difference: orig=0.493481, fixed=0.492676, diff=0.000805
very small difference near white: orig=0.493481, fixed=0.492676, diff=0.000805

The RVV backend is currently only enabled when the target ISA supports the "V" extension. I don't think it's worth adding runtime detection, because the expectation is that distros will move the baseline to RVA23, which supports RVV by default.

Some of the test logs needed to be modified to only log the fp64 result, such that current SIMD and my implementation can share the test code.

I wasn't able to simply add new riscv64 entries to build_targets, because currently zig fails do build libjpeg-turbo statically.
Not sure how to best add it to CI, but for development I used the following setup:

$ # on local system
$ apt install qemu-system-riscv64 qemu-user gcc-riscv64-linux-gnu-base podman
$ export QEMU_LD_PREFIX="/usr/riscv64-linux-gnu"
$ export QEMU_CPU="rva23u64"
$ podman run --platform=linux/riscv64 -it ubuntu:latest bash
$ # in container
$ apt update -y && apt upgrade -y && apt install -y libspng-dev libjpeg-dev libtiff-dev pkg-config wget tar xz-utils git
$ wget 'https://ziglang.org/download/0.15.1/zig-riscv64-linux-0.15.1.tar.xz'
$ tar xf zig-riscv64-linux-0.15.1.tar.xz && mv zig-riscv64-linux-0.15.1 zig
$ git clone --depth=1 https://github.com/camel-cdr/odiff/ && cd odiff
$ /zig/zig build -Dcpu=generic_rv64+rva20u64+v -Ddynamic=true test-all # test with RVV
$ /zig/zig build -Dcpu=generic_rv64+rva20u64 -Ddynamic=true test-all # test without RVV

Copilot

Pull Request Overview

This PR adds a custom odiff implementation using the RISC-V Vector Extension (RVV) for enhanced performance. The implementation provides a 5.9x speedup for diff computation on RISC-V systems with vector extensions while maintaining high accuracy.

Implements RVV-specific optimizations in C using vector intrinsics
Adds conditional compilation to use RVV when the target supports the "V" extension
Modifies test infrastructure to support both RVV and existing SIMD implementations

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
src/rvv.c	New C implementation using RISC-V vector intrinsics for pixel difference calculations
src/diff.zig	Adds RVV comparison function and conditional logic to use RVV when available
src/test_color_delta.zig	Updates test infrastructure to work with both RVV and existing SIMD implementations
build.zig	Adds RVV C source file to compilation and fixes test linking issues

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-10-07T22:28:13Z

src/rvv.c

+		if (writeDiff) {
+			__riscv_vse32(m, diff, __riscv_vmv_v_x_u32m4(diffcol, vl), vl);
+		}


The diff pointer is incremented in the loop but may be null when writeDiff is false. This could lead to undefined behavior when accessing diff += vl in the loop increment when diff is null.

Yes, it's technically UB to increment a pointer past the object it's pointing, but compilers don't really exploit that, only if you deference it. Usually in RVV code incrementing pointers looks cleaner than using indices, so I thought I'd increment diff as well, but considering this is now hopefully going into the library I'll change it to use indices.

Copilot · 2025-10-07T22:28:13Z

src/diff.zig

+pub noinline fn compareRVV(base: *const Image, comp: *const Image, diff_output: *?Image, diff_count: *u32, diff_lines: ?*DiffLines, ignore_regions: ?[]struct { u32, u32 }, max_delta: f64, options: DiffOptions) !void {
+    _ = ignore_regions;


The ignore_regions parameter is unused but accepted. Consider removing it from the function signature or implementing the functionality to avoid confusion.

Suggested change

pub noinline fn compareRVV(base: *const Image, comp: *const Image, diff_output: *?Image, diff_count: *u32, diff_lines: ?*DiffLines, ignore_regions: ?[]struct { u32, u32 }, max_delta: f64, options: DiffOptions) !void {

_ = ignore_regions;

pub noinline fn compareRVV(base: *const Image, comp: *const Image, diff_output: *?Image, diff_count: *u32, diff_lines: ?*DiffLines, max_delta: f64, options: DiffOptions) !void {

I left that unused argument there on purpose for future expansion.
Ideally you'd also handle the ignore_regions with SIMD code, but I wasn't sure how to best do it/how ignore_regions is commonly used.

src/diff.zig

build.zig

src/diff.zig

shreyassanthu77 · 2025-10-09T17:47:54Z

btw @dmtrKovalenko before merging this I should first make libjpeg build for riscv64 so probably a good idea to merge #137, fix static linking libjpeg for riscv and then merge this one with everything working

oh and @camel-cdr dw about CI I will add it once I fix the build :)

shreyassanthu77 · 2025-10-10T16:20:53Z

heyy @camel-cdr can you rebase your changes on top of #137 and see if you can build and run on riscv statically please

EDIT: you can do zig build test-all -Dtarget=riscv64-linux -fqemu to run the tests directly in qemu btw

camel-cdr · 2025-10-10T20:48:50Z

Alright, this seems to work.
I've also added riscv64 to the CI tests, which required adding an additional check for builtin.cpu.arch, and two new new CI exports, one for the basic riscv64 target, and one for riscv64 with the feature flag RVA23.

dmtrKovalenko · 2025-10-13T00:04:33Z

I tried to fix your PR but I can not rebase the branch

dmtrKovalenko · 2025-10-13T00:05:01Z

The code looks and works correctly, I've run it on my rp linux machine so it should be good to merge

dmtrKovalenko · 2025-10-13T00:08:32Z

src/rvv.c

+#include <stdint.h>
+#include <stddef.h>
+
+#if !__riscv_vector


can we do this more neatly? I would prefer this to be exposed by the

Suggested change

#if !__riscv_vector

#ifndef __riscv_vector

#error "Vector intrinsics require the vector extension."

#endif

dmtrKovalenko · 2025-10-13T00:09:12Z

src/rvv.c

+#if !__riscv_vector
+
+/* unused stubs */
+uint32_t odiffRVV(uint32_t *src1, uint32_t *src2, size_t n, float max_delta, uint32_t *diff, uint32_t diffcol) { return 0; }


I'm wondering about this being unused symbol. Should we define a separate build var for test only targets and include this only for tests?

I wasn't sure how to best do an conditional extern function in zig, so I decided to always make it available but only use it for rvv.

dmtrKovalenko · 2025-10-13T00:09:40Z

src/rvv.c

+#define YIQ_I_WEIGHT 0.299
+#define YIQ_Q_WEIGHT 0.1957
+
+#if 0


why do you need this?

This part doesn't do anything, but document how I arrived at the different computation, which is equivilant to the more complex one used by the zig code.

dmtrKovalenko · 2025-10-13T00:11:11Z

src/rvv.c

+#include <riscv_vector.h>
+
+static inline vfloat32m4_t
+rvv_yiq_diff(vuint8m1x4_t v4, vuint8m1x4_t y4, size_t vl)


does it make sense to use __riscv_v_min_vlen it is not a blocker for sure just wondering what vector size was on the CPU you were testing this patch?

I tested for 128, 256, 512 and 1024 in qemu and 256 in hardware.
The code should work for any vector length.
The standard "V" extension reauires a minimum VLEN of 128, I thought __riscv_vector tests for the stadard "V", but checking the spec again, it's also defined for some of the embeded subsets.
So I meant to use __riscv_v, which actually checks for "V", but __riscv_vector && __riscv_v_elen_fp >= 32 should also work and potentially cover some embedded implementations.

dmtrKovalenko · 2025-10-13T00:13:40Z

src/rvv.c

+	v = __riscv_vfmul(__riscv_vfmul(db, Y6, vl), db, vl);
+	v = __riscv_vfmacc(v, dg, __riscv_vfmacc(__riscv_vfmul(dg, Y4, vl), Y5, db, vl), vl);
+	v = __riscv_vfmacc(v, dr, __riscv_vfmacc(__riscv_vfmacc(__riscv_vfmul(dr, Y1, vl), Y2, dg, vl), Y3, db, vl), vl);


have you considered doing the int math instead? I am not sure but I have a feeling that fpus on riscv machine are usually much slower than standard arithmetic units

floating-point is usually fast on RISC-V hardware: https://camel-cdr.github.io/rvv-bench-results/bpi_f3/index.html

I don't think fixed-point is worth it these days, unless it allows you to reduce the element width.

I presume that implementations that can't pay the price for fast fp, will only be in embedded and use a RVV subset without fp support, so not standard "V", but e.g. zve32x

One thing I did try is using fp16, which almost worked, but failed the accuracy tests by a few percent.

dmtrKovalenko · 2025-10-14T00:29:36Z

closed in favor of the PR where I added a little bit of CI test runners and fixed a few minor things.

Thank you for the contribution

P.S. Your contribution commit is left untouched

camel-cdr changed the title ~~add RVV support~~ Optimize for RISC-V Vector Extension Oct 7, 2025

dmtrKovalenko requested a review from Copilot October 7, 2025 22:27

Copilot AI reviewed Oct 7, 2025

View reviewed changes

camel-cdr force-pushed the main branch from 9cd6b62 to e5259a2 Compare October 8, 2025 16:52

feat: merge all the image deps into one package

d03d99a

shreyassanthu77 reviewed Oct 9, 2025

View reviewed changes

build.zig Outdated Show resolved Hide resolved

shreyassanthu77 reviewed Oct 9, 2025

View reviewed changes

src/diff.zig Outdated Show resolved Hide resolved

camel-cdr force-pushed the main branch from e5259a2 to 80f4926 Compare October 9, 2025 18:08

feat: update imgz to fix riscv builds

e2a4f81

shreyassanthu77 added 3 commits October 10, 2025 22:17

feat: update imgz to fix x86 builds

22fb811

chore: make spell checker shut up

994fda1

update imgz

57e9d56

camel-cdr force-pushed the main branch 11 times, most recently from 3daafcc to 277cc90 Compare October 10, 2025 20:45

add RVV support

eb12ad7

camel-cdr force-pushed the main branch from 277cc90 to eb12ad7 Compare October 10, 2025 20:48

dmtrKovalenko reviewed Oct 13, 2025

View reviewed changes

dmtrKovalenko approved these changes Oct 13, 2025

View reviewed changes

dmtrKovalenko closed this Oct 14, 2025

		pub noinline fn compareRVV(base: const Image, comp: const Image, diff_output: ?Image, diff_count: u32, diff_lines: ?*DiffLines, ignore_regions: ?[]struct { u32, u32 }, max_delta: f64, options: DiffOptions) !void {
		_ = ignore_regions;

-#if !__riscv_vector
+#ifndef __riscv_vector
+#error "Vector intrinsics require the vector extension."
+#endif

Optimize for RISC-V Vector Extension #136

Optimize for RISC-V Vector Extension #136

Uh oh!

Conversation

camel-cdr commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shreyassanthu77 commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shreyassanthu77 commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

camel-cdr commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmtrKovalenko commented Oct 13, 2025

Uh oh!

dmtrKovalenko commented Oct 13, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmtrKovalenko commented Oct 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

camel-cdr commented Oct 7, 2025 •

edited

Loading

shreyassanthu77 commented Oct 9, 2025 •

edited

Loading

shreyassanthu77 commented Oct 10, 2025 •

edited

Loading

camel-cdr commented Oct 10, 2025 •

edited

Loading

dmtrKovalenko commented Oct 14, 2025 •

edited

Loading