Skip to content

Conversation

@serpent7776
Copy link
Contributor

Introduce diff algorithm in pure AVX512. A new build flag avx512_diff is introduced that controls whether the new algorithm should be used.

A new test file test_avx.zig is added, because no existing tests were hitting the AVX diff case (default options, without output image). These tests are simply copies of other tests with changed options.

Note that the new algorithm operates on floats, not ints.

Performance

Testing this on my local a few-years-old laptop. This machine seem to have 128 bits SSE units, but it claims AVX512 support.

The results here are not very optimistic. On this machine, the new algorithm as actually slower than normal zig version. The actual results will very likely vary greatly from CPU to CPU, depending on the AVX support.

Some of this difference might be due to operating on floats rather than ints.

$ hyperfine -N -i './zig-out/bin/odiff-avx /tmp/funocaml.png /tmp/funocaml2.png' './zig-out/bin/odiff-noavx /tmp/funocaml.png /tmp/funocaml2.png'                                 
Benchmark 1: ./zig-out/bin/odiff-avx /tmp/funocaml.png /tmp/funocaml2.png
  Time (mean ± σ):     310.3 ms ±  11.9 ms    [User: 255.9 ms, System: 49.8 ms]
  Range (min … max):   298.6 ms … 334.7 ms    10 runs
 
  Warning: Ignoring non-zero exit code.
 
Benchmark 2: ./zig-out/bin/odiff-noavx /tmp/funocaml.png /tmp/funocaml2.png
  Time (mean ± σ):     251.8 ms ±   9.2 ms    [User: 205.4 ms, System: 42.7 ms]
  Range (min … max):   240.5 ms … 270.5 ms    11 runs
 
  Warning: Ignoring non-zero exit code.
 
Summary
  ./zig-out/bin/odiff-noavx /tmp/funocaml.png /tmp/funocaml2.png ran
    1.23 ± 0.07 times faster than ./zig-out/bin/odiff-avx /tmp/funocaml.png /tmp/funocaml2.png

@serpent7776
Copy link
Contributor Author

Not sure if it's my commit that broke windows build, but it reproduces for me locally and is fixed by 8adfb60.

   +- compile exe odiff ReleaseFast x86_64-windows-gnu 1 errors
error: lld-link: undefined symbol: __emutls_get_address
    note: referenced by libturbojpeg.a(jsimd.c.obj):(.text)
    note: referenced by libturbojpeg.a(jsimd.c.obj):(.text)
    note: referenced by libturbojpeg.a(jsimd.c.obj):(jsimd_can_rgb_ycc)
    note: referenced 41 more times

@serpent7776
Copy link
Contributor Author

It didn't help on the CI, same error 🤷

@dmtrKovalenko
Copy link
Owner

Yes CI is clogged by unrelated reasons. I'll need to fix it first

@shreyassanthu77
Copy link
Collaborator

i am like 90% certain that this is just an issue with using x86_64-windows-gnu (mingw) for compilation while the vcpkg lib is compiled with msvc

rebasing this on to #126 should work without any issues (my build zig is very different than the current head tho so i expect some nasty conflicts (sorry 😅)

@serpent7776
Copy link
Contributor Author

I did a rebase and also added avx enabled CI run.

@shreyassanthu77
Copy link
Collaborator

niceee!! btw can we do some sort of runtime detection here? like the primary way odiff will be used is from npm so most people will not build from source.

adding a runtime flag that toggles this would probably be a better idea? @dmtrKovalenko what do you think??

@dmtrKovalenko
Copy link
Owner

Yeah I think it probably should be runtime enabled or once I test just become a default. Something like --enable-asm flag

@serpent7776
Copy link
Contributor Author

I changed the flag to be runtime flag.

@shreyassanthu77
Copy link
Collaborator

I approve of this pr looks based

Copy link
Owner

@dmtrKovalenko dmtrKovalenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, tested it out on the real avx512 machine today seems like everything is working just fine and gives the actual perf benefit.

Committed a little change to not log the unsupported OS warning because it is obsolete.

@dmtrKovalenko dmtrKovalenko merged commit 42348a5 into dmtrKovalenko:main Oct 12, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants