Refactor AVX2 blitters using macros #2055

Closed

Conversation

Starbuck5
Member

This moves repeated code out of the blitter functions themselves and into the macros developed for running the new AVX2 alpha blitters. It also adjusts a couple of set_ intrinsics (like _mm256_set_epi16) to use their set1_ variants instead, so the value only has to be provided once.

Makes simd_blitters_avx2 800 lines shorter.

I believe this has a small performance regression, which could be solved by reworking the RUN_AVX2_BLITTER implementation strategy. Maybe we need to put the Duff's device loops back in there?
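
To illustrate the set1_ change, here's a minimal sketch (the helper names are made up for illustration, not the actual blitter code):

#include <immintrin.h>

/* Before: the same constant has to be repeated for all 16 lanes. */
static __m256i alpha_max_repeated(void)
{
    return _mm256_set_epi16(255, 255, 255, 255, 255, 255, 255, 255,
                            255, 255, 255, 255, 255, 255, 255, 255);
}

/* After: the set1_ variant broadcasts the constant written once. */
static __m256i alpha_max_broadcast(void)
{
    return _mm256_set1_epi16(255);
}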

@Starbuck5 Starbuck5 requested a review from a team as a code owner March 26, 2023 05:57
@Starbuck5 Starbuck5 marked this pull request as draft March 26, 2023 05:57
@Starbuck5 Starbuck5 force-pushed the avx2-blitter-refactors branch from cb79e09 to 33a213f Compare April 18, 2023 05:56
@yunline yunline added Code quality/robustness Code quality and resilience to changes SIMD labels Jun 4, 2023
@Starbuck5 Starbuck5 force-pushed the avx2-blitter-refactors branch from 33a213f to ae116c2 Compare December 10, 2023 21:05
@itzpr3d4t0r
Member

itzpr3d4t0r commented Jan 2, 2024

I don't know if you want to move this forward, but I have done a lot of testing and have some findings to share.
Basically, LOOP_UNROLLED isn't improving the situation much; what's making it perform slightly worse is the maskload/maskstore instructions, compared to going pixel by pixel with something like this:

LOOP_UNROLLED4(
      {
          /* load one 32-bit pixel into the low lane of an SSE register */
          mm_src = _mm_cvtsi32_si128(*srcp);
          mm_dst = _mm_cvtsi32_si128(*dstp);

          {BLIT_CODE}

          /* write the blended pixel back out */
          *dstp = _mm_cvtsi128_si32(mm_dst);

          srcp += srcpxskip;
          dstp += dstpxskip;
      },
      n, pre_8_width);

If you take a look at the latency/CPI of these instructions and sum them up, you can see that this is faster when we're talking about only 1-3 (or 4) excess pixels, but slower for 4-7 pixels, hence maskload is better in that case.
I've managed to come up with another strategy that does two pixels at a time with _mm_loadl_epi64, so we shave off some cycles, and handles a final pixel, if present, with _mm_cvtsi32_si128; this was better all the time. The downside is that we'd have to pass in both an AVX and an SSE blit code every time.
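
Roughly, the tail handling would look like this (a sketch in the same macro-parameter style as the snippet above, assuming contiguous pixels; pxl_excess and BLIT_CODE_SSE are placeholder names, not the actual macro API):

while (pxl_excess >= 2) {
    /* load two 32-bit pixels (64 bits) in one go */
    mm_src = _mm_loadl_epi64((const __m128i *)srcp);
    mm_dst = _mm_loadl_epi64((const __m128i *)dstp);

    {BLIT_CODE_SSE}

    /* store the two blended pixels back */
    _mm_storel_epi64((__m128i *)dstp, mm_dst);

    srcp += 2;
    dstp += 2;
    pxl_excess -= 2;
}
if (pxl_excess) {
    /* one trailing pixel: go through the low 32-bit lane */
    mm_src = _mm_cvtsi32_si128(*srcp);
    mm_dst = _mm_cvtsi32_si128(*dstp);

    {BLIT_CODE_SSE}

    *dstp = _mm_cvtsi128_si32(mm_dst);
}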

@itzpr3d4t0r
Member

Anyway, this implementation could still use some simplifications. For example, we don't actually need two shuffle masks for the 16-bit shuffle macro, since we can directly use _mm256_unpacklo_epi8 / _mm256_unpackhi_epi8 with a zeroed register and get the same result, which is what I did in #2642.
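
Something like this (a sketch, the function name is made up): interleaving the source bytes with zero zero-extends each 8-bit channel into a 16-bit lane, with no shuffle mask needed:

#include <immintrin.h>

/* Widen packed 8-bit channels to 16-bit lanes without a shuffle mask. */
static inline void widen_channels_16(__m256i src, __m256i *lo16, __m256i *hi16)
{
    const __m256i zero = _mm256_setzero_si256();
    *lo16 = _mm256_unpacklo_epi8(src, zero); /* low 8 bytes of each 128-bit half */
    *hi16 = _mm256_unpackhi_epi8(src, zero); /* high 8 bytes of each 128-bit half */
}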

@itzpr3d4t0r
Member

But I'd really encourage you to revive this PR if possible or maybe consider working on this together? Your choice.

@Starbuck5
Member Author

Hey itz, I would like to resolve this as well.

All my efforts earlier this year to recover the performance failed, but I think a sub-5% performance regression is acceptable for the blend modes. I wouldn't want to do that to the PREMUL blend, but for the others I think it would be acceptable; they're all so fast anyway. A cleaner and more understandable codebase has its own benefits.

I'd like to avoid interleaving SSE2 and AVX code, again for the simplicity of the codebase.

@Starbuck5
Member Author

I've let this PR linger too long; I think it should be closed. It can be revisited in the future if necessary.

@Starbuck5 Starbuck5 closed this Nov 2, 2024
@Starbuck5 Starbuck5 deleted the avx2-blitter-refactors branch November 2, 2024 09:28