vvc_deblock.c: fix RANDCLIP #281
stone-d-chen wants to merge 1 commit into ffvvc:deblock_asm_20250223
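Previously, RANDCLIP(x, diff) computed x - diff, clipped it to (0, max_pixel_val), and only then added rnd() % (2 * diff). Because the random offset sits outside the clip, the result is not a random value in the intended range: it can exceed max_pixel_val, and whenever x - diff < 0 it is clamped to 0 before the offset is added, so it can land more than diff away from x. Instead, compute (x - diff) + rnd() % (2 * diff) and clip afterwards. Before clipping, this gives x - diff <= value <= x + diff - 1, so abs(value - x) <= diff. This greatly improves the generation of strong deblocking data.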
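To make the difference concrete, here is a minimal standalone sketch of the two macro variants written as functions. It is not the checkasm code: `clip_int` and the local `FFMAX` are stand-ins for FFmpeg's `av_clip` and `FFMAX`, `rnd()` is a hypothetical deterministic LCG replacing checkasm's generator, `GET(x)` is replaced by passing the sample value directly, and the clip bounds of the new variant are assumed to be `0` and `(1 << bit_depth) - 1`, because the diff line for it is cut off after the backslash.

```c
#include <assert.h>

/* Local stand-ins for FFmpeg's FFMAX and av_clip (assumed same semantics). */
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

static int clip_int(int a, int amin, int amax)
{
    return a < amin ? amin : a > amax ? amax : a;
}

/* Hypothetical deterministic LCG standing in for checkasm's rnd(). */
static unsigned rng_state = 12345u;

static unsigned rnd(void)
{
    rng_state = rng_state * 1664525u + 1013904223u;
    return rng_state >> 8;
}

/* Old macro: the random offset is added after the clip, so the result can
 * leave [0, max] and drifts upward whenever x - diff < 0. */
static int randclip_old(int x, int diff, int bit_depth)
{
    return clip_int(x - diff, 0, (1 << bit_depth) - 1) +
           (int)(rnd() % FFMAX(2 * diff, 1));
}

/* New macro: draw from [x - diff, x + diff - 1] first, then clip to the
 * valid sample range (clip bounds assumed, see lead-in). */
static int randclip_new(int x, int diff, int bit_depth)
{
    return clip_int(x - diff + (int)(rnd() % FFMAX(2 * diff, 1)),
                    0, (1 << bit_depth) - 1);
}

/* Call a generator n times and record the smallest and largest result. */
static void sample_range(int (*gen)(int, int, int), int x, int diff,
                         int bit_depth, int n, int *lo, int *hi)
{
    *lo = *hi = gen(x, diff, bit_depth);
    for (int i = 1; i < n; i++) {
        int v = gen(x, diff, bit_depth);
        if (v < *lo) *lo = v;
        if (v > *hi) *hi = v;
    }
}

/* With the fix, every sample stays in range and within diff of x. */
static void check_new_bounds(int x, int diff, int bit_depth, int n)
{
    int lo, hi;
    sample_range(randclip_new, x, diff, bit_depth, n, &lo, &hi);
    assert(lo >= 0 && hi <= (1 << bit_depth) - 1);
    assert(x - lo <= diff && hi - x <= diff);
}
```

For example, with `x = 2`, `diff = 8` and 10-bit samples, the old variant can reach 15 (13 away from `x`), while the new one stays within 0..9; near the top of the range (`x = 1020`), the old variant overshoots 1023 and the new one does not.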
 } while (0)
-#define RANDCLIP(x, diff) av_clip(GET(x) - (diff), 0, \
-        (1 << (bit_depth)) - 1) + rnd() % FFMAX(2 * (diff), 1)
+#define RANDCLIP(x, diff) av_clip(GET(x) - (diff) + rnd() % FFMAX(2 * (diff), 1), \
Hi Stone,
Do we need to update HEVC as well?
If so, could you submit the HEVC patch for upstream review first?
Yep, sounds good!
@nuomi2021 I've been looking into further improvements to the luma generation, and it seems fairly non-trivial. One of the main issues is that occasionally … The current code does try to compensate for this, since d0 < (beta_2 >> 1), which is beta_3, guarantees (d0 << 1) < beta_2 (the reverse does not hold when beta_2 is odd). It becomes difficult to satisfy both constraints while also satisfying d0 + d1 < beta. I attempted to put it into a computer algebra system (wxMaxima), but it's quite messy.
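The shift relationship in this comment can be checked by brute force. The sketch below uses hypothetical helper names (not checkasm functions) and only the two conditions quoted in the comment, treating `beta_3` as `beta_2 >> 1`. It shows that `d0 < (beta_2 >> 1)` is sufficient for `(d0 << 1) < beta_2` but not equivalent to it: for odd `beta_2`, e.g. `beta_2 = 3`, `d0 = 1` passes the doubled test and fails the halved one.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition compared against beta_2: 2 * d0 < beta_2. */
static bool doubled_lt(int d0, int beta_2)
{
    return (d0 << 1) < beta_2;
}

/* Condition compared against beta_3, taken here as beta_2 >> 1. */
static bool halved_lt(int d0, int beta_2)
{
    return d0 < (beta_2 >> 1);
}

/* True if halved_lt implies doubled_lt for every non-negative pair checked. */
static bool halved_is_sufficient(int max_beta_2, int max_d0)
{
    for (int b = 0; b <= max_beta_2; b++)
        for (int d0 = 0; d0 <= max_d0; d0++)
            if (halved_lt(d0, b) && !doubled_lt(d0, b))
                return false;
    return true;
}

/* Count pairs accepted by doubled_lt but rejected by halved_lt, i.e. the
 * values the stricter beta_3 test gives up. */
static int pairs_lost(int max_beta_2, int max_d0)
{
    int lost = 0;
    for (int b = 0; b <= max_beta_2; b++)
        for (int d0 = 0; d0 <= max_d0; d0++)
            if (doubled_lt(d0, b) && !halved_lt(d0, b))
                lost++;
    return lost;
}
```

For `beta_2` in 0..63, exactly one `d0` per odd `beta_2` is lost (32 pairs in total), which is the margin the shifted test leaves unused; for even `beta_2` the two tests agree.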
If it's difficult, perhaps we should leave it as it is.
Which one do you prefer?
AVX2 sounds good. We need to modify the C side to expose multiple blocks, right? I'm trying to learn more about video decoding overall, so some more exposure to the C side would be good.
👍
Yes, we need to set up parameters for a single line within a CTU. SSE can process 16 bytes at a time, AVX2 can handle 32 bytes, and AVX-512 can manage 64 bytes per operation.
You can start from https://www.amazon.com/Coding-Video-Practical-Guide-Beyond/dp/1118711785 :)
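The register widths in the reply translate into samples per register as sketched below. This assumes 8-bit content is stored as `uint8_t` and higher bit depths as `uint16_t`; the helper names are hypothetical, and 128 is used as the largest VVC luma CTU width.

```c
#include <assert.h>

/* Bytes per stored sample: 1 for 8-bit, 2 for 10/12-bit (assumed storage). */
static int bytes_per_sample(int bit_depth)
{
    return bit_depth > 8 ? 2 : 1;
}

/* Samples that fit in one vector register of reg_bytes bytes
 * (16 for SSE, 32 for AVX2, 64 for AVX-512). */
static int samples_per_reg(int reg_bytes, int bit_depth)
{
    return reg_bytes / bytes_per_sample(bit_depth);
}

/* Registers needed to cover one row of ctu_width samples, rounding up. */
static int regs_per_ctu_row(int ctu_width, int reg_bytes, int bit_depth)
{
    int lanes = samples_per_reg(reg_bytes, bit_depth);
    return (ctu_width + lanes - 1) / lanes;
}
```

For example, one row of a 128-sample-wide CTU at 10-bit needs 16 SSE registers, 8 AVX2 registers or 4 AVX-512 registers, which is why wider instruction sets only pay off once the C side hands the assembly several blocks at once.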