RGB2OPP is SLOW! #27

mysteryx93 · 2021-10-17T18:35:35Z

I've been running benchmark tests on my script (on a 5K video clip):

BM3D                 parallel       167.75     136.73
Bicubic              parallel        41.23      33.61
RGB2OPP              parallel        31.54      25.70
VAggregate           parallel        31.12      25.36
Degrain3             parallel        30.58      24.93
KNLMeansCL           parreq          27.31      22.26

I wouldn't expect RGB2OPP to be way up there in the list! Above KNLMeansCL and above SMDegrain. Why is it so damn slow?

OPP gives a quality gain, but when the entire script runs at 0.32 fps instead of 0.44 fps just because of the conversion from YUV to RGB/OPP, I could spend that time on higher analysis settings instead.

mawen1250 · 2021-10-18T07:42:11Z

RGB2OPP and OPP2RGB were mainly developed so that the plugin can work on its own, and they are not very well optimized.
In mvsfunc, mvf.BM3D uses mvf.ToYUV(matrix='OPP') and mvf.ToRGB(matrix='OPP') for the conversion. They call fmtc.matrix to do the job.

EleonoreMizo · 2021-10-18T09:11:08Z

BTW, I have a question about the OPP colorspace:
https://github.com/HomeOfVapourSynthEvolution/VapourSynth-BM3D/blob/master/include/Specification.h#L176-L192

It seems it is almost identical to YCgCo (diff: signs and chroma components swapped, different weights for Y), opposing red vs. blue and green vs. magenta, whereas usual opponent colors are more like red vs. green and blue vs. yellow. Other OPP definitions found on the web are consistent with R-G, R+G-2*B. It looks like the G and B components have been swapped in BM3D. I don’t know if it is on purpose or unintentional.
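
The G/B swap described above can be checked numerically. The sketch below takes the BM3D rows from the Expr-based conversion posted later in this thread and sets them next to a conventional opponent basis built from R-G and R+G-2*B. The 1/2 and 1/4 scale factors on the conventional rows are an assumption made here so the two matrices are directly comparable; the names `BM3D_OPP`, `USUAL_OPP` and `swap_g_b` are illustrative only.

```python
from fractions import Fraction as F

# Rows act on (R, G, B). BM3D rows follow the Expr-based conversion
# posted later in this thread.
BM3D_OPP = [
    [F(1, 3), F(1, 3), F(1, 3)],     # O  = (R + G + B) / 3
    [F(1, 2), F(0), F(-1, 2)],       # P1 = (R - B) / 2
    [F(1, 4), F(-1, 2), F(1, 4)],    # P2 = (R - 2G + B) / 4
]

# Conventional opponent axes R-G and R+G-2B. The 1/2 and 1/4 scales are
# an assumption chosen here so that the rows can be compared directly.
USUAL_OPP = [
    [F(1, 3), F(1, 3), F(1, 3)],     # luminance-like axis
    [F(1, 2), F(-1, 2), F(0)],       # (R - G) / 2
    [F(1, 4), F(1, 4), F(-1, 2)],    # (R + G - 2B) / 4
]


def swap_g_b(matrix):
    """Return the matrix with its G and B columns exchanged."""
    return [[r, b, g] for r, g, b in matrix]


# True if BM3D's OPP is the conventional basis with G and B swapped.
gb_swapped_match = swap_g_b(USUAL_OPP) == BM3D_OPP
print(gb_swapped_match)
```

With these scale factors the two matrices differ only by the column swap, which is consistent with the observation above. It does not say whether the swap was intentional.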

theChaosCoder · 2021-10-18T12:24:19Z

OT: There is no binary for r9 ^_^ EDIT: Sorry, I should read more carefully!!!

mysteryx93 · 2021-10-19T18:45:50Z

Interestingly enough, using FMTC is even slightly slower than using RGB2OPP. Script doing simple YUV-RGB-OPP conversion back-and-forth with RGB2OPP on 5K clip gives 9fps, and with FMTC, 8fps.

mysteryx93 · 2021-10-22T01:48:15Z

Here. These functions are 7-12x faster than the other methods. Thanks to Godway.

import vapoursynth as vs
core = vs.core

def RGB_to_OPP(c: vs.VideoNode, fulls: bool = False) -> vs.VideoNode:
    if c.format.color_family != vs.RGB:
        raise TypeError("RGB_to_OPP: Clip is not in RGB format!")

    bd = c.format.bits_per_sample
    R = core.std.ShufflePlanes(c, [0], vs.GRAY)
    G = core.std.ShufflePlanes(c, [1], vs.GRAY)
    B = core.std.ShufflePlanes(c, [2], vs.GRAY)

    b32 = "" if bd == 32 else "range_half +"

    O  = core.akarin.Expr([R, G, B], ex_dlut("x y z + + 0.333333333 *",     bd, fulls))
    P1 = core.akarin.Expr([R,    B], ex_dlut("x y - 0.5 * "+b32,            bd, fulls))
    P2 = core.akarin.Expr([R, G, B], ex_dlut("x z + 0.25 * y 0.5 * - "+b32, bd, fulls))

    return core.std.ShufflePlanes([O, P1, P2], [0, 0, 0], vs.YUV)


def OPP_to_RGB(c: vs.VideoNode, fulls: bool = False) -> vs.VideoNode:
    if c.format.color_family != vs.YUV:
        raise TypeError("OPP_to_RGB: Clip is not in YUV format!")

    bd = c.format.bits_per_sample
    O = core.std.ShufflePlanes(c, [0], vs.GRAY)
    P1 = core.std.ShufflePlanes(c, [1], vs.GRAY)
    P2 = core.std.ShufflePlanes(c, [2], vs.GRAY)

    b32 = "" if bd == 32 else "range_half -"

    R = core.akarin.Expr([O, P1, P2], ex_dlut("x y "+b32+" + z "+b32+" 0.666666666 * +", bd, fulls))
    G = core.akarin.Expr([O,     P2], ex_dlut("x y "+b32+" 1.333333333 * -",             bd, fulls))
    B = core.akarin.Expr([O, P1, P2], ex_dlut("x z "+b32+" 0.666666666 * + y "+b32+" -", bd, fulls))

    return core.std.ShufflePlanes([R, G, B], [0, 0, 0], vs.RGB)

# HBD constants 3D look up table
#
# * YUV and RGB mid-grey is 127.5 (rounded to 128) for PC range levels,
#   this translates to a value of 125.5 in TV range levels. Chroma is always centered, so 128 regardless.
def ex_dlut(expr: str = "", bits: int = 8, fulls: bool = False) -> str:
    bitd = \
        0 if bits == 8 else \
        1 if bits == 10 else \
        2 if bits == 12 else \
        3 if bits == 14 else \
        4 if bits == 16 else \
        5 if bits == 24 else \
        6 if bits == 32 else -1
    if bitd < 0:
        raise ValueError(f"ex_dlut: Unsupported bit depth ({bits})")
    
    #                 8-bit UINT      10-bit UINT          12-bit UINT          14-bit UINT            16-bit UINT         24-bit UINT               32-bit Ufloat
    range_min   = [  (  0.,  0.),    (   0.,   0.   ),    (   0.,   0.   ),    (    0.,    0.   ),    (    0.,    0.),    (       0.,       0.),    (       0.,       0.)   ]   [bitd]
    ymin        = [  ( 16., 16.),    (  64.,  64.   ),    ( 256., 257.   ),    ( 1024., 1028.   ),    ( 4096., 4112.),    ( 1048576., 1052672.),    (  16/255.,  16/255.)   ]   [bitd]
    cmin        = [  ( 16., 16.),    (  64.,  64.   ),    ( 256., 257.   ),    ( 1024., 1028.   ),    ( 4096., 4112.),    ( 1048576., 1052672.),    (  16/255.,  16/255.)   ]   [bitd]
    ygrey       = [  (126.,126.),    ( 502., 504.   ),    (2008.,2016.   ),    ( 8032., 8063.   ),    (32128.,32254.),    ( 8224768., 8256896.),    ( 125.5/255.,125.5/255.)]   [bitd]
    range_half  = [  (128.,128.),    ( 512., 514.   ),    (2048.,2056.   ),    ( 8192., 8224.   ),    (32768.,32896.),    ( 8388608., 8421376.),    ( 128/255., 128/255.)   ]   [bitd]
    yrange      = [  (219.,219.),    ( 876., 879.   ),    (3504.,3517.688),    (14016.,14070.750),    (56064.,56283.),    (14352384.,14408448.),    ( 219/255., 219/255.)   ]   [bitd]
    crange      = [  (224.,224.),    ( 896., 899.500),    (3584.,3598.   ),    (14336.,14392.   ),    (57344.,57568.),    (14680064.,14737408.),    ( 224/255., 224/255.)   ]   [bitd]
    ymax        = [  (235.,235.),    ( 940., 943.672),    (3760.,3774.688),    (15040.,15098.750),    (60160.,60395.),    (15400960.,15461120.),    ( 235/255., 235/255.)   ]   [bitd]
    cmax        = [  (240.,240.),    ( 960., 963.750),    (3840.,3855.   ),    (15360.,15420.   ),    (61440.,61680.),    (15728640.,15790080.),    ( 240/255., 240/255.)   ]   [bitd]
    range_max   = [  (255.,255.),    (1020.,1023.984),    (4080.,4095.938),    (16320.,16383.750),    (65280.,65535.),    (16711680.,16776960.),    (       1.,       1.)   ]   [bitd]
    range_size  = [  (256.,256.),    (1024.,1024.   ),    (4096.,4096.   ),    (16384.,16384.   ),    (65536.,65536.),    (16777216.,16777216.),    (       1.,       1.)   ]   [bitd]

    fs  = 1 if fulls else 0
    expr = expr.replace("ymax ymin - range_max /", str(yrange[fs]/range_max[fs]))
    expr = expr.replace("cmax cmin - range_max /", str(crange[fs]/range_max[fs]))
    expr = expr.replace("cmax ymin - range_max /", str(crange[fs]/range_max[fs]))
    expr = expr.replace("range_max ymax ymin - /", str(range_max[fs]/yrange[fs]))
    expr = expr.replace("range_max cmax cmin - /", str(range_max[fs]/crange[fs]))
    expr = expr.replace("range_max cmax ymin - /", str(range_max[fs]/crange[fs]))
    expr = expr.replace("ymax ymin -",             str(yrange[fs]))
    expr = expr.replace("cmax ymin -",             str(crange[fs]))
    expr = expr.replace("cmax cmin -",             str(crange[fs]))

    expr = expr.replace("ygrey",                   str(ygrey[fs]))
    expr = expr.replace("ymax",                    str(ymax[fs]))
    expr = expr.replace("cmax",                    str(cmax[fs]))
    expr = expr.replace("ymin",                    str(ymin[fs]))
    expr = expr.replace("cmin",                    str(cmin[fs]))
    expr = expr.replace("range_min",               str(range_min[fs]))
    expr = expr.replace("range_half",              str(range_half[fs]))
    expr = expr.replace("range_max",               str(range_max[fs]))
    expr = expr.replace("range_size",              str(range_size[fs]))
    return expr
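
As a sanity check on the coefficients in the expressions above, here is a minimal plain-Python sketch (no VapourSynth needed) that applies the same forward and inverse formulas to a few float RGB triples and measures the round-trip error. It assumes float samples with P1 and P2 centered on zero, i.e. the 32-bit path where no range_half offset is added. The helper names `rgb_to_opp` and `opp_to_rgb` are illustrative and not part of any plugin.

```python
def rgb_to_opp(r, g, b):
    """Forward transform with the same coefficients as RGB_to_OPP."""
    o = (r + g + b) / 3
    p1 = (r - b) / 2
    p2 = (r + b) / 4 - g / 2
    return o, p1, p2


def opp_to_rgb(o, p1, p2):
    """Inverse transform with the same coefficients as OPP_to_RGB."""
    r = o + p1 + p2 * 2 / 3
    g = o - p2 * 4 / 3
    b = o - p1 + p2 * 2 / 3
    return r, g, b


samples = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (1.0, 0.0, 0.0),
    (0.25, 0.5, 0.75),
]

# Largest absolute difference between an input channel and its round trip.
max_error = max(
    abs(want - got)
    for rgb in samples
    for want, got in zip(rgb, opp_to_rgb(*rgb_to_opp(*rgb)))
)
print(max_error < 1e-12)
```

For RGB in [0, 1], P1 and P2 both span [-0.5, 0.5], which is why the integer paths add range_half on the way in and subtract it on the way out.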

mawen1250 · 2021-10-22T08:40:38Z

> Interestingly enough, using FMTC is even slightly slower than using RGB2OPP. Script doing simple YUV-RGB-OPP conversion back-and-forth with RGB2OPP on 5K clip gives 9fps, and with FMTC, 8fps.

That's weird. Did you do the conversion in FP32 precision? I suppose FMTC is more optimized for INT16.

mawen1250 · 2021-10-22T08:49:52Z

> ex_dlut

Nice work! I'd try it if I get the time.
BTW, if the source is YUV, computing a single matrix that transforms directly between YUV and OPP will further speed it up.
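
The direct matrix mentioned above is the product of the RGB-to-OPP matrix and the YUV-to-RGB matrix. A minimal sketch, assuming BT.709 coefficients, full-range float samples and chroma centered on zero (none of which are stated in this thread); the names `YUV_TO_RGB`, `RGB_TO_OPP`, `matmul` and `apply` are illustrative:

```python
# Assumptions: BT.709 luma coefficients, full-range float samples,
# chroma (Cb, Cr) centered on 0. Other matrices only change KR and KB.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB

# YCbCr -> RGB; each row acts on (Y, Cb, Cr).
YUV_TO_RGB = [
    [1.0, 0.0, 2.0 * (1.0 - KR)],
    [1.0, -2.0 * KB * (1.0 - KB) / KG, -2.0 * KR * (1.0 - KR) / KG],
    [1.0, 2.0 * (1.0 - KB), 0.0],
]

# RGB -> OPP, same coefficients as the Expr version in this thread.
RGB_TO_OPP = [
    [1 / 3, 1 / 3, 1 / 3],
    [1 / 2, 0.0, -1 / 2],
    [1 / 4, -1 / 2, 1 / 4],
]


def matmul(a, b):
    """3x3 matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def apply(m, v):
    """Apply a 3x3 matrix to a 3-component pixel."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


# One matrix that goes straight from YUV to OPP.
YUV_TO_OPP = matmul(RGB_TO_OPP, YUV_TO_RGB)

pixel = [0.5, 0.1, -0.2]  # arbitrary (Y, Cb, Cr) test value
direct = apply(YUV_TO_OPP, pixel)
two_step = apply(RGB_TO_OPP, apply(YUV_TO_RGB, pixel))
print(all(abs(d - t) < 1e-12 for d, t in zip(direct, two_step)))
```

The inverse direction is the inverse of this matrix, or equivalently the RGB-to-YUV matrix multiplied by the OPP-to-RGB matrix. Whether a particular matrix filter accepts such custom coefficients, and in what layout, should be checked against that filter's documentation.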

mysteryx93 · 2021-10-22T16:50:20Z

Yes... but I don't know anyone who knows the math to do it.

NSQY mentioned this issue Feb 18, 2022

bm3d.OPP2RGB/RGB2OPP is slower than fmtc/expr Jaded-Encoding-Thaumaturgy/lvsfunc#83

Closed
