RGB2OPP is SLOW! #27

Open
mysteryx93 opened this issue Oct 17, 2021 · 8 comments

@mysteryx93

I've been running benchmark tests on my script (on a 5K video clip):

BM3D                 parallel       167.75     136.73
Bicubic              parallel        41.23      33.61
RGB2OPP              parallel        31.54      25.70
VAggregate           parallel        31.12      25.36
Degrain3             parallel        30.58      24.93
KNLMeansCL           parreq          27.31      22.26

I wouldn't expect RGB2OPP to be that high up the list! It's above KNLMeansCL and above SMDegrain. Why is it so damn slow?

OPP gives a quality gain, but when the whole script drops from 0.44 fps to 0.32 fps just because of the YUV to RGB/OPP conversion, I could spend that time on higher analysis settings instead.
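Put in per-frame terms, the two frame rates above imply the conversion costs a bit under a second per frame. This is plain arithmetic on the quoted numbers, assuming both were measured on the same clip with otherwise identical settings:

```python
# Per-frame cost implied by the two frame rates quoted above
fps_with_opp = 0.32     # whole script, with the RGB/OPP conversion
fps_without_opp = 0.44  # whole script, without it

sec_with = 1.0 / fps_with_opp        # about 3.125 s per frame
sec_without = 1.0 / fps_without_opp  # about 2.273 s per frame
overhead = sec_with - sec_without    # time attributable to the conversion, per frame

print(f"{sec_with:.3f} s vs {sec_without:.3f} s per frame")
print(f"conversion overhead: {overhead:.3f} s/frame "
      f"({sec_with / sec_without - 1:.1%} longer, "
      f"{1 - fps_with_opp / fps_without_opp:.1%} fewer fps)")
```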

@mawen1250
Member

RGB2OPP and OPP2RGB were developed mainly so that the plugin can work on its own, and they are not very well optimized.
In mvsfunc, for instance, mvf.BM3D uses mvf.ToYUV(matrix='OPP') and mvf.ToRGB(matrix='OPP') for the conversion instead. These call fmtc.matrix to do the job.

@EleonoreMizo

BTW, I have a question about the OPP colorspace:
https://github.com/HomeOfVapourSynthEvolution/VapourSynth-BM3D/blob/master/include/Specification.h#L176-L192

It seems almost identical to YCgCo (differences: the signs and order of the chroma components, and different weights for Y), opposing red vs. blue and green vs. magenta, whereas the usual opponent colors are closer to red vs. green and blue vs. yellow. Other OPP definitions found on the web are consistent with R-G and R+G-2*B. It looks like the G and B components have been swapped in BM3D. I don't know whether this is intentional or not.
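The relationship can be checked with exact fractions. This sketch assumes the coefficients used by the RGB_to_OPP function posted later in this thread match the linked Specification.h, and uses the standard YCgCo definition. It shows that P1 equals Co, P2 equals -Cg, and that exchanging the G and B columns turns the chroma rows into the R-G and R+G-2B axes, up to a scale factor:

```python
from fractions import Fraction as F

# RGB -> OPP rows (coefficients on R, G, B), as in RGB_to_OPP from this thread
OPP = [
    [F(1, 3), F(1, 3), F(1, 3)],   # O  = (R + G + B) / 3
    [F(1, 2), F(0), F(-1, 2)],     # P1 = (R - B) / 2
    [F(1, 4), F(-1, 2), F(1, 4)],  # P2 = (R + B) / 4 - G / 2
]

# Standard RGB -> YCgCo rows (coefficients on R, G, B)
YCGCO = [
    [F(1, 4), F(1, 2), F(1, 4)],    # Y
    [F(-1, 4), F(1, 2), F(-1, 4)],  # Cg
    [F(1, 2), F(0), F(-1, 2)],      # Co
]


def swap_g_b(row):
    """Return the same row with the G and B coefficients exchanged."""
    r, g, b = row
    return [r, b, g]


# Chroma: P1 is exactly Co, P2 is exactly -Cg; only the luma weights differ.
print(OPP[1] == YCGCO[2])                # True
print(OPP[2] == [-c for c in YCGCO[1]])  # True
print(OPP[0] == YCGCO[0])                # False

# With G and B exchanged, the chroma rows become (R - G) / 2 and (R + G - 2B) / 4.
print(swap_g_b(OPP[1]) == [F(1, 2) * c for c in (1, -1, 0)])  # True
print(swap_g_b(OPP[2]) == [F(1, 4) * c for c in (1, 1, -2)])  # True
```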

@theChaosCoder

theChaosCoder commented Oct 18, 2021

OT: There is no binary for r9 ^_^ EDIT: Sorry, I should have read more carefully!!!

@mysteryx93
Author

Interestingly enough, using FMTC is even slightly slower than using RGB2OPP. A script doing a simple YUV-RGB-OPP conversion back and forth on a 5K clip runs at 9 fps with RGB2OPP and at 8 fps with FMTC.

@mysteryx93
Author

Here are some functions that are 7-12x faster than the other methods. Thanks to Godway.

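import vapoursynth as vs

core = vs.core

# Requires the akarin plugin (core.akarin.Expr); ex_dlut is defined below.
def RGB_to_OPP(c: vs.VideoNode, fulls: bool = False) -> vs.VideoNode:
    if c.format.color_family != vs.RGB:
        raise TypeError("RGB_to_OPP: Clip is not in RGB format!")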

    bd = c.format.bits_per_sample
    R = core.std.ShufflePlanes(c, [0], vs.GRAY)
    G = core.std.ShufflePlanes(c, [1], vs.GRAY)
    B = core.std.ShufflePlanes(c, [2], vs.GRAY)

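    # Chroma is signed: integer formats store it around range_half, float around 0
    b32 = "" if bd == 32 else "range_half +"

    O  = core.akarin.Expr([R, G, B], ex_dlut("x y z + + 0.333333333 *",     bd, fulls))  # O  = (R + G + B) / 3
    P1 = core.akarin.Expr([R,    B], ex_dlut("x y - 0.5 * "+b32,            bd, fulls))  # P1 = (R - B) / 2
    P2 = core.akarin.Expr([R, G, B], ex_dlut("x z + 0.25 * y 0.5 * - "+b32, bd, fulls))  # P2 = (R + B) / 4 - G / 2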

    return core.std.ShufflePlanes([O, P1, P2], [0, 0, 0], vs.YUV)


def OPP_to_RGB(c: vs.VideoNode, fulls: bool = False) -> vs.VideoNode:
    if c.format.color_family != vs.YUV:
        raise TypeError("OPP_to_RGB: Clip is not in YUV format!")

    bd = c.format.bits_per_sample
    O = core.std.ShufflePlanes(c, [0], vs.GRAY)
    P1 = core.std.ShufflePlanes(c, [1], vs.GRAY)
    P2 = core.std.ShufflePlanes(c, [2], vs.GRAY)

    # Remove the chroma offset before inverting (not needed for float)
    b32 = "" if bd == 32 else "range_half -"

    R = core.akarin.Expr([O, P1, P2], ex_dlut("x y "+b32+" + z "+b32+" 0.666666666 * +", bd, fulls))  # R = O + P1 + 2/3 * P2
    G = core.akarin.Expr([O,     P2], ex_dlut("x y "+b32+" 1.333333333 * -",             bd, fulls))  # G = O - 4/3 * P2
    B = core.akarin.Expr([O, P1, P2], ex_dlut("x z "+b32+" 0.666666666 * + y "+b32+" -", bd, fulls))  # B = O - P1 + 2/3 * P2

    return core.std.ShufflePlanes([R, G, B], [0, 0, 0], vs.RGB)

# HBD constants 3D look up table
#
# * YUV and RGB mid-grey is 127.5 (rounded to 128) for PC range levels,
#   this translates to a value of 125.5 in TV range levels. Chroma is always centered, so 128 regardless.
def ex_dlut(expr: str = "", bits: int = 8, fulls: bool = False) -> str:
    bitd = \
        0 if bits == 8 else \
        1 if bits == 10 else \
        2 if bits == 12 else \
        3 if bits == 14 else \
        4 if bits == 16 else \
        5 if bits == 24 else \
        6 if bits == 32 else -1
    if bitd < 0:
        raise ValueError(f"ex_dlut: Unsupported bit depth ({bits})")
    
    #                 8-bit UINT      10-bit UINT          12-bit UINT          14-bit UINT            16-bit UINT         24-bit UINT               32-bit Ufloat
    range_min   = [  (  0.,  0.),    (   0.,   0.   ),    (   0.,   0.   ),    (    0.,    0.   ),    (    0.,    0.),    (       0.,       0.),    (       0.,       0.)   ]   [bitd]
    ymin        = [  ( 16., 16.),    (  64.,  64.   ),    ( 256., 257.   ),    ( 1024., 1028.   ),    ( 4096., 4112.),    ( 1048576., 1052672.),    (  16/255.,  16/255.)   ]   [bitd]
    cmin        = [  ( 16., 16.),    (  64.,  64.   ),    ( 256., 257.   ),    ( 1024., 1028.   ),    ( 4096., 4112.),    ( 1048576., 1052672.),    (  16/255.,  16/255.)   ]   [bitd]
    ygrey       = [  (126.,126.),    ( 502., 504.   ),    (2008.,2016.   ),    ( 8032., 8063.   ),    (32128.,32254.),    ( 8224768., 8256896.),    ( 125.5/255.,125.5/255.)]   [bitd]
    range_half  = [  (128.,128.),    ( 512., 514.   ),    (2048.,2056.   ),    ( 8192., 8224.   ),    (32768.,32896.),    ( 8388608., 8421376.),    ( 128/255., 128/255.)   ]   [bitd]
    yrange      = [  (219.,219.),    ( 876., 879.   ),    (3504.,3517.688),    (14016.,14070.750),    (56064.,56283.),    (14352384.,14408448.),    ( 219/255., 219/255.)   ]   [bitd]
    crange      = [  (224.,224.),    ( 896., 899.500),    (3584.,3598.   ),    (14336.,14392.   ),    (57344.,57568.),    (14680064.,14737408.),    ( 224/255., 224/255.)   ]   [bitd]
    ymax        = [  (235.,235.),    ( 940., 943.672),    (3760.,3774.688),    (15040.,15098.750),    (60160.,60395.),    (15400960.,15461120.),    ( 235/255., 235/255.)   ]   [bitd]
    cmax        = [  (240.,240.),    ( 960., 963.750),    (3840.,3855.   ),    (15360.,15420.   ),    (61440.,61680.),    (15728640.,15790080.),    ( 240/255., 240/255.)   ]   [bitd]
    range_max   = [  (255.,255.),    (1020.,1023.984),    (4080.,4095.938),    (16320.,16383.750),    (65280.,65535.),    (16711680.,16776960.),    (       1.,       1.)   ]   [bitd]
    range_size  = [  (256.,256.),    (1024.,1024.   ),    (4096.,4096.   ),    (16384.,16384.   ),    (65536.,65536.),    (16777216.,16777216.),    (       1.,       1.)   ]   [bitd]

    fs  = 1 if fulls else 0
    expr = expr.replace("ymax ymin - range_max /", str(yrange[fs]/range_max[fs]))
    expr = expr.replace("cmax cmin - range_max /", str(crange[fs]/range_max[fs]))
    expr = expr.replace("cmax ymin - range_max /", str(crange[fs]/range_max[fs]))
    expr = expr.replace("range_max ymax ymin - /", str(range_max[fs]/yrange[fs]))
    expr = expr.replace("range_max cmax cmin - /", str(range_max[fs]/crange[fs]))
    expr = expr.replace("range_max cmax ymin - /", str(range_max[fs]/crange[fs]))
    expr = expr.replace("ymax ymin -",             str(yrange[fs]))
    expr = expr.replace("cmax ymin -",             str(crange[fs]))
    expr = expr.replace("cmax cmin -",             str(crange[fs]))

    expr = expr.replace("ygrey",                   str(ygrey[fs]))
    expr = expr.replace("ymax",                    str(ymax[fs]))
    expr = expr.replace("cmax",                    str(cmax[fs]))
    expr = expr.replace("ymin",                    str(ymin[fs]))
    expr = expr.replace("cmin",                    str(cmin[fs]))
    expr = expr.replace("range_min",               str(range_min[fs]))
    expr = expr.replace("range_half",              str(range_half[fs]))
    expr = expr.replace("range_max",               str(range_max[fs]))
    expr = expr.replace("range_size",              str(range_size[fs]))
    return expr
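As a sanity check on the math (not on the plugins themselves), the expressions above can be written as 3×3 matrices. This pure-Python sketch, with hypothetical helper names, confirms that the OPP_to_RGB coefficients exactly invert the RGB_to_OPP ones, reproduces the integer range_half offsets that ex_dlut looks up for 8 to 16 bits, and bounds the rounding error of an 8-bit RGB to OPP to RGB round trip. It assumes Expr rounds to nearest and clamps integer output; the exact tie-breaking rule may differ.

```python
from fractions import Fraction as F

# Forward matrix implied by the RGB_to_OPP expressions (rows O, P1, P2; columns R, G, B)
FWD = [
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0), F(-1, 2)],
    [F(1, 4), F(-1, 2), F(1, 4)],
]
# Inverse implied by the OPP_to_RGB expressions (rows R, G, B; columns O, P1, P2).
# The scripts use the decimals 0.666666666 and 1.333333333; exact 2/3 and 4/3 here.
INV = [
    [F(1), F(1), F(2, 3)],
    [F(1), F(0), F(-4, 3)],
    [F(1), F(-1), F(2, 3)],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def range_half(bits, fulls):
    """Chroma offset; matches ex_dlut's range_half entries for 8-16 bit integers."""
    if fulls:
        return round(128 * (2 ** bits - 1) / 255)
    return 128 << (bits - 8)


def to_int(v, bits):
    """Round and clamp to the integer range (assumed to mimic integer Expr output)."""
    return min(max(round(v), 0), 2 ** bits - 1)


def roundtrip(rgb, bits=8, fulls=False):
    """Hypothetical helper: RGB -> OPP -> RGB with integer storage in between."""
    half = range_half(bits, fulls)
    o, p1, p2 = (sum(c * x for c, x in zip(row, rgb)) for row in FWD)
    stored = [to_int(o, bits), to_int(p1 + half, bits), to_int(p2 + half, bits)]
    opp = (stored[0], stored[1] - half, stored[2] - half)
    return [to_int(sum(c * x for c, x in zip(row, opp)), bits) for row in INV]


identity = [[F(int(i == j)) for j in range(3)] for i in range(3)]
print(matmul(INV, FWD) == identity)  # True

# Worst-case per-channel error of an 8-bit round trip over a grid of RGB values
grid = range(0, 256, 15)
worst = max(
    abs(a - b)
    for rgb in ([r, g, b] for r in grid for g in grid for b in grid)
    for a, b in zip(rgb, roundtrip(rgb))
)
print(worst)
```

Each stored plane adds at most 0.5 of rounding error, so R and B can drift by up to 0.5 + 0.5 + 2/3 × 0.5 ≈ 1.33 before the final rounding, which keeps the round-trip error within one code value.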

@mawen1250
Member

> Interestingly enough, using FMTC is even slightly slower than using RGB2OPP. Script doing simple YUV-RGB-OPP conversion back-and-forth with RGB2OPP on 5K clip gives 9fps, and with FMTC, 8fps.

That's weird. Did you do the conversion in FP32 precision? I suppose FMTC is better optimized for INT16.

@mawen1250
Member

mawen1250 commented Oct 22, 2021

> ex_dlut

Nice work! I'll try it when I get the time.
BTW, if the source is YUV, computing a single matrix that transforms YUV to OPP directly would speed it up further.
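A sketch of that idea: multiply the YUV to RGB matrix by the RGB to OPP matrix once, and each OPP plane becomes a single weighted sum of Y, Cb and Cr. This assumes BT.709 coefficients, 4:4:4 input (subsampled chroma has to be upsampled first) and normalized float values with chroma centered on 0. For limited-range or integer clips, the range scaling and offsets would also have to be folded into the coefficients. Pure Python, only to derive the numbers:

```python
# Assumed source matrix: BT.709 luma coefficients (adjust for BT.601/BT.2020)
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB

# Normalized float YCbCr -> RGB (columns Y, Cb, Cr), Cb/Cr centered on 0
YUV_TO_RGB = [
    [1.0, 0.0, 2 * (1 - KR)],
    [1.0, -2 * KB * (1 - KB) / KG, -2 * KR * (1 - KR) / KG],
    [1.0, 2 * (1 - KB), 0.0],
]

# RGB -> OPP, same coefficients as RGB_to_OPP in this thread
RGB_TO_OPP = [
    [1 / 3, 1 / 3, 1 / 3],
    [1 / 2, 0.0, -1 / 2],
    [1 / 4, -1 / 2, 1 / 4],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    return [sum(c * x for c, x in zip(row, v)) for row in m]


# One 3x3 matrix: each OPP plane is a weighted sum of Y, Cb and Cr
YUV_TO_OPP = matmul(RGB_TO_OPP, YUV_TO_RGB)

for name, row in zip(("O", "P1", "P2"), YUV_TO_OPP):
    print(name, " ".join(f"{c:+.6f}" for c in row))

# Same result as going through RGB in two steps
pixel = [0.5, 0.1, -0.2]
direct = apply(YUV_TO_OPP, pixel)
two_step = apply(RGB_TO_OPP, apply(YUV_TO_RGB, pixel))
print(max(abs(a - b) for a, b in zip(direct, two_step)) < 1e-12)  # True
```

Because both OPP chroma rows sum to zero, P1 and P2 end up with no Y term at all; only O depends on Y directly, with a weight of 1.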

@mysteryx93
Author

Yes... but I don't know anyone who knows the math to do it.
