@zackerthescar
Contributor

This is a draft PR for the AVX2-accelerated deblocking filter that I wrote. I'm starting with ff_vvc_v_loop_filter_8_avx2. Unfortunately, it does not pass checkasm yet. I'm hoping to get this done sooner rather than later. Any tips?

@nuomi2021 nuomi2021 requested a review from QSXW July 13, 2023 03:33
@frankplow
Collaborator

frankplow commented Jul 15, 2023

@QSXW
Collaborator

QSXW commented Jul 15, 2023

Hi. Can you add some performance data like https://github.com/ffvvc/FFmpeg/pull/69?

@zackerthescar
Contributor Author

zackerthescar commented Jul 18, 2023

Hey all, thanks for looking at this PR.

> I think this may be accidental: https://github.com/ffvvc/FFmpeg/pull/110/files#diff-f0b18460e5f0f5478f7175369d062f49f897a1c58a57f39bc48ae502e21d25deL40-R40

Turns out it was! I accidentally pushed an experiment.

> Hi. Can you add some performance data like #69?

[FFmpeg] tests/checkasm/checkasm --test=vvc_deblock --benchmark
benchmarking with native FFmpeg timers
nop: 46.6
checkasm: using random seed 4040587407
AVX2:
 - vvc_deblock.chroma [OK]
checkasm: all 2 tests passed
vvc_v_loop_filter_chroma8_c: 51.3
vvc_v_loop_filter_chroma8_avx2: 33.3
vvc_v_loop_filter_chroma10_c: 42.3
vvc_v_loop_filter_chroma10_avx2: 24.3
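
To make the comparison easier to read, the timings above can be turned into speedup ratios. This is only a sketch: the function names are made up for illustration, and it simply divides the raw figures from the log without any correction for the nop overhead.

```c
#include <stdio.h>

/* Ratio of the C timing to the SIMD timing.
 * Values above 1.0 mean the SIMD version is faster. */
double speedup(double c_time, double simd_time)
{
    return c_time / simd_time;
}

/* Prints the ratios for the four figures reported in the log above. */
void print_speedups(void)
{
    printf("chroma8:  %.2fx\n", speedup(51.3, 33.3));
    printf("chroma10: %.2fx\n", speedup(42.3, 24.3));
}
```

With the numbers from this run, that works out to roughly 1.54x for the 8-bit chroma filter and 1.74x for the 10-bit one.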

These numbers are from the current push. Unfortunately, decoding any test file still causes either a segfault or some other failure.

elenril added 25 commits July 20, 2023 20:30
When -pix_fmt designates a BE/LE pixel format, it gets translated into
the native one by av_get_pix_fmt(). This may not always be the best
choice, as the encoder might only support one endianness. In such a
case, explicitly choose the endianness supported by the encoder.

While this is currently redundant with choose_pixel_fmt() in
ffmpeg_filter.c, the latter code will be deprecated in following commits.
This code works on encoder information and has no interaction with
filtering, so it does not belong in ffmpeg_filter.
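
The endianness selection described above (the -pix_fmt BE/LE case) might look roughly like the following. This is a toy model, not the ffmpeg CLI code: the enum, the function names, and the fallback order are all assumptions made for illustration.

```c
/* Hypothetical stand-ins for pixel formats; these are NOT AVPixelFormat values. */
enum toy_fmt { TOY_NONE = -1, TOY_GRAY16BE, TOY_GRAY16LE, TOY_YUV420P };

/* Returns the opposite-endian twin of fmt, or TOY_NONE if it has none. */
enum toy_fmt toy_swap_endianness(enum toy_fmt fmt)
{
    switch (fmt) {
    case TOY_GRAY16BE: return TOY_GRAY16LE;
    case TOY_GRAY16LE: return TOY_GRAY16BE;
    default:           return TOY_NONE;
    }
}

/* Returns 1 if fmt appears in the TOY_NONE-terminated list, else 0. */
int toy_supported(const enum toy_fmt *list, enum toy_fmt fmt)
{
    for (; *list != TOY_NONE; list++)
        if (*list == fmt)
            return 1;
    return 0;
}

/* Keeps the requested format if the encoder lists it; otherwise tries the
 * other endianness; otherwise returns the request unchanged. */
enum toy_fmt toy_pick_fmt(const enum toy_fmt *supported, enum toy_fmt requested)
{
    enum toy_fmt twin;

    if (toy_supported(supported, requested))
        return requested;
    twin = toy_swap_endianness(requested);
    if (twin != TOY_NONE && toy_supported(supported, twin))
        return twin;
    return requested;
}
```

The point of the sketch is that the decision only needs the encoder's supported list, which matches the commit's argument that it has no interaction with filtering.
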
ffmpeg CLI pixel format selection for filtering currently special-cases
MJPEG encoding, where it will restrict the supported list of pixel
formats depending on the value of the -strict option. In order to get
that value it will apply it from the options dict into the encoder
context, which is a highly invasive action even now, and would become a
race once encoding is moved to its own thread.

The ugliness of this code can be much reduced by moving the special
handling of MJPEG into ofilter_bind_ost(), which is called from encoder
init and is thus synchronized with it. There is also no need to write
anything to the encoder context, we can evaluate the option into our
stack variable.

There is also no need to access AVCodec at all during pixel format
selection, as the pixel formats array is already stored in
OutputFilterPriv.
This is more natural, as all except one of its callers require
processing only one filtergraph.
…ons()

This function assumes AVMEDIA_* are always positive, while in fact it
can also handle AVMEDIA_TYPE_UNKNOWN, which is -1.
Normal error handling does the job just as well.
This does not require an arbitrary limit on the number of streams.

Also, return error codes from opt_streamid() instead of aborting.
…_list_used_flag

If sh_picture_header_in_slice_header_flag is true,
sh_lmcs_used_flag and sh_explicit_scaling_list_used_flag are inferred from the picture header (ph).

Failed clips:
LMCS: CLM_A_KDDI_2.bit STILL444_A_KDDI_1.bit
Scaling: SCALING_B_InterDigital_1.bit SCALING_A_InterDigital_1.bit
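
A minimal sketch of the inference rule above, using toy structs rather than the decoder's real ones. The picture-header field names ph_lmcs_enabled_flag and ph_explicit_scaling_list_enabled_flag are assumptions about which ph fields the values come from; the commit message only says "from ph".

```c
/* Hypothetical, stripped-down picture header. */
typedef struct ToyPH {
    int ph_lmcs_enabled_flag;
    int ph_explicit_scaling_list_enabled_flag;
} ToyPH;

/* Hypothetical, stripped-down slice header. */
typedef struct ToySH {
    int sh_picture_header_in_slice_header_flag;
    int sh_lmcs_used_flag;
    int sh_explicit_scaling_list_used_flag;
} ToySH;

/* When the picture header is carried inside the slice header, the two
 * slice-level flags are not parsed; they are copied from the picture header.
 * Otherwise the parsed values already in sh are left alone. */
void toy_infer_sh_flags(ToySH *sh, const ToyPH *ph)
{
    if (sh->sh_picture_header_in_slice_header_flag) {
        sh->sh_lmcs_used_flag = ph->ph_lmcs_enabled_flag;
        sh->sh_explicit_scaling_list_used_flag =
            ph->ph_explicit_scaling_list_enabled_flag;
    }
}
```
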
If pps_alf_info_in_ph_flag is true,
sh_alf_enabled_flag is inferred from the picture header (ph).

Failed clip:
LTRP_A_ERICSSON_3.bit
If ph_deblocking_params_present_flag is false, ph_deblocking_filter_disabled_flag is inferred from the PPS.
If sh_deblocking_params_present_flag is false, sh_deblocking_filter_disabled_flag is inferred from the picture header (ph).

Failed clips:
ENT444MAINTIER_C_Sony_3.bit
ENT444HIGHTIER_D_Sony_3.bit
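
The two-level deblocking inference above (PPS to picture header, picture header to slice header) can be sketched as a simple cascade. The struct is a hypothetical flattened view of the three syntax levels, and pps_deblocking_filter_disabled_flag is an assumed name for the PPS-level value; the commit message does not spell it out.

```c
/* Hypothetical flattened view of the PPS, picture header and slice header. */
typedef struct ToyDeblockFlags {
    int pps_deblocking_filter_disabled_flag;
    int ph_deblocking_params_present_flag;
    int ph_deblocking_filter_disabled_flag;
    int sh_deblocking_params_present_flag;
    int sh_deblocking_filter_disabled_flag;
} ToyDeblockFlags;

/* Fills in whichever "disabled" flags were not signalled, top down,
 * and returns the value that applies to the slice. */
int toy_resolve_deblock_disabled(ToyDeblockFlags *f)
{
    if (!f->ph_deblocking_params_present_flag)
        f->ph_deblocking_filter_disabled_flag =
            f->pps_deblocking_filter_disabled_flag;
    if (!f->sh_deblocking_params_present_flag)
        f->sh_deblocking_filter_disabled_flag =
            f->ph_deblocking_filter_disabled_flag;
    return f->sh_deblocking_filter_disabled_flag;
}
```

The order matters: the picture-header value has to be settled before the slice header can inherit it.
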
Failed clips:
TILE_E_Nokia_2.bit
TILE_D_Nokia_2.bit
LMCS_A_Dolby_3.bit
The executor design pattern was introduced by Java
<https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/Executor.html>
and was also adopted by Python
<https://docs.python.org/3/library/concurrent.futures.html>
Compared to handcrafted thread pool management, it greatly simplifies the threading code.
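
To illustrate the idea of the executor pattern (callers submit tasks, the executor decides when and where they run), here is a deliberately tiny single-threaded sketch. It is not the executor added by this commit: every name is hypothetical, and a real implementation would run tasks on worker threads with proper locking.

```c
#include <stddef.h>

/* A unit of work: a callback plus its argument, linked into a FIFO queue. */
typedef struct ToyTask {
    void (*run)(void *arg);
    void *arg;
    struct ToyTask *next;
} ToyTask;

/* The executor owns the queue; callers never touch it directly. */
typedef struct ToyExecutor {
    ToyTask *head, *tail;
} ToyExecutor;

/* Appends a task to the end of the queue. */
void toy_executor_submit(ToyExecutor *e, ToyTask *t)
{
    t->next = NULL;
    if (e->tail)
        e->tail->next = t;
    else
        e->head = t;
    e->tail = t;
}

/* Runs queued tasks in FIFO order on the calling thread and returns how
 * many were run. A task may submit further tasks while it runs. */
int toy_executor_drain(ToyExecutor *e)
{
    int n = 0;

    while (e->head) {
        ToyTask *t = e->head;
        e->head = t->next;
        if (!e->head)
            e->tail = NULL;
        t->run(t->arg);
        n++;
    }
    return n;
}

/* Example task body: increments the int that arg points to. */
void toy_add_one(void *arg)
{
    ++*(int *)arg;
}
```

The caller only ever calls submit; scheduling policy stays inside the executor, which is what removes the handcrafted thread management from the rest of the code.
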
add Context-based Adaptive Binary Arithmetic Coding (CABAC) decoder
This is the main entry point for the CTU (Coding Tree Unit) decoder.
The code will divide the CTU decoder into several stages.
It will check the stage dependencies and run the stage decoder.
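
The stage-and-dependency idea above can be sketched as follows. The stage list and the single cross-CTU dependency (deblocking waits for the left neighbour's reconstruction) are invented for illustration and do not reflect the decoder's actual stages or rules.

```c
/* Hypothetical per-CTU stages, run in this order. */
enum { TOY_STAGE_PARSE, TOY_STAGE_RECON, TOY_STAGE_DEBLOCK, TOY_STAGE_NB };

/* done[c] holds the number of stages already completed for CTU c.
 * A stage is ready when it is the next one for its CTU and, for
 * deblocking, when the left neighbour has finished reconstruction. */
int toy_stage_ready(const int *done, int ctu, int stage)
{
    if (done[ctu] != stage)
        return 0;
    if (stage == TOY_STAGE_DEBLOCK && ctu > 0)
        return done[ctu - 1] > TOY_STAGE_RECON;
    return 1;
}

/* Repeatedly sweeps over all CTUs, running every stage that is ready,
 * until no progress is possible. Returns the number of stage runs. */
int toy_run_all(int *done, int n_ctus)
{
    int runs = 0, progress = 1;

    while (progress) {
        progress = 0;
        for (int c = 0; c < n_ctus; c++) {
            if (done[c] < TOY_STAGE_NB && toy_stage_ready(done, c, done[c])) {
                done[c]++;   /* "run" the stage */
                runs++;
                progress = 1;
            }
        }
    }
    return runs;
}
```

In a threaded decoder the sweep would be replaced by submitting each ready stage as a task, but the readiness check is the part that encodes the dependencies.
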
vvc decoder plug-in to avcodec.
split frames into slices/tiles and send them to vvc_thread for further decoding
reorder and wait for the frame decoding to be done and output the frame

Features:
    + Support I, P, B frames
    + Support 8/10/12 bits, chroma 400, 420, 422, and 444 and range extension
    + Support VVC new tools like MIP, CCLM, AFFINE, GPM, DMVR, PROF, BDOF, LMCS, ALF
    + 295 conformance clips passed
    - No support for RPR, IBC, PALETTE, and other minor features yet

C code FPS on i7-12700:
    RitualDance_1920x1080_60_10_420_32_LD.266       129.7
    Tango2_3840x2160_60_10_420_27_RA.266            26.7
    RitualDance_1920x1080_60_10_420_37_RA.266       144.3
    Tango2_3840x2160_60_10_420_27_LD.266            29.0
    BQTerrace_1920x1080_60_10_420_22_RA.vvc         75.0
    NovosobornayaSquare_1920x1080.bin               167.7
    Chimera_8bit_1080P_1000_frames.vvc              155.3

    Asm optimizations are still a work in progress. Please check
    https://github.com/ffvvc/FFmpeg/wiki#performance-data for the latest

Contributors (based on code merge order):
    Nuo Mi <nuomi2021@gmail.com>
    Xu Mu <toxumu@outlook.com>
    frankplow <post@frankplowman.com>
    Shaun Loo <shaunloo10@gmail.com>
@nuomi2021
Member

@zackerthescar, the main branch has switched.
Could you rebase this and share the command line that reproduces the issue?

Thank you.

@zackerthescar
Contributor Author

zackerthescar commented Aug 13, 2023

I spent a while today refactoring this onto the new codebase. I'm closing this PR and opening a new one.

Thanks.

EDIT: Please see #120
