AVX2 accelerated chroma deblocking filter #110

zackerthescar · 2023-07-12T10:32:50Z

This is a draft PR for the AVX2 accelerated deblocking filter that I wrote. I'm starting with ff_vvc_v_loop_filter_8_avx2. Unfortunately, it currently does not pass checkasm. Hoping to get this done sooner rather than later. Any tips?

frankplow · 2023-07-15T13:03:24Z

I think this may be accidental: 9d697e2#diff-f0b18460e5f0f5478f7175369d062f49f897a1c58a57f39bc48ae502e21d25deL40-R40

QSXW · 2023-07-15T14:40:35Z

Hi. Can you add some performance data like https://github.com/ffvvc/FFmpeg/pull/69?

zackerthescar · 2023-07-18T19:53:13Z

Hey all, thanks for looking at this PR.

> I think this may be accidental: https://github.com/ffvvc/FFmpeg/pull/110/files#diff-f0b18460e5f0f5478f7175369d062f49f897a1c58a57f39bc48ae502e21d25deL40-R40

Turns out it was! I accidentally pushed an experiment.

> Hi. Can you add some performance data like #69

[FFmpeg] tests/checkasm/checkasm --test=vvc_deblock --benchmark
benchmarking with native FFmpeg timers
nop: 46.6
checkasm: using random seed 4040587407
AVX2:
 - vvc_deblock.chroma [OK]
checkasm: all 2 tests passed
vvc_v_loop_filter_chroma8_c: 51.3
vvc_v_loop_filter_chroma8_avx2: 33.3
vvc_v_loop_filter_chroma10_c: 42.3
vvc_v_loop_filter_chroma10_avx2: 24.3

This is done with the current push. Unfortunately, running any test file causes either a segfault or some other failure.
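For a rough read of the checkasm numbers above, the C and AVX2 timings can be turned into a speedup ratio. The helper below is only a sketch: `speedup` is a hypothetical name, and it simply divides the two printed values without any further adjustment.

```c
#include <assert.h>

/* Ratio of the C timing to the SIMD timing as printed by checkasm.
 * A result above 1.0 means the SIMD version is faster.
 * Hypothetical helper, not part of checkasm. */
static double speedup(double c_time, double simd_time)
{
    assert(simd_time > 0.0);
    return c_time / simd_time;
}
```

With the figures from this run, `speedup(51.3, 33.3)` is about 1.54 for chroma8 and `speedup(42.3, 24.3)` is about 1.74 for chroma10.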

When -pix_fmt designates a BE/LE pixel format, it gets translated into the native one by av_get_pix_fmt(). This may not always be the best choice, as the encoder might only support one endianness. In such a case, explicitly choose the endianness supported by the encoder. While this is currently redundant with choose_pixel_fmt() in ffmpeg_filter.c, the latter code will be deprecated in following commits.

ffmpeg CLI pixel format selection for filtering currently special-cases MJPEG encoding, where it will restrict the supported list of pixel formats depending on the value of the -strict option. In order to get that value it will apply it from the options dict into the encoder context, which is a highly invasive action even now, and would become a race once encoding is moved to its own thread. The ugliness of this code can be much reduced by moving the special handling of MJPEG into ofilter_bind_ost(), which is called from encoder init and is thus synchronized with it. There is also no need to write anything to the encoder context, we can evaluate the option into our stack variable. There is also no need to access AVCodec at all during pixel format selection, as the pixel formats array is already stored in OutputFilterPriv.

…_list_used_flag

If sh_picture_header_in_slice_header_flag is true, sh_lmcs_used_flag and sh_explicit_scaling_list_used_flag are inferred from the ph.

Failed clips:
LMCS: CLM_A_KDDI_2.bit STILL444_A_KDDI_1.bit
Scaling: SCALING_B_InterDigital_1.bit SCALING_A_InterDigital_1.bit

If ph_deblocking_params_present_flag is false, ph_deblocking_filter_disabled_flag is inferred from the pps.
If sh_deblocking_params_present_flag is false, sh_deblocking_filter_disabled_flag is inferred from the ph.

Failed clips: ENT444MAINTIER_C_Sony_3.bit ENT444HIGHTIER_D_Sony_3.bit

The executor design pattern was introduced by Java <https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/Executor.html>; it was also adopted by Python <https://docs.python.org/3/library/concurrent.futures.html>. Compared to handcrafted thread pool management, it greatly simplifies the threading code.
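The core idea of the pattern is that callers only submit tasks and the executor decides when and where they run. A minimal sketch in C follows. It is not the executor from this commit: `Executor`, `executor_submit`, `executor_run`, and `add_one` are hypothetical names, and the tasks are drained on the calling thread instead of being handed to worker threads.

```c
#include <stddef.h>

#define MAX_TASKS 16 /* power of two, so the index arithmetic below stays valid */

typedef void (*TaskFn)(void *arg);

typedef struct {
    TaskFn fn;
    void  *arg;
} Task;

/* Hypothetical minimal executor: a fixed-size FIFO of pending tasks. */
typedef struct {
    Task   tasks[MAX_TASKS];
    size_t head, tail;
} Executor;

/* Queue a task. Returns 0 on success, -1 if the queue is full. */
static int executor_submit(Executor *e, TaskFn fn, void *arg)
{
    if (e->tail - e->head >= MAX_TASKS)
        return -1;
    e->tasks[e->tail++ % MAX_TASKS] = (Task){ fn, arg };
    return 0;
}

/* Run all queued tasks in submission order. Returns how many ran. */
static size_t executor_run(Executor *e)
{
    size_t n = 0;
    while (e->head != e->tail) {
        Task t = e->tasks[e->head++ % MAX_TASKS];
        t.fn(t.arg);
        n++;
    }
    return n;
}

/* Example task: increment the int that arg points to. */
static void add_one(void *arg)
{
    ++*(int *)arg;
}
```

Submitting `add_one` twice with a pointer to the same counter and then calling `executor_run` leaves the counter at 2. A threaded version would keep the same submit interface and replace `executor_run` with worker threads pulling from the queue under a lock.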

vvc decoder plug-in to avcodec.
Split frames into slices/tiles and send them to vvc_thread for further decoding.
Reorder and wait for the frame decoding to be done, then output the frame.

Features:
+ Support I, P, B frames
+ Support 8/10/12 bits, chroma 400, 420, 422, and 444 and range extension
+ Support VVC new tools like MIP, CCLM, AFFINE, GPM, DMVR, PROF, BDOF, LMCS, ALF
+ 295 conformance clips passed
- Does not support RPR, IBC, PALETTE, and other minor features yet

C code FPS on i7-12700:
RitualDance_1920x1080_60_10_420_32_LD.266 129.7
Tango2_3840x2160_60_10_420_27_RA.266 26.7
RitualDance_1920x1080_60_10_420_37_RA.266 144.3
Tango2_3840x2160_60_10_420_27_LD.266 29.0
BQTerrace_1920x1080_60_10_420_22_RA.vvc 75.0
NovosobornayaSquare_1920x1080.bin 167.7
Chimera_8bit_1080P_1000_frames.vvc 155.3

Asm optimizations are still a work in progress. Please check https://github.com/ffvvc/FFmpeg/wiki#performance-data for the latest.

Contributors (based on code merge order):
Nuo Mi <nuomi2021@gmail.com>
Xu Mu <toxumu@outlook.com>
frankplow <post@frankplowman.com>
Shaun Loo <shaunloo10@gmail.com>

nuomi2021 · 2023-08-12T14:55:45Z

@zackerthescar, the main branch has switched.
Could you help rebase this, and show the command line needed to reproduce the issue?

Thank you.

zackerthescar · 2023-08-13T04:02:09Z

I spent a while today refactoring on the new codebase. This PR will now close and a new one will appear.

Thanks.

EDIT: Please see #120

nuomi2021 requested a review from QSXW July 13, 2023 03:33

zackerthescar force-pushed the main branch from 1292162 to fe99746 Compare July 17, 2023 20:42

elenril added 25 commits July 20, 2023 20:30

fftools/ffmpeg_filter: move "smart" pixfmt selection to ffmpeg_mux_init

037d364

This code works on encoder information and has no interaction with filtering, so it does not belong in ffmpeg_filter.

tests/fate: fix mismatches between requested and actually used pixel formats

4503515

fftools/ffmpeg_filter: restrict reap_filters() to a single filtergraph

3a89e6d

This is more natural, as all except one of its callers require processing only one filtergraph.

fftools/ffmpeg_mux_init: avoid invalid memory access in set_dispositions()

c4b0746

This function assumes AVMEDIA_* are always positive, while in fact it can also handle AVMEDIA_TYPE_UNKNOWN, which is -1.
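The hazard described here is the usual one when an enum value is used directly as an array index. The sketch below illustrates a guard under stated assumptions: `MediaType` is a local stand-in for the real enum (only the -1 value of UNKNOWN comes from the commit message), and `count_stream` is a hypothetical helper, not the code changed by this commit.

```c
#include <stddef.h>

/* Local stand-in for the media type enum; only UNKNOWN = -1 is
 * taken from the commit message, the rest is illustrative. */
enum MediaType {
    MEDIA_TYPE_UNKNOWN = -1,
    MEDIA_TYPE_VIDEO,
    MEDIA_TYPE_AUDIO,
    MEDIA_TYPE_NB
};

/* Hypothetical per-type counter, indexed by media type.
 * Returns the updated count, or -1 if the type cannot be used as an index. */
static int count_stream(int counts[MEDIA_TYPE_NB], enum MediaType type)
{
    /* UNKNOWN is -1, so indexing without this check would touch
     * memory before the start of the array. */
    if (type < 0 || type >= MEDIA_TYPE_NB)
        return -1;
    return ++counts[type];
}
```

Passing `MEDIA_TYPE_UNKNOWN` returns -1 and leaves the table untouched, while valid types increment their own slot.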

fftools/ffmpeg_enc: return errors from enc_frame() instead of aborting

80a6480

fftools/ffmpeg_enc: return errors from enc_open() instead of aborting

aa1173f

fftools/ffmpeg_enc: return errors from do_*_out() instead of aborting

dde3de0

fftools/ffmpeg_enc: return errors from enc_flush() instead of aborting

43bcf63

fftools/ffmpeg_enc: return errors from encode_frame() instead of aborting

2b4afe8

fftools/ffmpeg_mux: return errors from of_output_packet() instead of aborting

e0f4259

fftools/ffmpeg_dec: return error codes from dec_packet() instead of aborting

518b49a

fftools/ffmpeg_dec: drop redundant handling of AVERROR_EXPERIMENTAL

6298dd6

Normal error handling does the job just as well.

fftools/ffmpeg_filter: return error codes from ofilter_bind_ost() instead of aborting

ab16e32

fftools/ffmpeg_filter: return error codes from init_input_filter() instead of aborting

a52ee1a

fftools/ffmpeg_filter: replace remaining exit_program() with error codes

8815adf

fftools/ffmpeg_filter: return error codes from choose_pix_fmts() instead of aborting

8db9680

fftools/ffmpeg_filter: return error codes from fg_create() instead of aborting

5ba7aa2

fftools/ffmpeg_filter: return error codes from set_channel_layout() instead of aborting

cb8242d

fftools/ffmpeg_filter: replace remaining report_and_exit() with error codes

13ebc9a

fftools/cmdutils: return error codes from setup_find_stream_info_opts() instead of aborting

37abb3a

fftools/ffmpeg_opt: reimplement -streamid using a dictionary

26e1e80

This does not require an arbitrary limit on the number of streams. Also, return error codes from opt_streamid() instead of aborting.

fftools/cmdutils: drop unused ALLOC_ARRAY_ELEM()

8eb5ade

fftools/cmdutils: add error handling to allocate_array_elem()

6be4a29

nuomi2021 added 26 commits August 11, 2023 16:43

add github workflow

a0c669d

cbs_h266: fix inference for sh_alf_enabled_flag

c30ae3f

If pps_alf_info_in_ph_flag is true, sh_alf_enabled_flag is inferred from the ph.

Failed clip: LTRP_A_ERICSSON_3.bit
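The inference rule in this commit message can be sketched as follows. The structs are hypothetical, trimmed-down stand-ins for the raw header structs, `infer_or_read_alf` is an illustrative name, and the `parsed_flag` branch is an assumption about the case where the flag is signalled in the slice header. Other conditions, such as any SPS-level enable, are left out.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down headers holding only the flags involved. */
typedef struct { uint8_t pps_alf_info_in_ph_flag; } Pps;
typedef struct { uint8_t ph_alf_enabled_flag; } Ph;
typedef struct { uint8_t sh_alf_enabled_flag; } Sh;

/* parsed_flag stands in for the bit that would be read from the
 * slice header when the flag is signalled there (assumption). */
static void infer_or_read_alf(Sh *sh, const Pps *pps, const Ph *ph,
                              uint8_t parsed_flag)
{
    if (pps->pps_alf_info_in_ph_flag)
        sh->sh_alf_enabled_flag = ph->ph_alf_enabled_flag; /* inferred from ph */
    else
        sh->sh_alf_enabled_flag = parsed_flag;             /* read from the slice header */
}
```

The point of the fix is the first branch: when the ALF information lives in the picture header, the slice header value has to follow it instead of keeping a default.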

cbs_h266: fix slice_height_in_ctus for single slice tile

cd7d9b8

Failed clips: TILE_E_Nokia_2.bit TILE_D_Nokia_2.bit LMCS_A_Dolby_3.bit

cbs_h266: H266RawSliceHeader, expose NumEntryPoints

ad051bf

cbs_h266: H266RawPredWeightTable, expose num_weights_l0 and num_weights_l1

09acd7b

cbs_h266: H266RawSliceHeader, expose NumRefIdxActive[]

0cdeebd

cbs_h266: slice_header, fix inference for pred_weight_table

c366dec

vvcdec: add vvc decoder stub

9d8778f

vvcdec: add vvc_data

829b585

vvcdec: add parameter parser for sps, pps, ph, sh

44886d1

vvcdec: add cabac decoder

8425804

add Context-based Adaptive Binary Arithmetic Coding (CABAC) decoder

vvcdec: add reference management

c66cd14

vvcdec: add motion vector decoder

62c6d88

vvcdec: add inter prediction

9283818

vvcdec: add inv transform 1d

aef251a

vvcdec: add intra prediction

7b4254c

vvcdec: add LMCS, Deblocking, SAO, and ALF filters

9d46423

vvcdec: add dsp init and inv transform

737e02a

vvcdec: add CTU parser

fba776d

vvcdec: add CTU thread logical

552527b

This is the main entry point for the CTU (Coding Tree Unit) decoder. The code will divide the CTU decoder into several stages. It will check the stage dependencies and run the stage decoder.

vvcdec: add asm code

d53d9c3

vvcdec: add checkasm

4b51ec4

zackerthescar force-pushed the main branch from a572c3a to 4b51ec4 Compare August 13, 2023 04:02

zackerthescar closed this Aug 13, 2023
