AVX2 accelerated chroma deblocking filter #110
|
I think this may be accidental: 9d697e2#diff-f0b18460e5f0f5478f7175369d062f49f897a1c58a57f39bc48ae502e21d25deL40-R40 |
|
Hi. Can you add some performance data like https://github.com/ffvvc/FFmpeg/pull/69? |
|
Hey all, thanks for looking at this PR.
Turns out this was! Accidentally pushed an experiment!
This is done with the current push. Unfortunately running any test files will cause either a segfault or some other issue. |
When -pix_fmt designates a BE/LE pixel format, it gets translated into the native one by av_get_pix_fmt(). This may not always be the best choice, as the encoder might only support one endianness. In such a case, explicitly choose the endianness supported by the encoder. While this is currently redundant with choose_pixel_fmt() in ffmpeg_filter.c, the latter code will be deprecated in following commits.
This code works on encoder information and has no interaction with filtering, so it does not belong in ffmpeg_filter.
ffmpeg CLI pixel format selection for filtering currently special-cases MJPEG encoding, where it will restrict the supported list of pixel formats depending on the value of the -strict option. In order to get that value it will apply it from the options dict into the encoder context, which is a highly invasive action even now, and would become a race once encoding is moved to its own thread. The ugliness of this code can be much reduced by moving the special handling of MJPEG into ofilter_bind_ost(), which is called from encoder init and is thus synchronized with it. There is also no need to write anything to the encoder context, we can evaluate the option into our stack variable. There is also no need to access AVCodec at all during pixel format selection, as the pixel formats array is already stored in OutputFilterPriv.
This is more natural, as all except one of its callers require processing only one filtergraph.
…ons() This function assumes AVMEDIA_* are always positive, while in fact it can also handle AVMEDIA_TYPE_UNKNOWN, which is -1.
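To illustrate the point about AVMEDIA_TYPE_UNKNOWN being -1, here is a minimal sketch of a lookup that does not assume the media type is non-negative. The enum is a local stand-in that mirrors the AVMEDIA_TYPE_* ordering, and `media_type_label` is a hypothetical helper, not the function touched by this commit.

```c
#include <stddef.h>

/* Local stand-in for the AVMEDIA_TYPE_* values; UNKNOWN is -1. */
enum MediaType {
    MT_UNKNOWN = -1,
    MT_VIDEO,
    MT_AUDIO,
    MT_DATA,
    MT_SUBTITLE,
    MT_ATTACHMENT,
    MT_NB
};

/* Hypothetical helper: maps a media type to a label without assuming
 * that the type is >= 0. Returns NULL for values outside the enum. */
static const char *media_type_label(int type)
{
    /* Entry 0 belongs to MT_UNKNOWN, so every index is shifted by one. */
    static const char *const labels[] = {
        "unknown", "video", "audio", "data", "subtitle", "attachment"
    };

    if (type < MT_UNKNOWN || type >= MT_NB)
        return NULL; /* out of range: let the caller report the error */
    return labels[type + 1];
}
```

Shifting the index by one keeps the table dense while still covering the -1 case; anything outside the enum is left for the caller to handle as an error.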
Normal error handling does the job just as well.
…stead of aborting
…nstead of aborting
…() instead of aborting
This does not require an arbitrary limit on the number of streams. Also, return error codes from opt_streamid() instead of aborting.
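A rough sketch of the idea, assuming an "index:id" argument syntax: the mapping grows on demand instead of using a fixed-size table, and parse failures are returned as negative errno values for the caller to handle. `StreamIdMap` and `streamid_map_set` are hypothetical names for illustration and are not the actual opt_streamid() implementation.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical growable map from stream index to stream id. */
typedef struct StreamIdMap {
    int   *ids; /* ids[i] is the id for stream i, or -1 if unset */
    size_t nb;  /* number of allocated entries */
} StreamIdMap;

/* Parses "index:id" and stores it. Returns 0 on success or a negative
 * errno value on failure; it never aborts the process. */
static int streamid_map_set(StreamIdMap *m, const char *arg)
{
    char *end;
    const char *p;
    long idx, id;

    errno = 0;
    idx = strtol(arg, &end, 10);
    if (errno || end == arg || *end != ':' || idx < 0 ||
        idx >= INT_MAX / (long)sizeof(*m->ids))
        return -EINVAL;

    p = end + 1;
    errno = 0;
    id = strtol(p, &end, 10);
    if (errno || end == p || *end != '\0' || id < 0 || id > INT_MAX)
        return -EINVAL;

    if ((size_t)idx >= m->nb) {
        /* Grow to fit the requested index; new slots start unset. */
        size_t new_nb = (size_t)idx + 1;
        int *tmp = realloc(m->ids, new_nb * sizeof(*tmp));
        if (!tmp)
            return -ENOMEM;
        for (size_t i = m->nb; i < new_nb; i++)
            tmp[i] = -1;
        m->ids = tmp;
        m->nb  = new_nb;
    }
    m->ids[idx] = (int)id;
    return 0;
}
```

The caller owns `ids` and releases it with `free()` when done.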
…_list_used_flag If sh_picture_header_in_slice_header_flag is true, sh_lmcs_used_flag and sh_explicit_scaling_list_used_flag are inferred from ph. Failed clips: LMCS: CLM_A_KDDI_2.bit, STILL444_A_KDDI_1.bit; Scaling: SCALING_B_InterDigital_1.bit, SCALING_A_InterDigital_1.bit
If pps_alf_info_in_ph_flag is true, sh_alf_enabled_flag is inferred from ph. Failed clip: LTRP_A_ERICSSON_3.bit
If !ph_deblocking_params_present_flag is true, ph_deblocking_filter_disabled_flag is inferred from pps. If !sh_deblocking_params_present_flag is true, sh_deblocking_filter_disabled_flag is inferred from ph. Failed clips: ENT444MAINTIER_C_Sony_3.bit, ENT444HIGHTIER_D_Sony_3.bit
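The deblocking fallback chain described in this commit message (pps to ph, then ph to sh) can be sketched as below. This is a simplified model that covers only the rule as worded above, not the full inference conditions from the VVC specification; the structs and function names are hypothetical and hold only the fields mentioned.

```c
/* Hypothetical, heavily reduced parameter sets: only the fields named above. */
typedef struct PPS {
    int deblocking_filter_disabled_flag;
} PPS;

typedef struct PH {
    int deblocking_params_present_flag;
    int deblocking_filter_disabled_flag;
} PH;

typedef struct SH {
    int deblocking_params_present_flag;
    int deblocking_filter_disabled_flag;
} SH;

/* If the picture header carries no deblocking params, take the flag from the PPS. */
static void infer_ph_deblocking(PH *ph, const PPS *pps)
{
    if (!ph->deblocking_params_present_flag)
        ph->deblocking_filter_disabled_flag = pps->deblocking_filter_disabled_flag;
}

/* If the slice header carries no deblocking params, take the flag from the PH. */
static void infer_sh_deblocking(SH *sh, const PH *ph)
{
    if (!sh->deblocking_params_present_flag)
        sh->deblocking_filter_disabled_flag = ph->deblocking_filter_disabled_flag;
}
```

The order matters: the picture header has to be resolved before the slice header, since the slice header falls back to the already inferred picture header value.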
Failed clips: TILE_E_Nokia_2.bit TILE_D_Nokia_2.bit LMCS_A_Dolby_3.bit
The executor design pattern was introduced by Java <https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/Executor.html>; it was also adopted by Python <https://docs.python.org/3/library/concurrent.futures.html>. Compared to handcrafted thread pool management, it greatly simplifies the threading code.
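For readers unfamiliar with the pattern, the core idea is that callers only submit tasks and the executor decides when and where they run. Below is a deliberately single-threaded sketch of that separation; a real executor would hand the queue to worker threads. All names (`Task`, `Executor`, `executor_submit`, `executor_run`, `add_one`) are hypothetical and do not reflect the API added by this commit.

```c
#include <stddef.h>

/* A unit of work: a callback plus its argument. */
typedef struct Task {
    void (*run)(void *arg);
    void *arg;
    struct Task *next;
} Task;

/* FIFO queue of pending tasks. */
typedef struct Executor {
    Task *head;
    Task *tail;
} Executor;

/* Appends a task to the queue; the caller keeps ownership of the Task. */
static void executor_submit(Executor *e, Task *t)
{
    t->next = NULL;
    if (e->tail)
        e->tail->next = t;
    else
        e->head = t;
    e->tail = t;
}

/* Runs queued tasks in submission order until the queue is empty.
 * Tasks may submit further tasks while running. Returns the number run. */
static int executor_run(Executor *e)
{
    int n = 0;
    while (e->head) {
        Task *t = e->head;
        e->head = t->next;
        if (!e->head)
            e->tail = NULL;
        t->run(t->arg);
        n++;
    }
    return n;
}

/* Example task body: increments the int it is given. */
static void add_one(void *arg)
{
    ++*(int *)arg;
}
```

Because callers never touch threads directly, the scheduling policy can change (more workers, priorities, dependency checks) without touching the code that submits work.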
add Context-based Adaptive Binary Arithmetic Coding (CABAC) decoder
This is the main entry point for the CTU (Coding Tree Unit) decoder. The code will divide the CTU decoder into several stages. It will check the stage dependencies and run the stage decoder.
vvc decoder plug-in to avcodec.
split frames into slices/tiles and send them to vvc_thread for further decoding
reorder and wait for the frame decoding to be done and output the frame
Features:
+ Support I, P, B frames
+ Support 8/10/12 bits, chroma 400, 420, 422, and 444 and range extension
+ Support VVC new tools like MIP, CCLM, AFFINE, GPM, DMVR, PROF, BDOF, LMCS, ALF
+ 295 conformance clips passed
- RPR, IBC, PALETTE, and other minor features are not supported yet
C code FPS on i7-12700:
RitualDance_1920x1080_60_10_420_32_LD.266 129.7
Tango2_3840x2160_60_10_420_27_RA.266 26.7
RitualDance_1920x1080_60_10_420_37_RA.266 144.3
Tango2_3840x2160_60_10_420_27_LD.266 29.0
BQTerrace_1920x1080_60_10_420_22_RA.vvc 75.0
NovosobornayaSquare_1920x1080.bin 167.7
Chimera_8bit_1080P_1000_frames.vvc 155.3
Asm optimizations are still a work in progress. Please check
https://github.com/ffvvc/FFmpeg/wiki#performance-data for the latest
Contributors(based on code merge order):
Nuo Mi <nuomi2021@gmail.com>
Xu Mu <toxumu@outlook.com>
frankplow <post@frankplowman.com>
Shaun Loo <shaunloo10@gmail.com>
|
@zackerthescar , the main branch switched. thank you. |
|
I spent a while today refactoring on the new codebase. This PR will now close and a new one will appear. Thanks. EDIT: Please see #120 |
This is a draft PR for the AVX2 accelerated deblocking filter that I wrote. I'm trying to get
ff_vvc_v_loop_filter_8_avx2 working to start off. Unfortunately, it currently does not pass checkasm. Hoping to get this done sooner rather than later. Any tips?