gpu-next: using VK_KHR_cooperative_matrix extension #12144

Closed
ghost opened this issue Aug 12, 2023 · 4 comments
Labels
down-upstream features and bugs that need to be implemented and fixed upstream meta:feature-request

Comments

@ghost

ghost commented Aug 12, 2023

No description provided.

@ghost ghost added the meta:feature-request label Aug 12, 2023
@bjin
Contributor

bjin commented Aug 13, 2023

Just to be clear, adding support for this to mpv or libplacebo is the least important blocking issue. It could be as simple as adding an "#extension" directive to the shader prelude (probably via "//!EXTENSION").

Utilizing this extension (in a user shader), however, is quite complicated. This is especially true for CNN shaders like FSRCNNX and Anime4K. It basically means writing a new shader from scratch with 10x the complexity (compute shader, subgroup, buffer storage, batch processing, fp16 ...). And even if all of this is done, different vendor implementations offer different subgroup sizes and coopMatMul kernel sizes, and their performance will vary between GPUs. Modern DL frameworks like PyTorch and TensorFlow actually compare different kernels at runtime and choose the one with the best performance.

So, instead of opening a meaningless feature request here, you should probably go to those repos and open an FR there.

side story:

I thought about using this extension in my nnedi3 shader, because it's a much simpler case: it has a single layer, and the kernel sizes (8x4 and 8x6) happen to be multiples of 16, so no routines are needed for leftovers. But it still requires a lot of effort, probably too much for a somewhat outdated model like nnedi3.
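To make the "no leftovers" point concrete, here is a minimal sketch in Python. It assumes a cooperative-matrix tile dimension of 16, taken from the comment above; the sizes a real driver supports have to be queried at runtime. `split_into_tiles` is a hypothetical helper, not part of any library.

```python
def split_into_tiles(length, tile=16):
    """Return (full_tiles, leftover) when `length` taps are cut into `tile`-wide chunks.

    tile=16 is an assumption based on the comment above, not a queried value.
    """
    return divmod(length, tile)


# The two nnedi3 window sizes mentioned above, flattened to a tap count.
for width, height in [(8, 4), (8, 6)]:
    taps = width * height
    full, leftover = split_into_tiles(taps)
    print(f"{width}x{height}: {taps} taps -> {full} full tiles, {leftover} leftover")
```

A window whose tap count is not a multiple of the tile size would leave a nonzero remainder, which is the case that needs padding or a separate leftover routine.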

@sfan5 sfan5 added the down-upstream features and bugs that need to be implemented and fixed upstream label Aug 13, 2023
@sfan5 sfan5 closed this as completed Aug 13, 2023
@bjin
Contributor

bjin commented Aug 13, 2023

> But libplacebo's built-in shaders may still benefit from it.

No, those shaders won't. Fast matrix multiplication only benefits massive convolution kernels with a large number of input channels (8 to be precise).

@cyanreg
Contributor

cyanreg commented Aug 13, 2023

It could benefit dither generation, since you can create whatever noise pattern you'd like in the frequency domain and then do a DCT to get a spatial rep.

But, outside of libplacebo, it could benefit some cases like denoising, or any sort of frequency domain block processing. And, of course, it could be used in its intended use in a neural network.
It is very platform-specific though, since AFAIK the matrix sizes on Nvidia and AMD differ, as well as the matrix types.
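The link between a block DCT and matrix multiplication can be sketched as follows. This is a plain-Python illustration, not GPU code and not anything from libplacebo: a 2D DCT-II of a square block is two matrix products, `C @ X @ C^T`, which is the kind of workload a cooperative-matrix unit accelerates. All function names here are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (row k holds the k-th frequency)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Naive matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(block):
    """Forward 2D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT of a square block: C^T * Y * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


# Shape a pattern in the frequency domain (only the highest frequency is set),
# then transform it back to get the spatial representation.
n = 4
spectrum = [[0.0] * n for _ in range(n)]
spectrum[n - 1][n - 1] = 1.0
pattern = idct2(spectrum)
```

For dither generation, `spectrum` would be filled with whatever frequency-domain noise shape is wanted before calling `idct2`; the single nonzero coefficient here is only a placeholder.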

@haasn
Member

haasn commented Aug 13, 2023

I would definitely be interested in a fast full-image DCT implementation, especially in combination with a FREQUENCY/PHASE hook stage to allow user shaders to apply arbitrary transformations to the image in the Fourier domain.

This could potentially be used, e.g., instead of cascading Gaussian blurs for very large blur factors, and maybe for extreme downscaling (16x or more), which is prohibitively expensive with conventional convolution (especially when the convolution kernel is unrolled). Other obvious uses are denoising, some types of film grain generation, full-frame blue noise dithering, etc.
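The reason a frequency-domain stage helps with very large blurs is the convolution theorem: convolution in the spatial domain becomes a per-coefficient multiplication in the frequency domain, so the cost no longer grows with the kernel width. Below is a minimal 1D sketch, assuming circular boundary handling and using a naive O(n^2) DFT as a stand-in for the DCT discussed above. The function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); the inverse divides by n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def circular_convolve(signal, kernel):
    """Direct circular convolution; cost grows with the kernel length."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]


def blur_via_frequency_domain(signal, kernel):
    """Same result through the convolution theorem: multiply the two spectra."""
    spectrum = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(spectrum, inverse=True)]


# The kernel is zero-padded to the signal length and centred on index 0.
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
direct = circular_convolve(signal, kernel)
via_freq = blur_via_frequency_domain(signal, kernel)
```

With a fast transform in place of the naive one, the multiplication step costs the same for a 3-tap kernel as for one spanning the whole image, which is what makes this approach interesting for extreme blur or downscaling factors.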
