Conv2D direct support #744
Conversation
Just a thought, wouldn't it be better to just add a |
This usually means more than ~4G. I'd suggest testing with something like 576x1024, or maybe 640x1024.
Adding on to that, a new overloaded constructor could help avoid all the new optional parameters at the call sites; e.g. something like:

```cpp
Conv2d(int64_t in_channels,
       int64_t out_channels,
       bool direct,
       std::pair<int, int> kernel_size,
       (...)
```

(although this is already asking for some kind of named parameter idiom). |
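For illustration, a minimal sketch of that idea, assuming a simplified Conv2d with only the parameters quoted above (the real block has more optional parameters): the existing-style constructor defaults `direct` to off, and a delegating overload lets callers opt in explicitly without repeating every optional argument in between.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the repo's Conv2d block, reduced to the
// parameters discussed above.
class Conv2d {
public:
    // Existing-style constructor: direct mode defaults to off, so current
    // call sites keep compiling unchanged.
    Conv2d(int64_t in_channels,
           int64_t out_channels,
           std::pair<int, int> kernel_size,
           bool direct = false)
        : in_channels_(in_channels),
          out_channels_(out_channels),
          kernel_size_(kernel_size),
          direct_(direct) {}

    // Suggested overload: callers that want conv2d direct say so explicitly,
    // without spelling out every optional parameter.
    Conv2d(int64_t in_channels,
           int64_t out_channels,
           bool direct,
           std::pair<int, int> kernel_size)
        : Conv2d(in_channels, out_channels, kernel_size, direct) {}

private:
    int64_t in_channels_;
    int64_t out_channels_;
    std::pair<int, int> kernel_size_;
    bool direct_;
};
```

Once more than one or two such booleans accumulate, a small options struct (the named parameter idiom mentioned above) would probably scale better.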
Doing it right now. I just wanted it to be more explicit so the changes are easier to test. |
I have reduced the duplicated code and added a check that should only allow direct mode on Vulkan and CPU. |
Testing f5b5f5c (ggml_vulkan: Found 1 Vulkan devices):
SDXL: [benchmark results omitted]
SD1.5: [benchmark results omitted]
|
@Green-Sky what do you think about how I implemented the flags? It's still a WIP; I think I may have missed some Conv2D uses. |
Looks good. |
Do you think the |
I think Flux Models only use Conv2D in the VAE. |
I am not very familiar with those, but probably.
It does not use conv2d, last time I checked. (I also included it in the benchmarks I did here) |
This probably still needs some cleanup, and I may have missed some cases, but it's almost ready. |
I think it would be better to automatically select whether to use ggml_conv2d or ggml_conv2d_direct within ggml_nn_conv_2d based on the shapes and sizes of the input and weight, as well as whether the backend supports it. |
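A rough sketch of what that could look like, assuming ggml_conv_2d_direct takes the same arguments as ggml_conv_2d (kernel first, then input) and that the caller can tell whether the backend supports it; the size threshold below is purely a placeholder, not a measured cutoff.

```cpp
#include "ggml.h"

// Sketch only, not the repo's actual ggml_nn_conv_2d: pick the direct path
// when the backend supports it and the input is large enough that the im2col
// buffer of the regular path becomes a problem.
static struct ggml_tensor* nn_conv_2d_auto(struct ggml_context* ctx,
                                            struct ggml_tensor* x,  // input  [W, H, C_in, N]
                                            struct ggml_tensor* w,  // weight [KW, KH, C_in, C_out]
                                            struct ggml_tensor* b,  // bias   [C_out] or NULL
                                            int s0, int s1, int p0, int p1, int d0, int d1,
                                            bool backend_supports_direct) {
    const bool large_input = x->ne[0] * x->ne[1] >= 512 * 512;  // placeholder heuristic

    struct ggml_tensor* out = (backend_supports_direct && large_input)
        ? ggml_conv_2d_direct(ctx, w, x, s0, s1, p0, p1, d0, d1)
        : ggml_conv_2d(ctx, w, x, s0, s1, p0, p1, d0, d1);

    if (b != NULL) {
        // broadcast the bias over the spatial dimensions
        b   = ggml_reshape_4d(ctx, b, 1, 1, b->ne[0], 1);
        out = ggml_add(ctx, out, b);
    }
    return out;
}
```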
I've just tested this PR with the latest changes from the new conv2d PR in llama.cpp, and the slowdown in the diffusion process with conv2d direct is almost gone (it's still slightly slower, but only marginally). Applying conv2d direct to the upscaling process seems quite helpful. [conv2d vs. conv2d direct benchmark results omitted] |
The idea behind this PR is to leave the choice to the user for now, given that Conv2D direct is new and there isn't enough data to know whether it will be faster for everyone or whether it may cause regressions. Once it's clear that it's an improvement for everyone, the logic can be moved into ggml-extend.cpp to select it automatically. Isn't that the safer choice, at least for now? |
I agree. Current state: CPU is generally slower in my tests, including the VAE.
This is still hard, I think. It should probably be done in the ggml backends themselves. |
I updated the code and used a more concise approach to enable conv2d_direct. |
That's definitely way cleaner, thanks. |
Thank you for your contribution. |
This PR adds two runtime flags to enable the new Conv2D Direct implementation (#739) for the diffusion model and for the VAE.
This new implementation improves VAE decoding speed and memory usage.
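For illustration only, the general shape of such an opt-in; the struct and field names below are made up for this sketch and are not the PR's actual flags. Each model receives only the boolean that concerns it, so the direct path can be enabled for the VAE independently of the diffusion model.

```cpp
// Hypothetical names; the PR's actual flag/parameter names may differ.
struct ConvDirectFlags {
    bool diffusion_conv_direct = false;  // use conv2d direct in the diffusion model
    bool vae_conv_direct       = false;  // use conv2d direct in the VAE
};

// Each model forwards its own flag to the conv blocks it builds.
struct VAEDecoder {
    bool conv_direct;
    explicit VAEDecoder(const ConvDirectFlags& f) : conv_direct(f.vae_conv_direct) {}
};

struct DiffusionModel {
    bool conv_direct;
    explicit DiffusionModel(const ConvDirectFlags& f) : conv_direct(f.diffusion_conv_direct) {}
};
```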
Tested on a Ryzen 7 4700U (Vega 7 iGPU)
Master
SD 1.5 - 512 x 512 = VAE compute buffer size: 1664.00 MB (VRAM), 20.0 s
SD XL - 1024 x 1024 = ErrorOutOfDeviceMemory
PR
SD 1.5 - 512 x 512 = VAE compute buffer size: 640.00 MB (VRAM), 6.13 s
SD XL - 1024 x 1024 = VAE compute buffer size: 2560.00 MB (VRAM), 26.78 s
Unfortunately, I won't be home for a while, so the only machine I have to test the changes on is this laptop. If anyone could post their test results here, it would help a lot.