Conv2D kernel CuBLAS implementation - need feedback #556
Comments
I think you have an outdated version of ggml-cuda (see `ggml-cuda.cu` lines 6916 to 6918 and lines 5882 to 5901 at commit 8b5c564).

It would be good to reuse the current matrix multiplication kernels instead of adding another one.
Compiling the latest version of ggml-cublas I get this error:

```
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\ggml-cuda.cu(7772): error : expected an expression [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
  *cuda_backend = (ggml_backend){
  ^
```

In the version of the code that you mentioned:

```cpp
ggml_backend_t cuda_backend = new ggml_backend;
*cuda_backend = (ggml_backend){
    /* .interface = */ cuda_backend_i,
    /* .context   = */ ctx
};
```
You need to update the rest of ggml as well.
I had updated the full ggml repository.
You are using the old headers, at least; that is what these errors indicate. Edit: it may also be an issue with MSVC.
Does this fix the issue?

```diff
--- a/src/ggml-cuda.cu
+++ b/src/ggml-cuda.cu
@@ -7768,8 +7768,7 @@ ggml_backend_t ggml_backend_cuda_init() {
     ggml_backend_context_cuda * ctx = new ggml_backend_context_cuda;
 
-    ggml_backend_t cuda_backend = new ggml_backend;
-    *cuda_backend = (ggml_backend){
+    ggml_backend_t cuda_backend = new ggml_backend {
         /* .interface = */ cuda_backend_i,
         /* .context   = */ ctx
     };
```
Yes, but I get these errors (MSVC output, translated from Spanish):

```
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(34,38): error C2236: unexpected token 'struct'. Did you forget a ';'? [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(34,47): error C2332: 'struct': missing tag name [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(34,47): error C2027: use of undefined type '<unnamed-tag>' [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(45,52): error C2236: unexpected token 'struct'. Did you forget a ';'? [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(45,61): error C2332: 'struct': missing tag name [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(96,31): error C2236: unexpected token 'struct'. Did you forget a ';'? [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(96,40): error C2332: 'struct': missing tag name [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
D:\proyectos\cpp-projects\stable-diffusion.cpp\ggml\src\..\include\ggml\ggml-backend.h(96,40): error C2027: use of undefined type '<unnamed-tag>' [D:\proyectos\cpp-projects\stable-diffusion.cpp\build\ggml\src\ggml.vcxproj]
```
Apparently, MSVC defines `interface` as a macro. This works around it:

```c
#ifdef interface
#undef interface
#endif
```
Works!
I haven't had the chance to test whether the kernel addition works, because the latest version of ggml doesn't have Conv2D Stage 0 and Stage 1 implemented. Trying to reimplement everything in the latest version of ggml didn't work for me; the program just crashes. I'll have to wait for the developer of stable-diffusion.cpp to update the ggml version, and then I can add the CUDA implementation. For that reason, I first tried to implement it in the old version of ggml-cuda. Nevertheless, I'm not sure if enabling GGML_CUBLAS sets the CUDA backend. How do I perform the compute on the GPU with the old ggml-cuda? @slaren I'm trying this:

```cpp
int KW = 3, KH = 3, IC = 640, OC = 640;
int IW = 32, IH = 48, N = 1;

struct ggml_tensor * ha = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, KW, KH, IC, OC);
memcpy(ha->data, hadata, KW * KH * IC * OC * sizeof(uint16_t));

struct ggml_tensor * b = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, IW, IH, IC, N);
memcpy(b->data, bdata, IW * IH * IC * N * sizeof(float));

struct ggml_tensor * result = ggml_conv_2d(ctx, ha, b, s0, s1, p0, p1, d0, d1);
result->backend = GGML_BACKEND_GPU; // does this perform the op on the GPU???
ggml_set_name(result, "Result Tensor");

struct ggml_cgraph gf = ggml_build_forward(result);
ggml_graph_compute_with_ctx(ctx, &gf, 6);

const float * ref = (const float *) result->data;
printf("conv2d:\n%.2f %.2f %.2f %.2f\n%.2f %.2f %.2f %.2f\n",
       ref[0], ref[1], ref[2], ref[3],
       ref[4], ref[5], ref[6], ref[7]);
```

Error:

```
ggml_cuda_op:
ne03: 640  ne13: 1
GGML_ASSERT: D:\proyectos\cpp-projects\ggml-test\ggml\ggml-cuda.cu:5773: ne03 == ne13
```

The backend API is so confusing!
There are two ways to use the CUDA backend: the old API or the ggml-backend API.

The old API is more flexible and supports things such as offloading only part of a model to VRAM and mixing CPU and GPU computation for operations that aren't supported in CUDA yet. I think only llama.cpp uses it fully at the moment.

ggml-backend is a recent addition that intends to provide a common API for all the CPU and GPU backends. Currently it only supports fully offloading all the computation to the GPU, which requires all the weights to be stored in VRAM and all the operations to be implemented in the CUDA backend. In the future it will be extended to support partial offloading, but that is not ready yet. You can find an example of how to use it in the

If you have enough VRAM to fully offload the model, you should try ggml-backend; it will be much easier to use than the old API.

If you have more questions, add a new reply. I don't get pinged when you edit your comments.
Context
In the last few days, I've been working on creating a Conv2D kernel for the sd.cpp project. I already have the kernel created, but when trying to implement it in ggml I've encountered a limitation: the data passed to the GPU must be in FP32 format, but the current CPU implementation of Conv2D requires FP16.
Here is my repo of the results: https://github.com/FSSRepo/ggml-cuda-experiments
Trying to implement the kernel in GGML CUDA, working in the file `ggml-cuda.cu`:

- Add the CUDA kernels.
- Add the op functions in `ggml_cuda_compute_forward`.
- Create the `ggml_cuda_conv2d_stage_0` and `ggml_cuda_conv2d_stage_1` CUDA functions.
- Create the CUDA ops `ggml_cuda_op_conv2d_stage_0` and `ggml_cuda_op_conv2d_stage_1`. I need feedback on this section; what should I do?

Compiler Error: