ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel #3891

slaren · 2023-11-01T18:32:16Z

Computes the pointers in a kernel so that this can be done asynchronously, and uses the CUDA memory pool to avoid the call to `cudaMalloc`.

Fixes #3884
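For readers unfamiliar with the idea, here is a minimal sketch of filling the pointer arrays on the device instead of building them on the host and copying them over. It is not the code from this PR: the names `compute_batched_ptrs` and `launch_compute_batched_ptrs`, the single flat batch index, and the byte-stride parameters are all assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Illustrative kernel (hypothetical names): thread i writes the addresses of
// the i-th A, B and C matrices into the device-side pointer arrays.
// Strides are given in bytes, which is why the bases are char pointers.
__global__ void compute_batched_ptrs(
        const char * a_base, const char * b_base, char * c_base,
        const void ** ptrs_a, const void ** ptrs_b, void ** ptrs_c,
        int n_batch, size_t a_stride, size_t b_stride, size_t c_stride) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_batch) {
        return; // guard the unused threads in the last block
    }
    ptrs_a[i] = a_base + i * a_stride;
    ptrs_b[i] = b_base + i * b_stride;
    ptrs_c[i] = c_base + i * c_stride;
}

// Enqueues the kernel on `stream`. There is no host-side pointer array and
// no cudaMemcpy, so the host does not have to wait for anything here.
void launch_compute_batched_ptrs(
        const void * a, const void * b, void * c,
        const void ** ptrs_a, const void ** ptrs_b, void ** ptrs_c,
        int n_batch, size_t a_stride, size_t b_stride, size_t c_stride,
        cudaStream_t stream) {
    if (n_batch <= 0) {
        return; // a grid of size 0 is not a valid launch configuration
    }
    const int block = 128;
    const int grid  = (n_batch + block - 1) / block;
    compute_batched_ptrs<<<grid, block, 0, stream>>>(
        (const char *) a, (const char *) b, (char *) c,
        ptrs_a, ptrs_b, ptrs_c, n_batch, a_stride, b_stride, c_stride);
}
```

Under these assumptions, `ptrs_a`, `ptrs_b` and `ptrs_c` are device buffers (for example, taken from a pre-allocated pool) that can be passed as the pointer-array arguments of `cublasGemmBatchedEx`, provided the cuBLAS handle is set to the same stream so that the GEMM is ordered after the kernel.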

slaren · 2023-11-01T18:34:23Z

Ideally, we would also launch cuBLAS from the same kernel, but it seems that using device-side cuBLAS would require some changes to the build system.

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| mistral 7B mostly Q8_0 | 7.17 GiB | 7.24 B | CUDA | 99 | pp 512 | 3673.15 ± 55.61 |
| mistral 7B mostly Q8_0 | 7.17 GiB | 7.24 B | CUDA | 99 | tg 512 | 79.59 ± 0.34 |

build: a9ab02e (1456)

ggerganov

Tested, and it also works on the GTX 1660, which was crashing when using the mem pool + memcpy.

…ov#3891) * ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel * fix warnings (cherry picked from commit d02e98c)


ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel

a9ab02e

fix warnings

1354122

ggerganov approved these changes Nov 1, 2023


slaren merged commit d02e98c into master Nov 1, 2023
33 checks passed

slaren deleted the batched-krn branch November 1, 2023 22:10

slaren mentioned this pull request Nov 2, 2023

Fix ROCM build by relaxing constness #3895

Closed

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023

ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (ggergan…

cd9807c

…ov#3891) * ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel * fix warnings

lilpoozie2005 approved these changes Sep 4, 2024

