b5970

github-actions released this 23 Jul 11:32
CUDA: fix quantized KV cache + multiple sequences (#14822)

* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>