
fix embeddings when using CUDA #3657

Merged
slaren merged 1 commit into master from fix-cuda-embeddings on Oct 17, 2023

Conversation

slaren (Collaborator) commented on Oct 17, 2023

Fixes #3625
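
The PR body is just the one-line fix reference above. For context, here is a minimal sketch (not taken from the PR) of the code path it repairs: computing embeddings through the public llama.h API with layers offloaded to CUDA. Names follow llama.h as it stood around this commit (`llama_get_embeddings`, the `embedding` context flag, the `special` tokenize argument added by #3538 the same day); the model path and prompt are placeholders, and later releases renamed some of these symbols.

```cpp
// Hedged sketch of the path fixed by this PR: read embeddings back from a
// context whose layers are offloaded to CUDA. API names as of Oct 2023.
#include "llama.h"

#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    llama_backend_init(false /* numa */);

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload all layers to the GPU -- the path that was broken

    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) return 1;

    llama_context_params cparams = llama_context_default_params();
    cparams.embedding = true; // ask the context to compute embeddings

    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // tokenize a prompt and run one decode pass
    const char * prompt = "Hello world";
    std::vector<llama_token> tokens(64);
    const int n_tokens = llama_tokenize(
        model, prompt, (int) strlen(prompt),
        tokens.data(), (int) tokens.size(),
        /*add_bos=*/true, /*special=*/false);
    if (n_tokens < 0) return 1;
    llama_decode(ctx, llama_batch_get_one(tokens.data(), n_tokens, 0, 0));

    // before this fix, the vector read back here was invalid whenever
    // n_gpu_layers > 0 with the CUDA backend (issue #3625)
    const int     n_embd = llama_n_embd(model);
    const float * embd   = llama_get_embeddings(ctx);
    for (int i = 0; i < n_embd; ++i) {
        printf("%f ", embd[i]);
    }
    printf("\n");

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The same path can be exercised with the repository's `embedding` example: with `-ngl` greater than zero it produced the invalid values reported in #3625 before this commit, and afterwards the CPU and CUDA results should agree.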

slaren merged commit cb33f43 into master on Oct 17, 2023
33 checks passed
slaren deleted the fix-cuda-embeddings branch on Oct 17, 2023 at 20:24
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 19, 2023
* 'master' of github.com:ggerganov/llama.cpp:
  fix embeddings when using CUDA (ggerganov#3657)
  llama : avoid fprintf in favor of LLAMA_LOG (ggerganov#3538)
  readme : update hot-topics & models, detail windows release in usage (ggerganov#3615)
  CLBlast: Fix temporary buffer size for f16 conversion (wsize)
  train-text-from-scratch : fix assert failure in ggml-alloc (ggerganov#3618)
  editorconfig : remove trailing spaces
  server : documentation of JSON return value of /completion endpoint (ggerganov#3632)
  save-load-state : fix example + add ci test (ggerganov#3655)
  readme : add Aquila2 links (ggerganov#3610)
  tokenizer : special token handling (ggerganov#3538)
  k-quants : fix quantization ranges (ggerganov#3646)
  llava : fix tokenization to not add bos between image embeddings and user prompt (ggerganov#3645)
  MPT : support GQA for replit-code-v1.5 (ggerganov#3627)
  Honor -ngl option for Cuda offloading in llava (ggerganov#3621)
Development

Successfully merging this pull request may close these issues.

Bug: Invalid Embeddings if GPU offloaded (CUDA) (#3625)