Complete removal of f16_kv, add offload_kqv field
This addresses two issues:

 - abetlen#995, which requests adding the KV cache offloading param
 - abetlen#1006, a NULL pointer exception when using embeddings (introduced by
   leaving f16_kv in the fields struct; see the sketch after this list)
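
To see why the stale field caused the second issue, here is a minimal ctypes sketch; the two structs are hypothetical stand-ins, not the real llama.cpp layout. Once the Python-side _fields_ keeps a member the C library has dropped, every later field is read at the wrong offset, and pointer-typed members can come back as garbage or NULL.

import ctypes

# What the C library now defines (hypothetical, trimmed for illustration).
class CParams(ctypes.Structure):
    _fields_ = [
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
    ]

# A Python mirror that still carries the removed member.
class StalePyParams(ctypes.Structure):
    _fields_ = [
        ("f16_kv", ctypes.c_bool),      # no longer exists on the C side
        ("logits_all", ctypes.c_bool),
        ("embedding", ctypes.c_bool),
    ]

# Every field after the stale one sits at the wrong offset, so reads and writes
# land on the wrong bytes of the C struct.
print(CParams.embedding.offset, StalePyParams.embedding.offset)  # 1 vs 2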
brandonrobertz committed Dec 17, 2023
1 parent 37da8e8 commit 9cdfe93
1 changed file: llama_cpp/llama_cpp.py (3 additions, 3 deletions)
@@ -432,9 +432,9 @@ class llama_context_params(Structure):
         type_k (int): data type for K cache
         type_v (int): data type for V cache
         mul_mat_q (bool): if true, use experimental mul_mat_q kernels (DEPRECATED - always true)
-        f16_kv (bool): use fp16 for KV cache, fp32 otherwise
         logits_all (bool): the llama_eval() call computes all logits, not just the last one (DEPRECATED - set llama_batch.logits instead)
-        embedding (bool): embedding mode only"""
+        embedding (bool): embedding mode only
+        offload_kqv (bool): whether to offload the KQV ops (including the KV cache) to GPU"""
     _fields_ = [
         ("seed", c_uint32),
         ("n_ctx", c_uint32),
@@ -452,9 +452,9 @@ class llama_context_params(Structure):
         ("type_k", c_int),
         ("type_v", c_int),
         ("mul_mat_q", c_bool),
-        ("f16_kv", c_bool),
         ("logits_all", c_bool),
         ("embedding", c_bool),
+        ("offload_kqv", c_bool),
     ]
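
As a usage note, here is a minimal sketch against the low-level bindings touched by this diff; it assumes llama_context_default_params() is used to obtain the struct, and model/context creation is omitted.

import llama_cpp

# Start from the library's default context parameters (low-level ctypes API).
ctx_params = llama_cpp.llama_context_default_params()

# New field from this commit: offload the KQV ops (including the KV cache) to the GPU.
ctx_params.offload_kqv = True

# Embedding mode no longer hits the NULL-pointer crash from abetlen#1006,
# because the stale f16_kv entry is gone from _fields_.
ctx_params.embedding = True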
