Update to llama.cpp 2026-01-01 #2108
The bindings were 5 months out of date, preventing newer model architectures from loading. This PR updates them to llama.cpp commit be47fb92 (2026-01-01).
Removed

- `llama_kv_self_*` functions (use the `llama_memory_*` API)
- `llama_sampler_init_softmax()`

Added
Enums:

- `LLAMA_ROPE_TYPE_IMROPE`
- `llama_flash_attn_type`
- `llama_params_fit_status`
- `llama_model_meta_key`

Struct fields:

- `llama_model_params`: `no_host`, `no_alloc`
- `llama_context_params`: `flash_attn_type` (replaces the `flash_attn` bool)

Functions:

- `llama_max_tensor_buft_overrides`
- `llama_n_ctx_seq`
- `llama_model_n_embd_inp`
- `llama_model_is_hybrid`
- `llama_flash_attn_type_name`
- `llama_model_meta_key_str`
- `llama_adapter_meta_*` (5 functions)
- `llama_log_get`
- `llama_log_set`
- `llama_memory_breakdown_print`
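Among the added functions, `llama_log_set` pairs with the new `ggml_log_callback` typedef (listed under Other below) to route llama.cpp's log output through Python. A minimal sketch, assuming both symbols are exposed at module level like the existing ctypes wrappers; the callback name `py_log_callback` is just illustrative:

```python
import llama_cpp

# Keep a module-level reference: ctypes callbacks are garbage-collected
# if nothing in Python holds them, and llama.cpp may log at any time.
@llama_cpp.ggml_log_callback
def py_log_callback(level, text, user_data):
    # level is a ggml_log_level int; text is a NUL-terminated C string (bytes)
    print(text.decode("utf-8", errors="replace"), end="")

llama_cpp.llama_log_set(py_log_callback, None)
```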
Breaking Changes

flash_attn parameter: the `flash_attn` bool on `llama_context_params` is gone; set `flash_attn_type` instead (migration sketch below).
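A minimal migration sketch, assuming the new enum values are exposed as module-level constants the way the bindings expose other llama.h enums:

```python
import llama_cpp

params = llama_cpp.llama_context_default_params()

# Before this PR (removed):
# params.flash_attn = True

# After: flash_attn_type is an enum, not a bool
params.flash_attn_type = llama_cpp.LLAMA_FLASH_ATTN_TYPE_ENABLED
# The default, LLAMA_FLASH_ATTN_TYPE_AUTO, lets llama.cpp decide.
```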
KV cache API: the `llama_kv_self_*` functions were removed; use the `llama_memory_*` API on the context's memory handle (migration sketch below).
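A migration sketch under the same assumptions; `ctx` stands for an already-created `llama_context` (creation elided), and `llama_get_memory` plus the `llama_memory_*` wrappers are assumed to mirror llama.h one-to-one:

```python
import llama_cpp

# Get the memory handle from an existing context
mem = llama_cpp.llama_get_memory(ctx)

# Before (removed):                 After:
# llama_kv_self_clear(ctx)       -> llama_memory_clear(mem, True)
# llama_kv_self_seq_rm(ctx, ...) -> llama_memory_seq_rm(mem, ...)
llama_cpp.llama_memory_clear(mem, True)       # True also clears data buffers
llama_cpp.llama_memory_seq_rm(mem, 0, 0, -1)  # drop sequence 0 entirely
```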
Other

- `ggml_log_callback` typedef
- `LLAMA_INSTALL_VERSION` (set before the subdirectory include)

Tested: macOS ARM64 Metal, Python 3.14, Nemotron-3-Nano-30B