Name and Version
./build/bin/llama-cli --version
version: 4232 (6acce39)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
Operating systems
No response
Which llama.cpp modules do you know to be affected?
libllama (core library)
Problem description & steps to reproduce
The quantization parameters are defined in llama.h as follows:
// model quantization parameters
typedef struct llama_model_quantize_params {
    int32_t nthread;                     // number of threads to use for quantizing, if <=0 will use std::thread::hardware_concurrency()
    enum llama_ftype ftype;              // quantize to this llama_ftype
    enum ggml_type output_tensor_type;   // output tensor type
    enum ggml_type token_embedding_type; // token embeddings tensor type
    bool allow_requantize;               // allow quantizing non-f32/f16 tensors
    bool quantize_output_tensor;         // quantize output.weight
    bool only_copy;                      // only copy tensors - ftype, allow_requantize and quantize_output_tensor are ignored
    bool pure;                           // quantize all tensors to the default type
    bool keep_split;                     // quantize to the same number of shards
    void * imatrix;                      // pointer to importance matrix data
    void * kv_overrides;                 // pointer to vector containing overrides
} llama_model_quantize_params;
imatrix and kv_overrides are passed as pointers to a C++ map (std::unordered_map<std::string, std::vector<float>>) and a C++ vector (std::vector<llama_model_kv_override>), respectively. This makes it impossible to use the corresponding functionality via the C API.
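For illustration, here is a minimal C++ sketch of what a caller has to write today to pass an importance matrix (the file names and imatrix contents are made up; the container type is the one the quantizer casts the pointer back to internally). There is no way to express the equivalent in plain C:

#include <string>
#include <unordered_map>
#include <vector>

#include "llama.h"

int main(void) {
    // The quantizer casts params.imatrix back to exactly this C++ container
    // type, so only a C++ caller can construct a valid value for the field.
    std::unordered_map<std::string, std::vector<float>> imatrix;
    imatrix["blk.0.attn_q.weight"] = std::vector<float>(4096, 1.0f); // made-up data

    llama_model_quantize_params params = llama_model_quantize_default_params();
    params.ftype   = LLAMA_FTYPE_MOSTLY_Q4_K_M;
    params.imatrix = &imatrix; // the void * erases the C++ type
    // params.kv_overrides would likewise need a std::vector<llama_model_kv_override> *

    return (int) llama_model_quantize("model-f16.gguf", "model-q4_k_m.gguf", &params);
}

A C translation unit cannot construct the pointed-to objects at all, which is the core of the problem.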
First Bad Commit
No response
Relevant log output
No response