Name and Version
./build/bin/llama-cli --version
version: 4232 (6acce39)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
Operating systems
No response
Which llama.cpp modules do you know to be affected?
libllama (core library)
Problem description & steps to reproduce
The parameters for quantization are defined as follows:
```c
// model quantization parameters
typedef struct llama_model_quantize_params {
    int32_t nthread;                     // number of threads to use for quantizing, if <=0 will use std::thread::hardware_concurrency()
    enum llama_ftype ftype;              // quantize to this llama_ftype
    enum ggml_type output_tensor_type;   // output tensor type
    enum ggml_type token_embedding_type; // token embeddings tensor type
    bool allow_requantize;               // allow quantizing non-f32/f16 tensors
    bool quantize_output_tensor;         // quantize output.weight
    bool only_copy;                      // only copy tensors - ftype, allow_requantize and quantize_output_tensor are ignored
    bool pure;                           // quantize all tensors to the default type
    bool keep_split;                     // quantize to the same number of shards
    void * imatrix;                      // pointer to importance matrix data
    void * kv_overrides;                 // pointer to vector containing overrides
} llama_model_quantize_params;
```
imatrix and kv_overrides are passed as pointers to a C++ map and vector, respectively. This makes it impossible to use the corresponding functionality from C.
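For illustration, a minimal C sketch of where this breaks down (file names, the chosen ftype, and the override values below are placeholders):

```c
#include "llama.h"

int main(void) {
    struct llama_model_quantize_params params = llama_model_quantize_default_params();
    params.ftype = LLAMA_FTYPE_MOSTLY_Q4_K_M;

    // This much works from C: nthread, ftype, the bool flags, etc. are plain C types.
    if (llama_model_quantize("model-f16.gguf", "model-q4_k_m.gguf", &params) != 0) {
        return 1;
    }

    // The problem: the implementation interprets kv_overrides as a
    // std::vector<llama_model_kv_override>* and imatrix as a pointer to a C++ map,
    // neither of which a C caller can construct. A plain array is NOT accepted:
    //
    //   struct llama_model_kv_override overrides[2] = { /* ... */ };
    //   params.kv_overrides = overrides;  // wrong: cast to std::vector* on the C++ side
    //
    return 0;
}
```

By contrast, llama_model_params.kv_overrides for model loading already takes a plain `const struct llama_model_kv_override *` (terminated by an entry with an empty key); a terminated-array or pointer-plus-count layout along those lines would make the quantize parameters usable from C as well.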
First Bad Commit
No response
Relevant log output
No response