
Misc. bug: interface for model quantization is not fully C-compatible #10614

Open
@JohannesGaessler

Description


Name and Version

./build/bin/llama-cli --version
version: 4232 (6acce39)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu

Operating systems

No response

Which llama.cpp modules do you know to be affected?

libllama (core library)

Problem description & steps to reproduce

The quantization parameters are defined in `llama.h` as follows:

    // model quantization parameters
    typedef struct llama_model_quantize_params {
        int32_t nthread;                     // number of threads to use for quantizing, if <=0 will use std::thread::hardware_concurrency()
        enum llama_ftype ftype;              // quantize to this llama_ftype
        enum ggml_type output_tensor_type;   // output tensor type
        enum ggml_type token_embedding_type; // token embeddings tensor type
        bool allow_requantize;               // allow quantizing non-f32/f16 tensors
        bool quantize_output_tensor;         // quantize output.weight
        bool only_copy;                      // only copy tensors - ftype, allow_requantize and quantize_output_tensor are ignored
        bool pure;                           // quantize all tensors to the default type
        bool keep_split;                     // quantize to the same number of shards
        void * imatrix;                      // pointer to importance matrix data
        void * kv_overrides;                 // pointer to vector containing overrides
    } llama_model_quantize_params;

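`imatrix` and `kv_overrides` are declared as `void *`, but the implementation expects them to point to a C++ map (`std::unordered_map`) and a C++ `std::vector`, respectively. A C program cannot construct either object, so the importance-matrix and KV-override features cannot be used through the C interface.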

First Bad Commit

No response

Relevant log output

No response

Metadata
Labels: bug (Something isn't working)