quantize to F32/F16/Q8_0 can result in a Q6_K output tensor #5818

Closed
@cebtenzzre

Description
Running quantize with a target dtype of F32, F16, or Q8_0 can result in a Q6_K output tensor without --pure (ref #5631 (comment)). This is surprising, as I would expect converting to F32 and then quantizing to F16 to produce similar results to converting directly to F16.

I suggest that the k-quant mixture logic should never attempt to decrease the quality of the output tensor, only increase it.
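The suggested fix could be sketched as a guard in the tensor-type selection: rank types by output quality and accept the mixture logic's per-tensor pick only if it is at least as good as the user's requested target. This is a minimal illustrative sketch, not llama.cpp's actual API; the enum, `quality_rank`, and `choose_tensor_type` are all hypothetical names.

```cpp
#include <cassert>

// Hypothetical ranking of a few quant types by output quality
// (higher = better). Not the real ggml_type enum.
enum QuantType { Q6_K, Q8_0, F16, F32 };

static int quality_rank(QuantType t) {
    switch (t) {
        case Q6_K: return 0; // 6-bit k-quant
        case Q8_0: return 1; // 8-bit quant
        case F16:  return 2; // 16-bit float
        case F32:  return 3; // 32-bit float
    }
    return -1;
}

// Guard for the k-quant mixture logic: keep the mixture's pick only
// when it does not reduce quality below the requested target dtype.
static QuantType choose_tensor_type(QuantType target, QuantType mixture_pick) {
    return quality_rank(mixture_pick) >= quality_rank(target)
         ? mixture_pick
         : target;
}
```

With this guard, a target of F16 would never be silently downgraded to Q6_K, while an upgrade (e.g. Q6_K target, F16 pick for a sensitive tensor) would still be allowed.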

Metadata

Labels: bug (Something isn't working), good first issue (Good for newcomers)
