SaveState / LoadState not working on 8-bit quantized gguf models #260

Closed
@BrainSlugs83

Description

I'm not sure whether this affects other model types; I'm only testing on 8-bit models right now, so it might be a wider bug. (Specifically, this happens for me with openchat_3.5.Q8_0.gguf.)

I'm using the following parameters:

var parameters = new ModelParams(@"C:\models\openchat_3.5.Q8_0.gguf")
{
    ContextSize = 8 * 1024,
    Seed = 1337,
    GpuLayerCount = 15
};

Calling InteractiveExecutor.SaveState produces a JSON file containing the correct tokens (you can pass them to the tokenizer to verify them), among other values.
But calling InteractiveExecutor.LoadState on a new instance causes it to spit out random garbled text that doesn't even form coherent sentences.

The same problem happens when going through GetStateData() and LoadState instead.
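For context, here is a minimal sketch of the save/load round trip I'm describing. This assumes the current LLamaSharp API shape (LLamaWeights.LoadFromFile, model.CreateContext, InteractiveExecutor over a context); exact signatures may differ slightly in the version I'm on, and the state file path is just an example:

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams(@"C:\models\openchat_3.5.Q8_0.gguf")
{
    ContextSize = 8 * 1024,
    Seed = 1337,
    GpuLayerCount = 15
};

using var model = LLamaWeights.LoadFromFile(parameters);

// First session: run some inference, then persist the executor state.
using (var context = model.CreateContext(parameters))
{
    var executor = new InteractiveExecutor(context);
    // ... generate some text here ...
    executor.SaveState(@"C:\models\executor-state.json"); // example path
}

// Second session: fresh context and executor, restore the saved state.
using (var context = model.CreateContext(parameters))
{
    var executor = new InteractiveExecutor(context);
    executor.LoadState(@"C:\models\executor-state.json");
    // Expected: generation resumes coherently from the saved point.
    // Actual: output is garbled from the first token onward.
}
```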

By the way, I'm using LLamaSharp 0.51 with the CUDA 11 backend.

Metadata

Labels: bug (Something isn't working)

Status: ✅ Done