SaveState / LoadState not working on 8-bit quantized gguf models #260

Closed
@BrainSlugs83

Description

I'm not sure whether this affects other model types; I'm only testing on 8-bit models right now, so it might be a wider bug. (Specifically, this happens for me with openchat_3.5.Q8_0.gguf.)

I'm using the following parameters:

var parameters = new ModelParams(@"C:\models\openchat_3.5.Q8_0.gguf")
{
    ContextSize = 8 * 1024,
    Seed = 1337,
    GpuLayerCount = 15
};

Calling InteractiveExecutor.SaveState produces a JSON file containing the correct tokens (you can pass them to the tokenizer to verify them), among other values.
But calling InteractiveExecutor.LoadState on a new instance causes it to spit out random garbled text that doesn't even form coherent sentences.

The same problem happens when going through GetStateData() and LoadState instead.
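For context, here is a minimal sketch of the save/load round trip I'm describing. This assumes the current LLamaSharp API shape (LLamaWeights.LoadFromFile, model.CreateContext, InteractiveExecutor over a context); exact signatures may differ slightly in the version I'm on, and the state file path is just an example:

```csharp
using LLama;
using LLama.Common;

var parameters = new ModelParams(@"C:\models\openchat_3.5.Q8_0.gguf")
{
    ContextSize = 8 * 1024,
    Seed = 1337,
    GpuLayerCount = 15
};

using var model = LLamaWeights.LoadFromFile(parameters);

// First session: run some inference, then persist the executor state.
using (var context = model.CreateContext(parameters))
{
    var executor = new InteractiveExecutor(context);
    // ... generate some text here ...
    executor.SaveState(@"C:\models\executor-state.json"); // example path
}

// Second session: fresh context and executor, restore the saved state.
using (var context = model.CreateContext(parameters))
{
    var executor = new InteractiveExecutor(context);
    executor.LoadState(@"C:\models\executor-state.json");
    // Expected: generation resumes coherently from the saved point.
    // Actual: output is garbled from the first token onward.
}
```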

By the way, I'm using LLamaSharp 0.51 with the CUDA 11 backend.

Metadata

Labels: bug (Something isn't working)

Status: ✅ Done