Improve query formatting & sizing #23

@jepler

Hugging Face transformers 4.34, which is quite new, supports "chat templates" and can also tell you the size of a chat in tokens.

However, a lot of models don't have the required chat templates (yet?), and getting the chat template for some models (e.g., llama2) requires special permission even when a derived quantized model is not behind a signup wall.

Use this tech, or something like it, to replace the hard-coded query formatting in the llama_cpp backend, and to size the query by token count rather than by the hard-coded limit of 5 messages.
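
A rough sketch of what token-based sizing could look like, not a proposal for the actual backend code. The names `trim_to_token_budget` and `hf_token_counter` are made up for illustration, and the message format (a list of `{"role": ..., "content": ...}` dicts) is assumed to match what chat templates expect. The counter is passed in as a callable so that models without a usable chat template can plug in something else.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_to_token_budget(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    max_tokens: int,
) -> List[Message]:
    """Keep a leading system message (if any) plus the newest messages that fit."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    kept: List[Message] = []
    # Walk backwards from the newest message; stop at the first one that
    # would push the rendered chat over the budget.
    for message in reversed(rest):
        candidate = system + [message] + kept
        if count_tokens(candidate) > max_tokens:
            break
        kept = [message] + kept
    return system + kept


def hf_token_counter(model_name: str) -> Callable[[List[Message]], int]:
    """Hypothetical helper: build a counter from a Hugging Face tokenizer.

    Assumes transformers >= 4.34 and a tokenizer that ships a chat template.
    """
    from transformers import AutoTokenizer  # lazy import: optional dependency

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def count(messages: List[Message]) -> int:
        # Render the chat as text with the model's template, then tokenize it.
        text = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        return len(tokenizer(text, add_special_tokens=False)["input_ids"])

    return count
```

Usage would be something like `trim_to_token_budget(history, hf_token_counter(name), budget)`, where `budget` leaves room for the reply. One caveat: some templates reject chats whose roles don't alternate user/assistant, so dropping messages in pairs may be needed for those models.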
