Support for LLGuidance, which uses constrained sampling to guarantee valid JSON output, was added to llama.cpp and then enhanced earlier this year. It's the difference between asking the model "pretty please" or validating its output after generation on the one hand, and guaranteeing valid output by supervising each token as it is generated on the other. That guarantee makes working with Small Language Models much more reliable.
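As an illustration, here is a minimal sketch of schema-constrained generation against a locally running llama-server instance; the endpoint URL, port, prompt, and schema are assumptions for the example, not part of this project's API:

```python
import json
import urllib.request

# A JSON schema the model's output must conform to. With constrained
# sampling, tokens that would violate the schema are masked out at each
# step, so the result is guaranteed to parse.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name", "year"],
}

# llama-server's /completion endpoint accepts a `json_schema` field for
# grammar-based sampling; host, port, and prompt here are placeholders.
payload = {
    "prompt": "Extract the film title and release year: 'Alien came out in 1979.'",
    "json_schema": schema,
    "n_predict": 128,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

# `content` is valid JSON by construction, not by luck.
print(json.loads(result["content"]))
```

Because the sampler masks out any token that would break the schema, `content` parses on the first try; there is no retry loop and no post-hoc validation step.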
Enabling this feature at compile time requires some fiddling with Rust (LLGuidance itself is a Rust library), but it is probably the most effective approach available given the move away from the llama-cpp-python backend (see #370 for history); a build sketch follows below.
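For reference, a minimal from-source sketch, assuming the `LLAMA_LLGUIDANCE` CMake option documented upstream in llama.cpp; this project's own build scripts may wrap or replace these steps:

```sh
# LLGuidance is a Rust library, so a Rust toolchain (rustc + cargo)
# must be on PATH before configuring the build.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Configure and build llama.cpp with LLGuidance-backed grammar support.
cmake -B build -DLLAMA_LLGUIDANCE=ON
cmake --build build --config Release
```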