Structured Generation: Implement LLGuidance from Upstream #459

@iwr-redmond

Description

Support for LLGuidance, which uses constrained sampling to guarantee valid JSON output, was added to llama.cpp and then enhanced earlier this year. It's the difference between asking "pretty please" and validating the output post-generation, versus guaranteeing valid output by supervising each token as it is generated, and it makes working with Small Language Models much more reliable.
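To illustrate the "supervising each token" approach: llama.cpp's `llama-server` accepts a `json_schema` field on its `/completion` endpoint, and with constrained sampling the schema is enforced during generation rather than checked afterwards. A minimal sketch, assuming a server is already running locally on port 8080 (the port and prompt are placeholders):

```shell
# Assumes llama-server is already running, e.g.:
#   llama-server -m model.gguf --port 8080
# The "json_schema" field constrains decoding: tokens that would break the
# schema are masked out at each step, so the reply is guaranteed to parse.
curl http://localhost:8080/completion -d '{
  "prompt": "Describe one fruit as JSON: ",
  "json_schema": {
    "type": "object",
    "properties": { "name": { "type": "string" } },
    "required": ["name"]
  }
}'
```

Contrast this with prompt-only approaches, where the model is merely asked for JSON and the output still has to be validated (and possibly re-requested) after the fact.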

Enabling this feature at compile time requires a Rust toolchain (LLGuidance itself is written in Rust), but it is probably the most effective implementation possible given the move away from the llama-cpp-python backend (see #370 for history).
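The compile-time step above can be sketched as follows, assuming llama.cpp's documented CMake option for LLGuidance; `cargo` must be on the PATH because the LLGuidance crate is built as part of the process:

```shell
# Build llama.cpp with LLGuidance support enabled.
# Assumes a Rust toolchain (rustc/cargo) is installed and on PATH.
cmake -B build -DLLAMA_LLGUIDANCE=ON
cmake --build build --config Release
```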
