Support for LLGuidance, which uses constrained sampling to guarantee valid JSON output, was added to llama.cpp and then enhanced earlier this year. It's the difference between asking the model "pretty please" or validating its output after generation on the one hand, and guaranteeing valid output by supervising each token as it is generated on the other. That guarantee makes working with Small Language Models much more reliable.
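As an illustration, here is a minimal sketch of schema-constrained generation against a locally running llama-server instance; the endpoint URL, port, prompt, and schema are assumptions for the example, not part of this project's API:

```python
import json
import urllib.request

# A JSON schema the model's output must conform to. With constrained
# sampling, tokens that would violate the schema are masked out at each
# step, so the result is guaranteed to parse.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name", "year"],
}

# llama-server's /completion endpoint accepts a `json_schema` field for
# grammar-based sampling; host, port, and prompt here are placeholders.
payload = {
    "prompt": "Extract the film title and release year: 'Alien came out in 1979.'",
    "json_schema": schema,
    "n_predict": 128,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

# `content` is valid JSON by construction, not by luck.
print(json.loads(result["content"]))
```

Because the sampler masks out any token that would break the schema, `content` parses on the first try; there is no retry loop and no post-hoc validation step.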
Enabling this feature at compile time requires some fiddling with Rust (LLGuidance itself is a Rust library), but it is probably the most effective approach available given the move away from the llama-cpp-python backend (see #370 for history); a build sketch follows below.
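For reference, a minimal from-source sketch, assuming the `LLAMA_LLGUIDANCE` CMake option documented upstream in llama.cpp; this project's own build scripts may wrap or replace these steps:

```sh
# LLGuidance is a Rust library, so a Rust toolchain (rustc + cargo)
# must be on PATH before configuring the build.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Configure and build llama.cpp with LLGuidance-backed grammar support.
cmake -B build -DLLAMA_LLGUIDANCE=ON
cmake --build build --config Release
```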