Description
🚀 Feature
Allow passing in a grammar or JSON schema that restricts which tokens can be generated. This would make data extraction, and potentially tool use, simpler to implement.
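To make the request concrete, here is a minimal sketch of grammar-constrained decoding. It is a toy illustration, not MLC's API or any existing library's implementation: the vocabulary, the `toy_logits` stand-in for a model, and the `GRAMMAR` state table are all hypothetical. The idea is that at each step the grammar decides which tokens are legal, and every other token's score is masked out before sampling.

```python
import json
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ['{', '}', '"answer"', ':', '"yes"', '"no"', 'maybe', '<eos>']

# Hypothetical grammar as a state machine: state -> {allowed token: next state}.
# It accepts exactly {"answer":"yes"} or {"answer":"no"}.
GRAMMAR = {
    0: {'{': 1},
    1: {'"answer"': 2},
    2: {':': 3},
    3: {'"yes"': 4, '"no"': 4},
    4: {'}': 5},
    5: {'<eos>': None},
}


def toy_logits(prefix):
    """Stand-in for a model forward pass (assumption, ignores the prefix).

    It deliberately scores the invalid token 'maybe' highest, so
    unconstrained greedy decoding would never produce valid JSON.
    """
    return [1.0 if tok == 'maybe' else 0.1 * i for i, tok in enumerate(VOCAB)]


def constrained_decode(max_steps=16):
    """Greedy decoding where tokens the grammar forbids are masked to -inf."""
    state, out = 0, []
    for _ in range(max_steps):
        allowed = GRAMMAR[state]
        logits = toy_logits(out)
        masked = [
            score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        # Pick the highest-scoring token among those the grammar allows.
        tok = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
        if tok == '<eos>':
            break
        out.append(tok)
        state = allowed[tok]
    return ''.join(out)


result = constrained_decode()
parsed = json.loads(result)  # the output always parses because the grammar guarantees it
```

A real implementation would derive the state machine from a user-supplied grammar or JSON schema and map it onto the model's actual tokenizer, which is where most of the complexity lies.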
Motivation
Having the ability to constrain responses to a specified grammar or a JSON schema would unlock data extraction and function calling use cases.
Alternatives
Prompt engineering alone isn't sufficient, and prompts are not transferable between models.
Fine-tuning would be a much heavier lift compared to a grammar that could drive the output.
Additional context
Some prior art to consider:
ggml-org/llama.cpp#1773
https://huggingface.co/spaces/mishig/jsonformer
https://github.com/normal-computing/outlines
https://github.com/r2d4/rellm
Great project! I have been able to get highly performant, high-quality responses with a couple of hours of effort. Huge kudos to the MLC team for this project, and to Simon Willison, whose llm library got me started: