This repository defines an interface for integrating structured-output (constrained decoding) engines into LLM inference systems. It specifies the layout of an object that a structured-output engine provides and that is then passed to the inference system.
The core of the interface is a C struct `cbison_factory` that contains (in addition to version and magic numbers, etc.) the following function pointers:

- `validate_grammar`, taking the type and text of a grammar, and returning a boolean and diagnostics
- `new_matcher`, also taking the type and text of a grammar, and returning a pointer to a matcher object

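
To make this layout concrete, here is a minimal sketch of a factory as a plain struct of function pointers, with a toy engine that only understands a hypothetical `"literal"` grammar type (the output must equal a fixed string). The struct name `toy_factory`, the magic value, and every signature are assumptions for illustration, not the definitions from the repository's C header.

```cpp
// Hypothetical sketch: NOT the real cbison header. Names, signatures and the
// magic value are assumptions made for illustration only.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

struct toy_matcher;  // opaque to the inference system

// A C-compatible "vtable": plain data plus function pointers.
struct toy_factory {
  uint32_t magic;          // hypothetical magic number identifying the struct
  uint32_t version_major;  // hypothetical version fields
  uint32_t version_minor;
  // Returns true if the grammar is valid; writes diagnostics into msg.
  bool (*validate_grammar)(toy_factory *f, const char *grammar_type,
                           const char *grammar, char *msg, size_t msg_len);
  // Returns a new matcher, or nullptr if the grammar cannot be compiled.
  toy_matcher *(*new_matcher)(toy_factory *f, const char *grammar_type,
                              const char *grammar);
};

// Toy engine state: the literal to match and how much of it is matched.
struct toy_matcher {
  std::string target;
  size_t pos = 0;
};

static bool toy_validate(toy_factory *, const char *type, const char *grammar,
                         char *msg, size_t msg_len) {
  if (std::strcmp(type, "literal") != 0) {
    std::snprintf(msg, msg_len, "unsupported grammar type: %s", type);
    return false;
  }
  if (grammar[0] == '\0') {
    std::snprintf(msg, msg_len, "empty literal");
    return false;
  }
  std::snprintf(msg, msg_len, "ok");
  return true;
}

static toy_matcher *toy_new_matcher(toy_factory *f, const char *type,
                                    const char *grammar) {
  char msg[64];
  // Reuse validation so invalid grammars never produce a matcher.
  if (!f->validate_grammar(f, type, grammar, msg, sizeof msg)) return nullptr;
  return new toy_matcher{grammar};
}

static toy_factory make_toy_factory() {
  return toy_factory{0x746F7921u, 1, 0, toy_validate, toy_new_matcher};
}
```

The inference system only sees the struct, so it can call into any engine that fills in these pointers without linking against that engine's internals.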
Methods on matcher objects are also defined as function pointers in `cbison_factory` and include:

- state accessors: `get_error`, `is_accepting`, `is_stopped`
- `compute_mask`, which returns a bitmask corresponding to allowed tokens in the current state of the matcher
- `consume_tokens`, advancing the state of the matcher

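
The required matcher methods are typically driven in a loop: compute the mask, pick an allowed token, consume it, and repeat until the matcher is stopped. The sketch below shows that loop on a toy matcher over a four-token vocabulary. It assumes the mask is an array of 32-bit words with token `t` at bit `t % 32` of word `t / 32`; the names, signatures, and the "take the lowest allowed token" rule are illustrative stand-ins for a real engine and a real sampler.

```cpp
// Hypothetical sketch of the required matcher methods on a toy engine.
// Signatures and the mask layout are assumptions, not the cbison API.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

static const std::vector<std::string> kVocab = {"a", "b", "ab", "<eos>"};
static const uint32_t kEos = 3;

struct ToyMatcher {
  std::string target;  // the literal the output must equal
  size_t pos = 0;      // bytes of target matched so far
  bool stopped = false;
  std::string error;   // empty when no error
};

static bool is_accepting(const ToyMatcher *m) { return m->pos == m->target.size(); }
static bool is_stopped(const ToyMatcher *m) { return m->stopped; }
static const char *get_error(const ToyMatcher *m) {
  return m->error.empty() ? nullptr : m->error.c_str();
}

// Is token t allowed in the matcher's current state?
static bool allowed(const ToyMatcher *m, uint32_t t) {
  if (m->stopped) return false;
  if (t == kEos) return is_accepting(m);
  return m->target.compare(m->pos, kVocab[t].size(), kVocab[t]) == 0;
}

// Writes one bit per vocabulary token into mask (n_words 32-bit words).
static void compute_mask(const ToyMatcher *m, uint32_t *mask, size_t n_words) {
  for (size_t w = 0; w < n_words; ++w) mask[w] = 0;
  for (uint32_t t = 0; t < kVocab.size(); ++t)
    if (allowed(m, t)) mask[t / 32] |= 1u << (t % 32);
}

// Advances the state; on a disallowed token records an error and stops.
static bool consume_tokens(ToyMatcher *m, const uint32_t *tokens, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    if (!allowed(m, tokens[i])) {
      m->error = "token not allowed";
      m->stopped = true;
      return false;
    }
    if (tokens[i] == kEos) m->stopped = true;
    else m->pos += kVocab[tokens[i]].size();
  }
  return true;
}

// Constrained "sampling": take the lowest-numbered allowed token each step.
static std::string generate(ToyMatcher *m) {
  std::string out;
  uint32_t mask[1];
  while (!is_stopped(m)) {
    compute_mask(m, mask, 1);
    uint32_t t = 0;
    while (t < kVocab.size() && !(mask[t / 32] & (1u << (t % 32)))) ++t;
    if (t == kVocab.size()) break;  // nothing allowed: dead end
    consume_tokens(m, &t, 1);
    if (t != kEos) out += kVocab[t];
  }
  return out;
}
```

In a real system the sampler would apply the mask to the model's logits instead of picking the first allowed token, but the call sequence stays the same.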
The following matcher methods are optional:

- `validate_tokens`, checking whether one or more tokens would be accepted in sequence
- `compute_ff_tokens`, returning any fast-forward tokens forced by the matcher
- `rollback`, which is the inverse of `consume_tokens`
- `reset`, which resets the matcher to its initial state

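
The optional methods can be illustrated on a toy literal matcher that records its position after every consumed token, so `rollback` just pops entries and `reset` truncates to the initial state. This is a sketch under assumptions: all signatures are hypothetical, and `compute_ff_tokens` here turns the forced bytes into tokens with a greedy longest-match rule rather than a real tokenizer.

```cpp
// Hypothetical sketch of the optional matcher methods; names, signatures and
// the history-stack design are assumptions, not the cbison API.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

static const std::vector<std::string> kVocab = {"a", "b", "ab", "c"};

struct ToyMatcher {
  std::string target;
  std::vector<size_t> history{0};  // history.back() = bytes matched so far
  size_t pos() const { return history.back(); }
};

static bool allowed_at(const ToyMatcher &m, size_t pos, uint32_t t) {
  return m.target.compare(pos, kVocab[t].size(), kVocab[t]) == 0;
}

static void consume_tokens(ToyMatcher &m, const std::vector<uint32_t> &toks) {
  for (uint32_t t : toks) {
    assert(allowed_at(m, m.pos(), t));  // toy: callers pass valid tokens
    m.history.push_back(m.pos() + kVocab[t].size());
  }
}

// validate_tokens: length of the accepted prefix, without changing state.
static size_t validate_tokens(const ToyMatcher &m, const std::vector<uint32_t> &toks) {
  size_t pos = m.pos(), n = 0;
  for (uint32_t t : toks) {
    if (!allowed_at(m, pos, t)) break;
    pos += kVocab[t].size();
    ++n;
  }
  return n;
}

// rollback: undo the last n consumed tokens (inverse of consume_tokens).
static void rollback(ToyMatcher &m, size_t n) {
  assert(n < m.history.size());
  m.history.resize(m.history.size() - n);
}

// reset: return to the initial state.
static void reset(ToyMatcher &m) { m.history.assign(1, 0); }

// compute_ff_tokens: every remaining byte of a literal is forced; split the
// remainder into tokens greedily by longest match (a simplification).
static std::vector<uint32_t> compute_ff_tokens(const ToyMatcher &m) {
  std::vector<uint32_t> out;
  size_t pos = m.pos();
  while (pos < m.target.size()) {
    uint32_t best = 0;
    size_t best_len = 0;
    for (uint32_t t = 0; t < kVocab.size(); ++t)
      if (kVocab[t].size() > best_len && allowed_at(m, pos, t)) {
        best = t;
        best_len = kVocab[t].size();
      }
    if (best_len == 0) break;  // remainder not representable in the vocab
    out.push_back(best);
    pos += best_len;
  }
  return out;
}
```

Keeping a per-token history makes `rollback` cheap; engines with larger state may instead store diffs or snapshots, which is a design choice the interface leaves to the implementation.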
Additionally, the factory has an optional method `compute_masks`, which returns token bitmasks for several matchers in parallel.
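
One simple way to realize batched mask computation is to compute each matcher's mask on its own thread into a disjoint row of a shared buffer, so no locking is needed. The sketch below assumes a hypothetical `compute_masks(matchers, out, n_words)` signature and a toy literal matcher with the mask layout "token `t` at bit `t % 32` of word `t / 32`"; a production engine would more likely reuse a thread pool than spawn one thread per matcher.

```cpp
// Hypothetical sketch of a batched compute_masks; the signature, toy matcher
// and mask layout are assumptions, not the cbison API.
#include <cassert>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

static const std::vector<std::string> kVocab = {"a", "b", "ab", "c"};

struct ToyMatcher {
  std::string target;
  size_t pos = 0;
};

// Single-matcher mask: bit t is set if token t continues the literal.
static void compute_mask(const ToyMatcher &m, uint32_t *mask, size_t n_words) {
  for (size_t w = 0; w < n_words; ++w) mask[w] = 0;
  for (uint32_t t = 0; t < kVocab.size(); ++t)
    if (m.target.compare(m.pos, kVocab[t].size(), kVocab[t]) == 0)
      mask[t / 32] |= 1u << (t % 32);
}

// Row i of `out` (n_words words each) receives the mask of matchers[i].
static void compute_masks(const std::vector<const ToyMatcher *> &matchers,
                          uint32_t *out, size_t n_words) {
  std::vector<std::thread> workers;
  for (size_t i = 0; i < matchers.size(); ++i)
    workers.emplace_back([&, i] { compute_mask(*matchers[i], out + i * n_words, n_words); });
  for (auto &w : workers) w.join();  // all rows are complete after this
}
```

Because each thread writes only to `out + i * n_words`, the rows never overlap and the result does not depend on thread scheduling.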
The C++ `cbison::Factory` class wraps an existing `cbison_factory` and provides a C++ interface.
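
A C++ wrapper over such a struct mostly adds ownership and error handling. The following sketch, which is not the actual `cbison::Factory`, wraps a hypothetical `c_factory` table and ties each matcher's lifetime to a `std::unique_ptr` whose deleter is the table's hypothetical `free_matcher` function.

```cpp
// Hypothetical sketch of RAII over a C function-pointer table. c_factory and
// its members are assumptions; this is not the real cbison::Factory.
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

extern "C" {
typedef struct c_matcher c_matcher;  // opaque to callers
typedef struct c_factory {
  c_matcher *(*new_matcher)(const struct c_factory *, const char *type, const char *grammar);
  void (*free_matcher)(c_matcher *);
  int (*is_accepting)(c_matcher *);
} c_factory;
}

// Toy C-side implementation, counting live matchers to show cleanup.
static int live = 0;
struct c_matcher { bool accepting; };
static c_matcher *toy_new(const c_factory *, const char *type, const char *grammar) {
  if (std::string(type) != "literal") return nullptr;
  ++live;
  return new c_matcher{grammar[0] == '\0'};  // empty literal: already accepting
}
static void toy_free(c_matcher *m) { --live; delete m; }
static int toy_is_accepting(c_matcher *m) { return m->accepting ? 1 : 0; }

// Owns one C matcher; freeing happens automatically in the destructor.
class Matcher {
 public:
  Matcher(const c_factory *f, c_matcher *m) : f_(f), m_(m, f->free_matcher) {}
  bool is_accepting() const { return f_->is_accepting(m_.get()) != 0; }

 private:
  const c_factory *f_;
  std::unique_ptr<c_matcher, void (*)(c_matcher *)> m_;
};

// Borrows the C table and turns null returns into exceptions.
class Factory {
 public:
  explicit Factory(const c_factory *f) : f_(f) {}
  Matcher new_matcher(const std::string &type, const std::string &grammar) const {
    c_matcher *m = f_->new_matcher(f_, type.c_str(), grammar.c_str());
    if (!m) throw std::runtime_error("cannot compile " + type + " grammar");
    return Matcher(f_, m);
  }

 private:
  const c_factory *f_;  // not owned
};
```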
The Python class `cbison.CbisonFactory` uses `ctypes` to wrap the C interface.
A grammar engine constructs a `cbison_factory` given a `cbison_tokenizer`, which defines the following methods:

- `get_token`, which, given a numeric token ID, returns the corresponding sequence of bytes
- `is_special_token`, which, given a token ID, returns true if the token is special (e.g., `<|endoftext|>`, `<think>`, etc.)
- `tokenize_bytes`, which takes a sequence of bytes and returns a list of token IDs (this is required to correctly compute "fast-forward" tokens from "fast-forward" bytes)

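
The sketch below shows a tokenizer table with these three methods, and why `tokenize_bytes` matters: when a grammar forces the bytes `abba` next, the engine needs token IDs, not bytes, to fast-forward. `toy_tokenizer`, its signatures, and the greedy longest-match tokenization are assumptions for illustration; real tokenizers such as BPE can split the same bytes differently, which is why the engine asks the tokenizer rather than guessing.

```cpp
// Hypothetical tokenizer table; names, signatures and the greedy
// longest-match tokenize_bytes are assumptions, not cbison_tokenizer.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct toy_tokenizer {
  size_t n_vocab;
  // Copies up to dst_len bytes of token `id` into dst; returns its full length.
  size_t (*get_token)(const toy_tokenizer *, uint32_t id, uint8_t *dst, size_t dst_len);
  bool (*is_special_token)(const toy_tokenizer *, uint32_t id);
  // Tokenizes n bytes into out (capacity out_len); returns the token count.
  size_t (*tokenize_bytes)(const toy_tokenizer *, const uint8_t *bytes, size_t n,
                           uint32_t *out, size_t out_len);
};

static const std::vector<std::string> kVocab = {"a", "b", "ab", "<|endoftext|>"};

static size_t toy_get_token(const toy_tokenizer *, uint32_t id, uint8_t *dst, size_t dst_len) {
  const std::string &s = kVocab[id];
  size_t n = s.size() < dst_len ? s.size() : dst_len;  // never overflow dst
  if (n > 0) std::memcpy(dst, s.data(), n);
  return s.size();
}

static bool toy_is_special(const toy_tokenizer *, uint32_t id) { return id == 3; }

static size_t toy_tokenize(const toy_tokenizer *, const uint8_t *bytes, size_t n,
                           uint32_t *out, size_t out_len) {
  size_t pos = 0, count = 0;
  while (pos < n && count < out_len) {
    uint32_t best = 0;
    size_t best_len = 0;
    for (uint32_t id = 0; id < kVocab.size(); ++id) {
      if (toy_is_special(nullptr, id)) continue;  // text never yields special tokens
      const std::string &s = kVocab[id];
      if (s.size() > best_len && s.size() <= n - pos &&
          std::memcmp(bytes + pos, s.data(), s.size()) == 0) {
        best = id;
        best_len = s.size();
      }
    }
    if (best_len == 0) break;  // byte not covered by this toy vocabulary
    out[count++] = best;
    pos += best_len;
  }
  return count;
}

static const toy_tokenizer kToyTokenizer = {kVocab.size(), toy_get_token, toy_is_special,
                                            toy_tokenize};
```

With this vocabulary, the forced bytes `abba` become the tokens `ab`, `b`, `a` (IDs 2, 1, 0).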
The C++ `cbison::Tokenizer` class wraps an existing `cbison_tokenizer` and provides a C++ interface.
The Python class `cbison.CbisonTokenizer` uses `ctypes` to wrap the C interface.
Separately, `cbison::CppTokenizer` makes it easier to implement a `cbison_tokenizer` in C++.
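
A helper like `cbison::CppTokenizer` presumably hides the usual pattern for exposing a C++ object through a C function-pointer table: store `this` in a user-data field and route each function pointer through a static "trampoline" that casts it back. The sketch below shows that pattern in isolation; `TokenizerBase`, `VecTokenizer`, `c_tok_table`, and the reduced method set (`token_len` instead of a byte-copying `get_token`) are hypothetical and do not reflect the actual class.

```cpp
// Hypothetical sketch of the C++-object-behind-a-C-table pattern. All names
// are assumptions; this is not the real cbison::CppTokenizer.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct c_tok_table {
  void *user_data;  // points back at the C++ object
  size_t (*token_len)(const c_tok_table *, uint32_t id);
  bool (*is_special_token)(const c_tok_table *, uint32_t id);
};

class TokenizerBase {
 public:
  virtual ~TokenizerBase() = default;
  virtual const std::string &token(uint32_t id) const = 0;
  virtual bool is_special(uint32_t id) const = 0;

  // Builds a C table whose function pointers forward to this object.
  c_tok_table as_c_table() {
    return c_tok_table{this, &TokenizerBase::token_len_tramp,
                       &TokenizerBase::is_special_tramp};
  }

 private:
  static const TokenizerBase *self(const c_tok_table *t) {
    return static_cast<const TokenizerBase *>(t->user_data);
  }
  // Static trampolines: plain function pointers that dispatch virtually.
  static size_t token_len_tramp(const c_tok_table *t, uint32_t id) {
    return self(t)->token(id).size();
  }
  static bool is_special_tramp(const c_tok_table *t, uint32_t id) {
    return self(t)->is_special(id);
  }
};

// Example subclass: tokens starting with "<|" are treated as special.
class VecTokenizer : public TokenizerBase {
 public:
  explicit VecTokenizer(std::vector<std::string> v) : vocab_(std::move(v)) {}
  const std::string &token(uint32_t id) const override { return vocab_[id]; }
  bool is_special(uint32_t id) const override { return vocab_[id].rfind("<|", 0) == 0; }

 private:
  std::vector<std::string> vocab_;
};
```

The C table borrows the object, so the `VecTokenizer` must outlive every table produced by `as_c_table()`.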