[V1] guidance backend for structured output + auto fallback mode #14779
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a subset of checks runs. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Force-pushed from 6df2b15 to 870bf05.
@russellb I removed this from the v0.8.0 blockers.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 7a993cf to e22f407.
Force-pushed from e22f407 to 122da1c.
llguidance 0.7.1 now has rollback() and fast-forward() APIs similar to xgrammar; it also uses the new, simpler LLMatcher interface instead of LLInterpreter. I will try to adapt this PR later today. Related v0 updates: russellb/vllm@llguidance-v0-integration...mmoskal:llg_v0_matcher
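To illustrate what rollback and fast-forward mean for a grammar matcher, here is a toy sketch. The `ToyMatcher` class and its method names are hypothetical, not the real llguidance `LLMatcher` API: rollback undoes recently consumed tokens (useful when speculative decoding rejects a draft), and fast-forward returns tokens the grammar forces next, which can be appended without sampling.

```python
# Hypothetical sketch of rollback / fast-forward semantics.
# This is NOT the llguidance API; it only illustrates the idea.
class ToyMatcher:
    def __init__(self, forced_suffix):
        self.tokens = []                    # tokens consumed so far
        self.forced_suffix = forced_suffix  # pretend the grammar forces these next

    def consume(self, tok):
        self.tokens.append(tok)

    def rollback(self, n):
        # Undo the last n consumed tokens, e.g. after speculative
        # decoding rejects a draft continuation.
        del self.tokens[len(self.tokens) - n:]

    def fast_forward(self):
        # Tokens the grammar forces next; a real matcher derives these
        # from its grammar state instead of a canned list.
        return list(self.forced_suffix)

m = ToyMatcher(forced_suffix=[42, 7])
m.consume(1); m.consume(2); m.consume(3)
m.rollback(2)                  # tokens 2 and 3 were rejected
m.tokens.extend(m.fast_forward())
print(m.tokens)                # -> [1, 42, 7]
```

In a real integration, both operations avoid rebuilding matcher state from scratch, which is what makes them useful for speculative decoding.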
thanks! I'm also turning on test coverage. I forgot to do that earlier.
Tests are all green locally. I also included a new `auto` fallback mode.
Force-pushed from 469d4ec to 4bd53a9.
This pull request has merge conflicts that must be resolved before it can be merged.
Great feedback. I agree we should improve error handling and propagation. |
@mmoskal I just realized we're using xgrammar for applying the bitmask since we didn't touch the GPU worker code in this PR. It's obviously working, but it seems like we should use llguidance for consistency, even if it's just in a follow-up PR.
612c841
to
a94159c
Compare
It's exactly the same mask format. Given that llg and xgr came up with it independently, maybe we just call it universal? That way you could, in the future, apply it directly in the softmax kernel.
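For context on the mask format being discussed, here is a minimal pure-Python sketch of the packed token bitmask idea: one bit per vocabulary token, packed into 32-bit words, with a set bit meaning the token is allowed. The helper names are made up for illustration; the real layouts live in xgrammar and llguidance, and in production the masking runs on GPU tensors, not Python lists.

```python
# Sketch of a packed token bitmask (assumed semantics: bit set = allowed).
NEG_INF = float("-inf")

def make_bitmask(allowed, vocab_size):
    # One bit per token, packed into 32-bit words.
    words = [0] * ((vocab_size + 31) // 32)
    for tok in allowed:
        words[tok // 32] |= 1 << (tok % 32)
    return words

def apply_bitmask(logits, words):
    # Disallowed tokens get -inf so softmax assigns them zero probability.
    return [
        x if (words[i // 32] >> (i % 32)) & 1 else NEG_INF
        for i, x in enumerate(logits)
    ]

mask = make_bitmask(allowed={0, 3}, vocab_size=5)
print(apply_bitmask([0.1, 0.2, 0.3, 0.4, 0.5], mask))
# -> [0.1, -inf, -inf, 0.4, -inf]
```

Because the format is just packed bits over the vocabulary, any backend that produces it can feed the same masking kernel, which is the interoperability point made above.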
Let's get this in. LGTM
edit: oh, I didn't see the CI failure. We must fix that.
This is the V1 integration for [guidance](https://github.com/guidance-ai/llguidance) as a backend for structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on the fly. We see very fast request startup times, even for large schemas. This should make V1 roughly feature equivalent to V0 in terms of the types of schemas it can support.

An `auto` mode is also included, which provides opinionated fallback behavior based on our current understanding of the varying feature support and performance characteristics of different scenarios. It is not the default, since the behavior may vary from release to release.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
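To make the `auto` mode concrete, here is a hypothetical sketch of backend selection with fallback: prefer one backend, but switch to the backend with broader jsonschema support when the schema uses features the primary one lacks. The feature list and function names here are invented for illustration; the real fallback rules live in vLLM's structured-output code and differ in detail.

```python
# Hypothetical sketch of "auto" backend selection with fallback.
# UNSUPPORTED_BY_PRIMARY is a made-up feature list for illustration only.
UNSUPPORTED_BY_PRIMARY = {"patternProperties", "minLength", "maxLength"}

def choose_backend(schema):
    """Pick a structured-output backend for a jsonschema (toy logic)."""
    def uses_unsupported(node):
        if isinstance(node, dict):
            return any(
                key in UNSUPPORTED_BY_PRIMARY or uses_unsupported(value)
                for key, value in node.items()
            )
        if isinstance(node, list):
            return any(uses_unsupported(item) for item in node)
        return False

    # Fall back to the backend with broader jsonschema support.
    return "guidance" if uses_unsupported(schema) else "xgrammar"

print(choose_backend({"type": "string", "minLength": 2}))     # -> guidance
print(choose_backend({"type": "object", "properties": {}}))   # -> xgrammar
```

This kind of selection is why `auto` is opinionated and deliberately not the default: as backends gain features, the fallback rules change between releases.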
Force-pushed from a94159c to 58a6a04.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
The last CI failure was actually a bug with xgrammar, but we need an xgrammar release to fix it properly. I put in a change in the tests to make it a bit less strict until we can fix the issue.
Force-pushed from bcf0097 to 7f27fea.
This is a bug, but we can't properly fix it until the next xgrammar release, so just let the test be a little bit more flexible for now. Signed-off-by: Russell Bryant <rbryant@redhat.com>
Force-pushed from 7f27fea to 7ac4538.
@aarnphm Could you please take a look?
merging for release