[V1] guidance backend for structured output + auto fallback mode #14779
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a subset of checks runs. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Force-pushed from 6df2b15 to 870bf05.
@russellb I removed this from the v0.8.0 blockers.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 7a993cf to e22f407.
Force-pushed from e22f407 to 122da1c.
llguidance 0.7.1 now has rollback() and fast-forward() APIs similar to xgrammar; it also uses the new, simpler LLMatcher interface instead of LLInterpreter. I will try to adapt this PR later today. Related v0 updates: russellb/vllm@llguidance-v0-integration...mmoskal:llg_v0_matcher
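To illustrate what rollback and fast-forward mean for a grammar matcher, here is a toy sketch. The `ToyMatcher` class and its method names are hypothetical, not the real llguidance `LLMatcher` API: rollback undoes recently consumed tokens (useful when speculative decoding rejects a draft), and fast-forward returns tokens the grammar forces next, which can be appended without sampling.

```python
# Hypothetical sketch of rollback / fast-forward semantics.
# This is NOT the llguidance API; it only illustrates the idea.
class ToyMatcher:
    def __init__(self, forced_suffix):
        self.tokens = []                    # tokens consumed so far
        self.forced_suffix = forced_suffix  # pretend the grammar forces these next

    def consume(self, tok):
        self.tokens.append(tok)

    def rollback(self, n):
        # Undo the last n consumed tokens, e.g. after speculative
        # decoding rejects a draft continuation.
        del self.tokens[len(self.tokens) - n:]

    def fast_forward(self):
        # Tokens the grammar forces next; a real matcher derives these
        # from its grammar state instead of a canned list.
        return list(self.forced_suffix)

m = ToyMatcher(forced_suffix=[42, 7])
m.consume(1); m.consume(2); m.consume(3)
m.rollback(2)                  # tokens 2 and 3 were rejected
m.tokens.extend(m.fast_forward())
print(m.tokens)                # -> [1, 42, 7]
```

In a real integration, both operations avoid rebuilding matcher state from scratch, which is what makes them useful for speculative decoding.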
thanks! I'm also turning on test coverage. I forgot to do that earlier.
Tests are all green locally. I also included a new `auto` fallback mode.
Force-pushed from 469d4ec to 4bd53a9.
This pull request has merge conflicts that must be resolved before it can be merged.
Great feedback. I agree we should improve error handling and propagation. |
@mmoskal I just realized we're using xgrammar for applying the bitmask since we didn't touch the GPU worker code in this PR. It's obviously working, but it seems like we should use llguidance for consistency, even if it's just in a follow-up PR.
612c841
to
a94159c
Compare
It's exactly the same mask format. Given that llg and xgr came up with it independently, maybe we just call it universal? That way you could, in the future, apply it directly in the softmax kernel.
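For context on the mask format being discussed, here is a minimal pure-Python sketch of the packed token bitmask idea: one bit per vocabulary token, packed into 32-bit words, with a set bit meaning the token is allowed. The helper names are made up for illustration; the real layouts live in xgrammar and llguidance, and in production the masking runs on GPU tensors, not Python lists.

```python
# Sketch of a packed token bitmask (assumed semantics: bit set = allowed).
NEG_INF = float("-inf")

def make_bitmask(allowed, vocab_size):
    # One bit per token, packed into 32-bit words.
    words = [0] * ((vocab_size + 31) // 32)
    for tok in allowed:
        words[tok // 32] |= 1 << (tok % 32)
    return words

def apply_bitmask(logits, words):
    # Disallowed tokens get -inf so softmax assigns them zero probability.
    return [
        x if (words[i // 32] >> (i % 32)) & 1 else NEG_INF
        for i, x in enumerate(logits)
    ]

mask = make_bitmask(allowed={0, 3}, vocab_size=5)
print(apply_bitmask([0.1, 0.2, 0.3, 0.4, 0.5], mask))
# -> [0.1, -inf, -inf, 0.4, -inf]
```

Because the format is just packed bits over the vocabulary, any backend that produces it can feed the same masking kernel, which is the interoperability point made above.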
Let's get this in. LGTM
edit: oh, I didn't see the CI failure. We must fix that.
This is the V1 integration for [guidance](https://github.com/guidance-ai/llguidance) as a backend for structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on the fly. We see very fast request startup times, even for large schemas. This should make V1 roughly feature equivalent to V0 in terms of the types of schemas it can support.

An `auto` mode is also included, which provides opinionated fallback behavior based on our current understanding of the varying feature support and performance characteristics of different scenarios. It is not the default, since the behavior may vary from release to release.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
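To make the `auto` mode concrete, here is a hypothetical sketch of backend selection with fallback: prefer one backend, but switch to the backend with broader jsonschema support when the schema uses features the primary one lacks. The feature list and function names here are invented for illustration; the real fallback rules live in vLLM's structured-output code and differ in detail.

```python
# Hypothetical sketch of "auto" backend selection with fallback.
# UNSUPPORTED_BY_PRIMARY is a made-up feature list for illustration only.
UNSUPPORTED_BY_PRIMARY = {"patternProperties", "minLength", "maxLength"}

def choose_backend(schema):
    """Pick a structured-output backend for a jsonschema (toy logic)."""
    def uses_unsupported(node):
        if isinstance(node, dict):
            return any(
                key in UNSUPPORTED_BY_PRIMARY or uses_unsupported(value)
                for key, value in node.items()
            )
        if isinstance(node, list):
            return any(uses_unsupported(item) for item in node)
        return False

    # Fall back to the backend with broader jsonschema support.
    return "guidance" if uses_unsupported(schema) else "xgrammar"

print(choose_backend({"type": "string", "minLength": 2}))     # -> guidance
print(choose_backend({"type": "object", "properties": {}}))   # -> xgrammar
```

This kind of selection is why `auto` is opinionated and deliberately not the default: as backends gain features, the fallback rules change between releases.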
Force-pushed from a94159c to 58a6a04.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
The last CI failure was actually a bug with xgrammar, but we need an xgrammar release to fix it properly. I put in a change in the tests to make it a bit less strict until we can fix the issue.
Force-pushed from bcf0097 to 7f27fea.
This is a bug, but we can't properly fix it until the next xgrammar release, so just let the test be a little bit more flexible for now. Signed-off-by: Russell Bryant <rbryant@redhat.com>
Force-pushed from 7f27fea to 7ac4538.
@aarnphm Could you please take a look?
merging for release