[V1] guidance backend for structured output + auto fallback mode #14779


Merged
merged 3 commits into from
Mar 25, 2025

Conversation

russellb
Member

@russellb russellb commented Mar 13, 2025

This is the V1 integration for
guidance as a backend for
structured output. There is a V0 integration in #14589.

Since this is the second backend supported by V1, this PR also includes a new backend mode called auto, which provides opinionated behavior with fallback. It is not the default, since the behavior may vary from release to release.

This backend provides some key benefits to V1:

  • Broader jsonschema support
  • Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on
the fly. We see very fast request startup times, even for large
schemas.
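As a rough sketch of the on-the-fly idea (illustrative only, not vLLM's or llguidance's actual code — the function and the toy grammar rule here are invented): instead of precomputing an allowed-token mask for every grammar state at request startup, the mask is computed only for states the decode actually reaches.

```python
# Hypothetical sketch: compute the allowed-token mask lazily, per
# decoding step, rather than precomputing masks for all grammar states.
# The "grammar" here is a trivial invented rule (a JSON string must be
# closed before the object can end); a real engine answers this query
# from the schema.

def allowed_tokens(state: str, vocab: list[str]) -> set[int]:
    """Return the indices of vocab tokens allowed after `state`."""
    if state.count('"') % 2 == 1:
        # Inside an open string: don't allow closing the object yet.
        return {i for i, t in enumerate(vocab) if t != "}"}
    return set(range(len(vocab)))

vocab = ['"', 'a', '}', '{']

# Only the states actually reached during decoding are ever evaluated,
# so startup cost does not grow with the size of the full schema.
mask = allowed_tokens('{"key', vocab)
```

The tradeoff is per-step mask computation instead of a one-time precompute, which is what makes request startup fast even for very large schemas.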

This should make V1 roughly feature equivalent to V0 in terms of the
types of schemas it can support.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant rbryant@redhat.com
Co-authored-by: Loc Huynh jc1da.3011@gmail.com


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@russellb russellb added this to the v0.8.0 milestone Mar 13, 2025
@russellb russellb force-pushed the llguidance-v1-integration branch from 6df2b15 to 870bf05 Compare March 13, 2025 20:07
@WoosukKwon WoosukKwon removed this from the v0.8.0 milestone Mar 15, 2025
@WoosukKwon
Collaborator

@russellb I removed this from the v0.8.0 blockers


mergify bot commented Mar 15, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 15, 2025
@russellb russellb force-pushed the llguidance-v1-integration branch from 7a993cf to e22f407 Compare March 18, 2025 15:16
@mergify mergify bot removed the needs-rebase label Mar 18, 2025
@russellb russellb force-pushed the llguidance-v1-integration branch from e22f407 to 122da1c Compare March 18, 2025 20:18
@russellb russellb marked this pull request as ready for review March 18, 2025 20:22
@mmoskal
Contributor

mmoskal commented Mar 18, 2025

llguidance 0.7.1 now has rollback() and fast-forward() APIs similar to xgrammar; it also uses the new, simpler LLMatcher interface instead of LLInterpreter. I will try to adapt this PR later today.

Related v0 updates:

russellb/vllm@llguidance-v0-integration...mmoskal:llg_v0_matcher
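To illustrate what rollback and fast-forward mean for a matcher (a toy sketch; the class and its internals are invented here, only the API names mirror the comment above):

```python
# Toy matcher illustrating rollback()/fast-forward()-style APIs.
# Not llguidance's real implementation; the state is just a token list.

class ToyMatcher:
    def __init__(self) -> None:
        self._consumed: list[int] = []

    def consume_token(self, tok: int) -> None:
        self._consumed.append(tok)

    def rollback(self, n: int) -> None:
        # Undo the last n consumed tokens, e.g. after speculative
        # decoding rejects some draft tokens.
        del self._consumed[len(self._consumed) - n:]

    def fast_forward(self, toks: list[int]) -> None:
        # Accept a run of grammar-forced tokens in a single call
        # instead of one consume_token() per token.
        self._consumed.extend(toks)

m = ToyMatcher()
m.consume_token(5)
m.fast_forward([6, 7, 8])
m.rollback(2)
```

Rollback is what makes a grammar matcher compatible with speculative decoding, and fast-forward lets forced token runs (e.g. fixed JSON punctuation) be emitted without a mask computation per token.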

@russellb
Member Author

> llguidance 0.7.1 now has rollback() and fast-forward() APIs similar to xgrammar; it also uses new simpler LLMatcher interface instead of LLInterpreter; will try to adapt this PR later today
>
> Related v0 updates:
>
> russellb/vllm@llguidance-v0-integration...mmoskal:llg_v0_matcher

thanks!

I'm also turning on test coverage. I forgot to do that earlier.

@russellb russellb marked this pull request as draft March 18, 2025 21:16
@russellb russellb added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 18, 2025
@russellb russellb changed the title [V1] guidance backend for structured output [V1] guidance backend for structured output + auto fallback mode Mar 18, 2025
@russellb russellb marked this pull request as ready for review March 18, 2025 22:43
@russellb
Member Author

Tests are all green locally. I also included a new auto mode that is off by default; it's where we can be opinionated and choose a backend based on current feature support in the various libraries.
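The selection logic could look roughly like this (an illustrative sketch only; the function name, the feature-set representation, and the unsupported-feature list are all assumptions, not vLLM's actual code):

```python
# Hypothetical sketch of an "auto" backend selector with fallback:
# prefer one backend, but fall back to guidance when the request's
# schema uses features the preferred backend does not support.

def choose_backend(schema_features: set[str]) -> str:
    """Pick a structured-output backend for a request's JSON schema."""
    # Invented example set; the real list depends on library versions.
    XGRAMMAR_UNSUPPORTED = {"patternProperties", "if/then/else"}
    if schema_features & XGRAMMAR_UNSUPPORTED:
        return "guidance"
    return "xgrammar"

backend_simple = choose_backend({"type", "properties"})
backend_complex = choose_backend({"patternProperties"})
```

Keeping this behind an explicit auto setting, rather than making it the default, means the opinionated fallback rules can evolve release to release without silently changing behavior for users who pinned a backend.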

@russellb russellb force-pushed the llguidance-v1-integration branch from 469d4ec to 4bd53a9 Compare March 20, 2025 16:07

mergify bot commented Mar 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@russellb
Member Author

Great feedback. I agree we should improve error handling and propagation.

@russellb
Member Author

@mmoskal I just realized we're using xgrammar for applying the bitmask since we didn't touch the GPU worker code in this PR. It's obviously working, but seems like we should use llguidance for consistency, even if it's just in a follow-up PR.

@russellb russellb force-pushed the llguidance-v1-integration branch from 612c841 to a94159c Compare March 22, 2025 19:29
@mmoskal
Contributor

mmoskal commented Mar 22, 2025

It's exactly the same mask format. Given that llg and xgr came up with it independently, maybe we should just call it universal?

@mmoskal
Contributor

mmoskal commented Mar 22, 2025

This way you could in the future apply it directly in the softmax kernel.
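For reference, the shared layout is one bit per vocabulary token, packed into 32-bit words, with a set bit meaning "token allowed". A pure-Python toy of applying such a mask to logits (in vLLM this happens on the logits tensor on the GPU; the function here is invented for illustration):

```python
# Apply a packed token bitmask to a logits vector: disallowed tokens
# get -inf so they receive zero probability after softmax.

NEG_INF = float("-inf")

def apply_bitmask(logits: list[float], bitmask: list[int]) -> list[float]:
    out = []
    for i, logit in enumerate(logits):
        word, bit = divmod(i, 32)          # which 32-bit word, which bit
        allowed = (bitmask[word] >> bit) & 1
        out.append(logit if allowed else NEG_INF)
    return out

# Allow only tokens 0 and 2 of a 4-token vocab: bits 0 and 2 -> 0b0101.
masked = apply_bitmask([1.0, 2.0, 3.0, 4.0], [0b0101])
```

Because both libraries produce this same packed format, a single masking step (or, eventually, a fused softmax kernel) can serve either backend.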

Collaborator

@aarnphm aarnphm left a comment


Let's get this in. LGTM

edit: oh i didn't see the CI failure. We must fix that.

@aarnphm aarnphm self-requested a review March 23, 2025 03:38
This is the V1 integration for
[guidance](https://github.com/guidance-ai/llguidance) as a backend for
structured output. There is a V0 integration in vllm-project#14589.

This backend provides some key benefits to V1:

* Broader jsonschema support
* Quick startup performance for large schemas

Instead of precomputing the masks for all states, this is done on
the fly. We see very fast request startup times, even for large
schemas.

This should make V1 roughly feature equivalent to V0 in terms of the
types of schemas it can support.

An `auto` mode is also included, which includes opinionated fallback
behavior based on our current understanding for varying feature support
and performance characteristics for different scenarios.

More technical details are available in the llguidance git repo.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
@russellb russellb force-pushed the llguidance-v1-integration branch from a94159c to 58a6a04 Compare March 24, 2025 19:24
Signed-off-by: Russell Bryant <rbryant@redhat.com>
@russellb
Member Author

The last CI failure was actually caused by a bug in xgrammar, but we need an xgrammar release to fix it properly. I made the tests a bit less strict until we can fix the issue.

@WoosukKwon WoosukKwon added this to the v0.8.2 milestone Mar 24, 2025
@russellb russellb force-pushed the llguidance-v1-integration branch 2 times, most recently from bcf0097 to 7f27fea Compare March 24, 2025 23:01
This is a bug, but we can't properly fix it until the next xgrammar
release, so just let the test be a little bit more flexible for now.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
@russellb russellb force-pushed the llguidance-v1-integration branch from 7f27fea to 7ac4538 Compare March 24, 2025 23:05
Collaborator

@WoosukKwon WoosukKwon left a comment


@aarnphm Could you please take a look?

@simon-mo simon-mo merged commit a09ad90 into vllm-project:main Mar 25, 2025
59 checks passed
@simon-mo
Collaborator

merging for release

erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
wrmedford pushed a commit to wrmedford/vllm that referenced this pull request Mar 26, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Signed-off-by: Wes Medford <wryanmedford@gmail.com>
lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 2, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…llm-project#14779)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Labels
ci/build ready ONLY add when PR is ready to merge/full CI is needed v1
Projects
None yet
Development

6 participants