[FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches #22896
Conversation
Code Review
This pull request correctly addresses a bug that caused non-guided requests to fail when processed in a mixed batch with guided generation requests. The root cause analysis in the description is excellent and accurately identifies that initializing the sorted_bitmask with zeros was incorrectly masking all tokens for non-guided requests. The proposed solution of using np.full to initialize the bitmask with -1 is the correct approach, as it properly allows all tokens for non-guided requests by default. The change is concise, well-targeted, and effectively resolves the issue.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Nice find, this looks reasonable to me. Is there a unit test we could make to enforce this? cc @aarnphm @russellb @benchislett
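For what it's worth, a regression test along these lines could build a mixed batch and assert that non-guided rows keep their logits untouched. The sketch below mimics the bitmask application with plain NumPy rather than calling vLLM's own helpers, so all names here (`apply_bitmask`, `MASK_WORDS`, the toy vocabulary size) are illustrative, not the real API:

```python
# Hypothetical sketch of a regression test for mixed batches; it mimics the
# bitmask application with plain NumPy instead of calling vLLM internals.
import numpy as np

VOCAB = 40                        # toy vocabulary size
MASK_WORDS = (VOCAB + 31) // 32   # one bit per token, packed into int32 words


def apply_bitmask(logits: np.ndarray, bitmask: np.ndarray) -> np.ndarray:
    """Set logits to -inf wherever the corresponding bit is 0 (xgrammar convention)."""
    out = logits.copy()
    for row in range(logits.shape[0]):
        for tok in range(logits.shape[1]):
            word, bit = divmod(tok, 32)
            if not (bitmask[row, word] >> bit) & 1:
                out[row, tok] = -np.inf
    return out


def test_non_guided_rows_are_untouched():
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, VOCAB)).astype(np.float32)

    # Row 1 is "guided" and only allows token 5; rows 0 and 2 are non-guided.
    full_bitmask = np.full((3, MASK_WORDS), -1, dtype=np.int32)  # -1 = allow all
    full_bitmask[1] = 0
    full_bitmask[1, 5 // 32] = np.int32(1 << (5 % 32))

    masked = apply_bitmask(logits, full_bitmask)

    # Non-guided rows must be identical to the original logits.
    assert np.array_equal(masked[0], logits[0])
    assert np.array_equal(masked[2], logits[2])
    # The guided row keeps only token 5.
    assert np.isfinite(masked[1, 5])
    assert np.isneginf(np.delete(masked[1], 5)).all()
```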
good catch, thank you!
I agree that test coverage would be nice, but the fix is important enough not to block on.
I saw this pull request: https://github.com/vllm-project/vllm/pull/22963. I would say that this other approach would also fix the bug.
Thanks for pointing out #22963. I'd like to merge this change even if we merge the other one as well. It makes sense that we should always initialize the mask to accept all tokens.
@JartX it looks like your commit is missing the Signed-off-by line.
@russellb done!
This issue was introduced by a change in PR #21862.
When a batch mixes guided-generation requests (e.g., `guided_json`) with standard, non-guided requests (e.g., chat completions), the non-guided requests fail.
The typical symptom is that the output for the non-guided request consists of a stream of repetitive characters, such as exclamation marks (!!!!!!!!!!), indicating that its vocabulary has been incorrectly masked. This issue only occurs when both types of requests are present in the same batch; batches containing only one type of request work as expected.
Root Cause Analysis
The bug is located in the `apply_grammar_bitmask` method within `vllm/v1/worker/gpu_model_runner.py`. The logical flow that leads to the error is as follows:

1. When a batch includes at least one guided request, the scheduler produces a `grammar_bitmask` numpy array. This array is compact and only contains masks for the guided requests.
2. Inside `apply_grammar_bitmask`, a new bitmask tensor, `sorted_bitmask`, is created to match the full size of the batch logits (i.e., one row for every request in the batch).
3. The error occurs here: `sorted_bitmask` is initialized with zeros using `np.zeros_like`. In the bitmasking scheme used by xgrammar, a bit value of 0 instructs the system to disallow a token, whereas -1 (all bits set) is the value that allows all tokens (see the sketch after this list).
4. The method then correctly copies the specific grammar masks from the scheduler's compact array into the appropriate rows of `sorted_bitmask` for the guided requests.
5. However, the rows corresponding to the non-guided requests are never updated, so they remain filled with zeros.
6. When this final `sorted_bitmask` is applied to the batch logits, it incorrectly forbids all vocabulary tokens for the non-guided requests, causing the model to produce invalid output.
Solution
The solution is to initialize `sorted_bitmask` with the correct default value that allows all tokens. Instead of creating a tensor of zeros, we now create a tensor filled with -1 (via `np.full`).
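A minimal sketch of the difference in spirit, using toy shapes rather than the actual tensors in `gpu_model_runner.py` (`num_reqs`, `mask_words`, and `guided_rows` are illustrative):

```python
import numpy as np

num_reqs, mask_words = 4, 1   # toy batch: 4 requests, 32-token vocabulary
guided_rows = [1, 3]          # only these rows receive a grammar mask

# Buggy initialization: rows that are never overwritten stay 0 (= disallow all tokens).
buggy = np.zeros((num_reqs, mask_words), dtype=np.int32)
# Fixed initialization: rows default to -1 (= all bits set, allow all tokens).
fixed = np.full((num_reqs, mask_words), -1, dtype=np.int32)

# Copy the compact grammar masks into the guided rows (here: allow only token 0).
for row in guided_rows:
    buggy[row] = fixed[row] = np.int32(1)

print(buggy[0], fixed[0])  # non-guided row: 0 (everything masked) vs -1 (nothing masked)
```

The guided rows are overwritten either way, so their behavior is unchanged; only the default for rows that receive no grammar mask differs.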