Skip to content

AIFQA-393 BLK-007: [vLLM/lm_eval] ChatGLM2/3 tokenizer — empty stop string crash in lm_eval#3665

Open
jklawikowski wants to merge 1 commit intoEleutherAI:mainfrom
jklawikowski:dev/sys_qaplatformbot/AIFQA-393--20260330160725
Open

AIFQA-393 BLK-007: [vLLM/lm_eval] ChatGLM2/3 tokenizer — empty stop string crash in lm_eval#3665
jklawikowski wants to merge 1 commit intoEleutherAI:mainfrom
jklawikowski:dev/sys_qaplatformbot/AIFQA-393--20260330160725

Conversation

@jklawikowski
Copy link
Copy Markdown

Automated implementation for AIFQA-393.

Summary: BLK-007: [vLLM/lm_eval] ChatGLM2/3 tokenizer — empty stop string crash in lm_eval

Branch: dev/sys_qaplatformbot/AIFQA-393--20260330160725

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@jklawikowski
Copy link
Copy Markdown
Author

Reviewer -- Automated Code Review

Verdict: APPROVED ✅

Summary: This PR fixes a real bug where ChatGLM2/3 tokenizers decode their EOS token to an empty string, which gets passed as a stop sequence and causes downstream errors in sampling backends like vLLM. The fix correctly filters out empty/whitespace-only stop sequences in handle_stop_sequences() and includes comprehensive test coverage. The change is minimal, well-scoped, and the docstring clearly explains the motivation.

Stats: +62 -3 in 2 file(s)

Comments

  1. 🔵 [SUGGESTION] lm_eval/models/utils.py -- The filtering uses s.strip() which is truthy-based. This is fine for the current list[str] contract, but if a non-string value ever sneaks into the until list, this would raise an AttributeError. Consider whether a guard like s and s.strip() or an isinstance check would be warranted for defensive programming. This is very low priority given the type annotation already constrains to str.

@jklawikowski jklawikowski marked this pull request as ready for review March 30, 2026 16:16
@jklawikowski jklawikowski requested a review from 0xSMT as a code owner March 30, 2026 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants