Fix false-positive right-padding warning for decoder-only models in pipeline #44021
Conversation
Two changes to fix the spurious 'right-padding was detected' warning that fires for Qwen3 and other models during batched pipeline inference:

1. `TextGenerationPipeline`: set `padding_side='left'` automatically for decoder-only models. The default tokenizer `padding_side` is `'right'`, which causes incorrect padding for batched generation; the pipeline now overrides this to `'left'` on initialization.
2. `GenerationMixin.generate`: improve right-padding detection by using the `attention_mask` when available, instead of only checking whether the last token equals `pad_token_id`. The old heuristic produced false positives when `pad_token_id == eos_token_id` or `bos_token_id` (as is the case for Qwen3, where both are token 151643).

Fixes huggingface#43906
Related to huggingface#38071
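The padding-side issue behind the first change can be illustrated with a toy sketch (plain Python lists stand in for tensors; the prompt token ids are made up, while 151643 is Qwen3's `<|endoftext|>`):

```python
# Decoder-only models predict the next token from the final position of the
# input, so batched prompts must be padded on the left to keep each real
# last token in that final slot.
PAD = 151643  # Qwen3: pad_token_id == bos_token_id == <|endoftext|>
prompt = [10, 11]  # a short prompt batched with longer ones, padded to length 4

right_padded = prompt + [PAD, PAD]  # tokenizer default padding_side='right'
left_padded = [PAD, PAD] + prompt   # what batched generation needs

print(right_padded[-1])  # 151643 -> the model would "continue" from a pad token
print(left_padded[-1])   # 11     -> the model continues from the real last token
```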
zucchini-nlp
left a comment
Thanks, can you check the failing Whisper pipeline tests?
…isperForCausalLM) Only set `tokenizer.padding_side='left'` when no `feature_extractor` exists, to avoid a `ValueError` from `pad_collate_fn` when they disagree.
Thanks for flagging! The Whisper pipeline tests were failing because the feature extractor and the tokenizer disagreed on the padding side, which made `pad_collate_fn` raise a `ValueError`. Fixed in 3c63b39: now we only override `tokenizer.padding_side` when no `feature_extractor` exists.
Thanks, I pushed a fix so we can merge soon. We don't want to check …
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
Fixes #43906 (related to #38071)
Problem
When using `pipeline('text-generation')` with batched inference on Qwen3 (and other models where `pad_token_id == bos_token_id`), a spurious 'right-padding was detected' warning is emitted. This happens for two reasons:

1. `TextGenerationPipeline` doesn't set `padding_side='left'` for decoder-only models, so the default `'right'` padding is used during batch collation
2. `generate()` only checks whether the last token equals `pad_token_id`, which can produce false positives when `pad_token_id` equals other special tokens

Fix
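A minimal sketch of the pipeline-side fix, as a hypothetical simplification rather than the actual transformers source (`DummyTokenizer` and `TextGenerationPipelineSketch` are invented names for illustration):

```python
class DummyTokenizer:
    """Stand-in for a Hugging Face tokenizer (hypothetical)."""
    padding_side = "right"  # library default

class TextGenerationPipelineSketch:
    """Simplified model of the change: force left padding for causal LMs,
    but only when no feature_extractor is attached (the Whisper case),
    since the collation step raises when the two disagree on padding side."""

    def __init__(self, tokenizer, feature_extractor=None):
        self.tokenizer = tokenizer
        self.feature_extractor = feature_extractor
        if feature_extractor is None:
            tokenizer.padding_side = "left"

tok = DummyTokenizer()
TextGenerationPipelineSketch(tok)
print(tok.padding_side)  # left

whisper_tok = DummyTokenizer()
TextGenerationPipelineSketch(whisper_tok, feature_extractor=object())
print(whisper_tok.padding_side)  # right (left untouched)
```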
1. `TextGenerationPipeline.__init__`: automatically set `tokenizer.padding_side = 'left'` for decoder-only models (since `TextGenerationPipeline` is exclusively for causal LMs)
2. `GenerationMixin.generate`: when an `attention_mask` is available, use it to detect right-padding (check whether the last position has mask = 0) instead of relying solely on the token-id heuristic. Falls back to the original check when no attention mask is provided.

Root Cause Analysis
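To make the false positive concrete, here is a toy version of the two heuristics (plain Python lists stand in for tensors, and the function names are invented for this sketch):

```python
PAD = 151643  # Qwen3: pad_token_id == bos_token_id == <|endoftext|>

# A correctly left-padded batch whose real content happens to end in id 151643.
batch = [
    [PAD, PAD, 7, PAD],  # last token is a *real* occurrence of the id
    [1, 2, 3, 4],
]
attention_mask = [
    [0, 0, 1, 1],  # 1 marks real tokens: the last position is genuine
    [1, 1, 1, 1],
]

def old_right_padding_check(batch):
    # Old heuristic: flag if any sequence ends with the pad token id.
    return any(seq[-1] == PAD for seq in batch)

def new_right_padding_check(attention_mask):
    # New heuristic: flag only if the last position is masked out.
    return any(mask[-1] == 0 for mask in attention_mask)

print(old_right_padding_check(batch))           # True  -> spurious warning
print(new_right_padding_check(attention_mask))  # False -> no warning
```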
For Qwen3, `pad_token_id = bos_token_id = 151643` (`<|endoftext|>`). The tokenizer's default `padding_side='right'` means shorter sequences in a batch get right-padded. The existing check `inputs_tensor[:, -1] == pad_token_tensor` then correctly detects this, but the real issue is that the pipeline should have been left-padding all along.

Even after fixing the pipeline, the attention-mask-based detection is more robust for cases where users call `model.generate()` directly with properly left-padded inputs whose content happens to end with the pad token.

Who can review?
@gante @ArthurZucker