Fix false positive right-padding warning for decoder-only models in pipeline #44021

Merged
zucchini-nlp merged 4 commits into huggingface:main from ManasVardhan:fix-qwen3-warning-bug-43906
Feb 17, 2026

Conversation

@ManasVardhan

What does this PR do?

Fixes #43906 (related to #38071)

Problem

When using pipeline('text-generation') with batched inference on Qwen3 (and other models where pad_token_id == bos_token_id), a spurious warning is emitted:

A decoder-only architecture is being used, but right-padding was detected!

This happens for two reasons:

  1. The TextGenerationPipeline doesn't set padding_side='left' for decoder-only models, so the default 'right' padding is used during batch collation
  2. The right-padding detection heuristic in generate() only checks if the last token equals pad_token_id, which can produce false positives when pad_token_id equals other special tokens

Fix

  1. TextGenerationPipeline.__init__: Automatically set tokenizer.padding_side = 'left' for decoder-only models (since TextGenerationPipeline is exclusively for causal LM)
  2. GenerationMixin.generate: When an attention_mask is available, use it to detect right-padding (check if last position has mask=0) instead of relying solely on the token id heuristic. Falls back to the original check when no attention mask is provided.
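The mask-aware detection in change 2 can be sketched as follows. This is a minimal illustration of the idea, not the actual `GenerationMixin.generate` code; the function name is hypothetical:

```python
import torch


def right_padding_detected(input_ids, attention_mask, pad_token_id):
    # Prefer the attention mask: a 0 in the last position means the
    # sequence was right-padded, regardless of which token id fills it.
    if attention_mask is not None:
        return bool((attention_mask[:, -1] == 0).any())
    # Fallback heuristic: last token equals pad_token_id. This can
    # produce false positives when pad_token_id == bos/eos_token_id.
    if pad_token_id is None:
        return False
    return bool((input_ids[:, -1] == pad_token_id).any())
```

With an attention mask, a trailing pad token id that is actually attended content no longer triggers the warning; without a mask, the original token-id check still applies.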

Root Cause Analysis

For Qwen3, pad_token_id = bos_token_id = 151643 (<|endoftext|>). The tokenizer's default padding_side='right' means shorter sequences in a batch get right-padded. The existing check inputs_tensor[:, -1] == pad_token_tensor then correctly detects this — but the real issue is that the pipeline should have been left-padding all along.

Even after fixing the pipeline, the attention-mask-based detection is more robust for cases where users call model.generate() directly with properly left-padded inputs whose content happens to end with the pad token.
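The false-positive scenario can be reproduced with plain tensors. The token id 151643 comes from the PR description; the batch itself is a constructed example, not taken from the test suite:

```python
import torch

pad_token_id = 151643  # Qwen3: pad == bos == <|endoftext|>

# A properly left-padded batch whose real content happens to end
# with the pad/eos token id.
input_ids = torch.tensor([
    [pad_token_id, 5, 6, pad_token_id],  # left pad, then content ending in eos
    [1, 2, 3, 4],
])
attention_mask = torch.tensor([
    [0, 1, 1, 1],  # the trailing pad id is attended, i.e. real content
    [1, 1, 1, 1],
])

# Old heuristic: fires, because it only compares token ids.
old_check = bool((input_ids[:, -1] == pad_token_id).any())
# Mask-based check: stays quiet, because the last position is attended.
new_check = bool((attention_mask[:, -1] == 0).any())
print(old_check, new_check)  # True False
```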

Who can review?

@gante @ArthurZucker

Two changes to fix the spurious 'right-padding was detected' warning
that fires for Qwen3 and other models during batched pipeline inference:

1. TextGenerationPipeline: Set padding_side='left' automatically for
   decoder-only models. The default tokenizer padding_side is 'right',
   which causes incorrect padding for batched generation. The pipeline
   now overrides this to 'left' on initialization.

2. GenerationMixin.generate: Improve right-padding detection by using
   the attention_mask when available, instead of only checking if the
   last token equals pad_token_id. The old heuristic produced false
   positives when pad_token_id == eos_token_id or bos_token_id (as is
   the case for Qwen3 where both are token 151643).

Fixes huggingface#43906
Related to huggingface#38071
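Change 1 boils down to a one-line override at pipeline construction time. A minimal sketch under a simplified, hypothetical pipeline class (the real `TextGenerationPipeline.__init__` does more than this):

```python
class TextGenerationPipelineSketch:
    """Illustrative only: mirrors the idea of forcing left-padding
    for decoder-only batched generation; not the transformers code."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        # Decoder-only models must be left-padded for batched generation;
        # with right-padding, new tokens would be appended after pad tokens.
        if self.tokenizer is not None:
            self.tokenizer.padding_side = "left"
```

Because the override happens once in `__init__`, every subsequent batched call tokenizes with `padding_side='left'` without the caller having to configure anything.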
Member

@zucchini-nlp zucchini-nlp left a comment


…isperForCausalLM)

Only set tokenizer.padding_side='left' when no feature_extractor exists,
to avoid ValueError from pad_collate_fn when they disagree.
@ManasVardhan
Author

Thanks for flagging! The Whisper pipeline tests were failing because WhisperForCausalLM has both a tokenizer and a feature_extractor — my change set tokenizer.padding_side='left' which conflicted with the feature_extractor's padding_side='right', causing a ValueError in pad_collate_fn.

Fixed in 3c63b39 — now we only override padding_side when no feature_extractor is present.

@zucchini-nlp
Member

zucchini-nlp commented Feb 17, 2026

Thanks, I pushed a fix so we can merge soon. We don't want to check self.feature_extractor in this pipe: the pipeline generates text from text and thus is not expected to load a feature_extractor. There was an issue in how the test was written.

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) February 17, 2026 09:31
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp merged commit 48ad2d5 into huggingface:main Feb 17, 2026
25 checks passed
