
Isolated reproduction of https://github.com/huggingface/transformers/issues/38071 #43906

@willxxy

Description

System Info

name = "accelerate"
version = "1.12.0"
name = "transformers"
version = "4.57.3"

Python 3.11

Who can help?

@gante @ArthurZucker Related to warning from #38071 for Qwen/Qwen3-Next-80B-A3B-Instruct model

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import pipeline
from tqdm import tqdm
LANGUAGE_MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"
def main():

    pipe = pipeline("text-generation", model=LANGUAGE_MODEL, device_map="auto")
    messages = [[
        {"role": "system", "content": "hi"},
        {"role": "user", "content": "hdi"},
    ],[
        {"role": "system", "content": "hi"},
        {"role": "user", "content": "hddi"},
    ],[
        {"role": "system", "content": "hi"},
        {"role": "user", "content": "hasdasi"},
    ],[
        {"role": "system", "content": "hi"},
        {"role": "user", "content": "hiasdsad"},
    ],[
        {"role": "system", "content": "hi"},
        {"role": "user", "content": "hiasd"},
    ],[
        {"role": "system", "content": "hi"},
        {"role": "user", "content": "hiasd"},
    ]]

    BATCH_SIZE = 3
    for out in tqdm(
        pipe(messages, max_new_tokens=4, batch_size=BATCH_SIZE),
        total=len(messages),
        desc="Batched inference",
    ):
        response = out[0]["generated_text"][-1]["content"].strip()
        print(response)

if __name__ == "__main__":
    main()

Adding

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct", padding="left")
pipe = pipeline("text-generation", model=LANGUAGE_MODEL, tokenizer=tokenizer, device_map="auto")

does not make the warning go away.
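Note that the warning text asks for `padding_side='left'` at tokenizer init, not `padding`. To illustrate why the side matters for a decoder-only model, here is a toy sketch in plain Python (the `pad_batch` helper and `PAD` value are illustrative, not transformers API): generation continues from the last position of each row, so with right-padding the shorter row ends in pad tokens and the model conditions its next token on padding, which matches the stray ")" completions in the log below.

```python
PAD = 0  # illustrative pad token id

def pad_batch(seqs, side="left"):
    """Pad variable-length token-id lists to equal width on one side."""
    width = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pad = [PAD] * (width - len(s))
        out.append(pad + s if side == "left" else s + pad)
    return out

batch = [[5, 6], [7, 8, 9]]
# Left-padding: the last column holds real tokens (6 and 9).
print(pad_batch(batch, side="left"))   # [[0, 5, 6], [7, 8, 9]]
# Right-padding: row 0 ends in PAD, so generation would start from a pad token.
print(pad_batch(batch, side="right"))  # [[5, 6, 0], [7, 8, 9]]
```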

Expected behavior

(ecg-preprocess) (ecg-encoder) -bash-4.4$ CUDA_VISIBLE_DEVICES=4,5,6,7 uv run src/test.py
The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
Loading checkpoint shards: 100%|█████████████████| 41/41 [00:31<00:00,  1.28it/s]
Device set to use cuda:0
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Batched inference:   0%|                                   | 0/6 [00:00<?, ?it/s])  
Hi!
Hi! It looks
Hello! It seems
Hello! It seems
)
) I'm here
Batched inference: 100%|████████████████████████| 6/6 [00:00<00:00, 82782.32it/s]
