Closed
Description
System Info
accelerate = 1.12.0
transformers = 4.57.3
Python 3.11
Who can help?
@gante @ArthurZucker Related to warning from #38071 for Qwen/Qwen3-Next-80B-A3B-Instruct model
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
from transformers import pipeline
from tqdm import tqdm

LANGUAGE_MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"

def main():
    pipe = pipeline("text-generation", model=LANGUAGE_MODEL, device_map="auto")
    messages = [
        [
            {"role": "system", "content": "hi"},
            {"role": "user", "content": "hdi"},
        ],
        [
            {"role": "system", "content": "hi"},
            {"role": "user", "content": "hddi"},
        ],
        [
            {"role": "system", "content": "hi"},
            {"role": "user", "content": "hasdasi"},
        ],
        [
            {"role": "system", "content": "hi"},
            {"role": "user", "content": "hiasdsad"},
        ],
        [
            {"role": "system", "content": "hi"},
            {"role": "user", "content": "hiasd"},
        ],
        [
            {"role": "system", "content": "hi"},
            {"role": "user", "content": "hiasd"},
        ],
    ]
    BATCH_SIZE = 3
    for out in tqdm(
        pipe(messages, max_new_tokens=4, batch_size=BATCH_SIZE),
        total=len(messages),
        desc="Batched inference",
    ):
        response = out[0]["generated_text"][-1]["content"].strip()
        print(response)

if __name__ == "__main__":
    main()
Adding

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct", padding="left")
pipe = pipeline("text-generation", model=LANGUAGE_MODEL, tokenizer=tokenizer, device_map="auto")

does not make the warning go away.
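For context on what the warning is about, here is a minimal sketch (plain Python, no transformers dependency; the `pad_batch` helper is hypothetical, not library code) of why decoder-only generation wants left padding: new tokens are always appended at the right end of each row, so right padding leaves pad tokens sitting between a shorter prompt and its continuation, while left padding keeps the prompt flush against the generated tokens.

```python
PAD = 0  # stand-in pad token id

def pad_batch(batch, side):
    """Pad a batch of token-id lists to equal length on the given side.

    Hypothetical helper for illustration only; transformers does this
    internally based on the tokenizer's padding_side setting.
    """
    width = max(len(seq) for seq in batch)
    padded = []
    for seq in batch:
        pad = [PAD] * (width - len(seq))
        padded.append(pad + seq if side == "left" else seq + pad)
    return padded

prompts = [[5, 6, 7], [8, 9]]  # two prompts of unequal length

right = pad_batch(prompts, "right")  # [[5, 6, 7], [8, 9, 0]]
left = pad_batch(prompts, "left")    # [[5, 6, 7], [0, 8, 9]]

# Generation appends the new token at the end of every row:
new_token = 42
right_gen = [row + [new_token] for row in right]
left_gen = [row + [new_token] for row in left]

# With right padding, the shorter prompt becomes [8, 9, PAD, 42]:
# a pad token is wedged between the prompt and its continuation.
assert right_gen[1] == [8, 9, PAD, 42]
# With left padding, the prompt stays flush against the new token.
assert left_gen[1] == [PAD, 8, 9, 42]
```

Note that the warning asks for `padding_side='left'`, whereas the snippet above passes `padding="left"`, which is a different keyword, so the tokenizer's padding side is presumably still the default.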
Expected behavior
(ecg-preprocess) (ecg-encoder) -bash-4.4$ CUDA_VISIBLE_DEVICES=4,5,6,7 uv run src/test.py
The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
Loading checkpoint shards: 100%|█████████████████| 41/41 [00:31<00:00, 1.28it/s]
Device set to use cuda:0
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Batched inference: 0%| | 0/6 [00:00<?, ?it/s])
Hi!
Hi! It looks
Hello! It seems
Hello! It seems
)
) I'm here
Batched inference: 100%|████████████████████████| 6/6 [00:00<00:00, 82782.32it/s]