When running multi-GPU batch inference with `swift infer`, the generated responses contain a lot of repetition if the inference backend is set to vllm; when the backend is switched to pt, the responses show no repetition.
Below is my infer script:
```shell
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --model ./output/dpo/v9-20260114-161202/checkpoint-100 \
    --infer_backend vllm \
    --val_dataset ./data/dpo/val_1k_4_1.jsonl \
    --max_batch_size 32 \
    --max_new_tokens 2048 \
    --max_length 4096 \
    --dataset_num_proc 4
```
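For reference, the pt run that does not show the repetition was invoked the same way, with only `--infer_backend` changed (assuming all other flags were kept identical to the vllm run above):

```shell
# Same arguments as above, only the backend switched to the PyTorch engine;
# with this configuration the responses do not exhibit the repetition issue.
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --model ./output/dpo/v9-20260114-161202/checkpoint-100 \
    --infer_backend pt \
    --val_dataset ./data/dpo/val_1k_4_1.jsonl \
    --max_batch_size 32 \
    --max_new_tokens 2048 \
    --max_length 4096 \
    --dataset_num_proc 4
```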