Repetition in responses during batch inference with vllm #7394

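When running multi-GPU batch inference with swift infer and the vllm inference backend, many of the generated responses contain repetition. When I switch the inference backend to pt, the responses do not show this repetition.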

Here is my infer script:
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --model ./output/dpo/v9-20260114-161202/checkpoint-100 \
    --infer_backend vllm \
    --val_dataset ./data/dpo/val_1k_4_1.jsonl \
    --max_batch_size 32 \
    --max_new_tokens 2048 \
    --max_length 4096 \
    --dataset_num_proc 4
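To put a number on "more repetition" when comparing the two backends, a rough sketch like the following could score each result file. It assumes the inference results are written as JSONL with one record per line and the generated text in a "response" field; the field name, the function names, and the 4-gram window are illustrative assumptions, not part of swift itself.

```python
import json
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4, by_char: bool = False) -> float:
    """Fraction of n-grams in `text` that duplicate an earlier n-gram.

    0.0 means no n-gram occurs twice; values near 1.0 mean heavy looping.
    Set by_char=True for text without whitespace between words (e.g. Chinese).
    """
    tokens = list(text) if by_char else text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first one counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def score_file(path: str, field: str = "response", n: int = 4,
               by_char: bool = False) -> float:
    """Mean repeated n-gram ratio over a JSONL file (assumed layout)."""
    scores = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            text = str(record.get(field, ""))
            scores.append(repeated_ngram_ratio(text, n, by_char))
    return sum(scores) / len(scores) if scores else 0.0
```

Running score_file on the vllm result file and on the pt result file (the paths depend on where your run wrote its output) gives two averages that can be compared directly; a clearly higher value for vllm would confirm the observation above.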


