xinference版本升级到0.16.1之后出现并发性能减弱的情况 #2515

magthub · 2024-11-05T03:35:03Z

xinference 部署方式如下

#!/bin/bash

获取当前日期和时间，格式为YYYY-MM-DD_HH-MM-SS

current_time=$(date +"%Y-%m-%d_%H-%M-%S")

检查当前目录下是否存在 nohup.out 文件，并确保没有其他进程正在使用它

if [ -f "nohup.out" ]; then
lsof_output=$(lsof nohup.out)
if [ -z "$lsof_output" ]; then
# 如果没有其他进程正在使用nohup.out，重命名文件
mv nohup.out "nohup_${current_time}.out"
else
echo "nohup.out is currently in use by another process. Exiting."
exit 1
fi
fi

设置环境变量

export XINFERENCE_HOME=/data/xinference_home
export XINFERENCE_MODEL_SRC=modelscope

启动服务

nohup xinference-local --host 0.0.0.0 --port 9997 >nohup.out 2>&1 &

提示用户服务已启动

echo "Service has been started and running in the background. Logs are being written to nohup.out."

以此方式部署qwen2.5
当采用以下python脚本进行并发测试的时候

当并发量大于16之后速度明显下降 16之前并发之后的token生成速率约为25左右但是到17之后并发量为2tokens/s
设备采用的是8卡3090服务器

之前的版本设置多副本之后并发量也是成倍数增长的但是最新版之后并发性能明显下降。通过实时监控设备GPU使用情况，发现GPU使用率在高并发时明显上不去。

github-actions · 2024-11-12T19:03:32Z

This issue is stale because it has been open for 7 days with no activity.

XprobeBot added the gpu label Nov 5, 2024

XprobeBot added this to the v0.16 milestone Nov 5, 2024

github-actions bot added the stale label Nov 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xinference版本升级到0.16.1之后出现并发性能减弱的情况 #2515

xinference版本升级到0.16.1之后出现并发性能减弱的情况 #2515

magthub commented Nov 5, 2024

github-actions bot commented Nov 12, 2024

xinference版本升级到0.16.1之后出现并发性能减弱的情况 #2515

xinference版本升级到0.16.1之后出现并发性能减弱的情况 #2515

Comments

magthub commented Nov 5, 2024

获取当前日期和时间，格式为YYYY-MM-DD_HH-MM-SS

检查当前目录下是否存在 nohup.out 文件，并确保没有其他进程正在使用它

设置环境变量

启动服务

提示用户服务已启动

github-actions bot commented Nov 12, 2024