Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xinference版本升级到0.16.1之后出现并发性能减弱的情况 #2515

Open
magthub opened this issue Nov 5, 2024 · 1 comment
Open
Milestone

Comments

@magthub
Copy link

magthub commented Nov 5, 2024

xinference 部署方式如下

#!/bin/bash

获取当前日期和时间,格式为YYYY-MM-DD_HH-MM-SS

current_time=$(date +"%Y-%m-%d_%H-%M-%S")

检查当前目录下是否存在 nohup.out 文件,并确保没有其他进程正在使用它

if [ -f "nohup.out" ]; then
lsof_output=$(lsof nohup.out)
if [ -z "$lsof_output" ]; then
# 如果没有其他进程正在使用nohup.out,重命名文件
mv nohup.out "nohup_${current_time}.out"
else
echo "nohup.out is currently in use by another process. Exiting."
exit 1
fi
fi

设置环境变量

export XINFERENCE_HOME=/data/xinference_home
export XINFERENCE_MODEL_SRC=modelscope

启动服务

nohup xinference-local --host 0.0.0.0 --port 9997 >nohup.out 2>&1 &

提示用户服务已启动

echo "Service has been started and running in the background. Logs are being written to nohup.out."
image
image
以此方式部署qwen2.5
当采用以下python脚本进行并发测试的时候

image
当并发量大于16之后 速度明显下降 16之前并发之后的token生成速率约为25左右 但是到17之后并发量为2tokens/s
设备采用的是8卡3090服务器
image
之前的版本设置多副本之后并发量也是成倍数增长的 但是最新版之后并发性能明显下降。通过实时监控设备GPU使用情况,发现GPU使用率在高并发时明显上不去。

@XprobeBot XprobeBot added the gpu label Nov 5, 2024
@XprobeBot XprobeBot added this to the v0.16 milestone Nov 5, 2024
Copy link

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants