Proposal to improve performance
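Hello, and thank you for developing this inference engine! I am planning to deploy the DeepSeek-R1-671B model and have a question about which deployment architecture to choose. I would appreciate any advice or shared experience.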
Environment:
- Hardware: 12 server nodes.
- Per-node configuration: 8 NVIDIA H20 GPUs per node.
- Network: nodes are interconnected over InfiniBand (IB).
- Total resources: 96 H20 GPUs.
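A quick way to frame the choice is to check whether the weights fit on a single node at all. The sketch below is a back-of-the-envelope estimate under assumptions that are not stated in the issue: 96 GB of HBM per H20 and FP8 weights at roughly 1 byte per parameter. It ignores quantization scales, extra modules, activations and runtime buffers, so the real footprint is somewhat larger and the real headroom smaller.

```python
# Rough weight-memory fit check for DeepSeek-R1-671B on 8x H20 nodes.
# Assumptions (not from the issue): 96 GB HBM per H20, FP8 weights at
# ~1 byte per parameter; runtime overhead and activations are ignored.

GPU_MEM_GB = 96        # assumed HBM capacity per H20
GPUS_PER_NODE = 8      # from the issue
NUM_NODES = 12         # from the issue
PARAMS_B = 671         # billions of parameters
BYTES_PER_PARAM = 1.0  # FP8 weights (assumption)


def weight_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_b * bytes_per_param


def headroom_gb(num_gpus: int) -> float:
    """Memory left for KV cache etc. after loading FP8 weights onto num_gpus GPUs."""
    return num_gpus * GPU_MEM_GB - weight_gb(PARAMS_B, BYTES_PER_PARAM)


if __name__ == "__main__":
    node_mem = GPUS_PER_NODE * GPU_MEM_GB
    print(f"memory per node:   {node_mem} GB")
    print(f"total cluster:     {NUM_NODES * node_mem} GB")
    print(f"FP8 weights:       {weight_gb(PARAMS_B, BYTES_PER_PARAM):.0f} GB")
    print(f"headroom, 1 node:  {headroom_gb(GPUS_PER_NODE):.0f} GB")
    print(f"headroom, 2 nodes: {headroom_gb(2 * GPUS_PER_NODE):.0f} GB")
    # BF16 (2 bytes/param) would need roughly twice the weight memory.
    print(f"BF16 fits on 1 node: {weight_gb(PARAMS_B, 2.0) <= node_mem}")
```

Under these assumptions a single node (768 GB) holds the FP8 weights with only about 97 GB (about 12 GB per GPU) left for KV cache, while BF16 weights would not fit on one node.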
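Goal: deploy the DeepSeek-R1-671B model as an inference service.
Options under consideration:
Option 1: one independent instance per node, with load balancing across the instances
Option 2: distributed inference across nodes (one model instance spanning multiple nodes)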
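Both options can be seen as points on one spectrum: how many nodes each serving instance spans. The sketch below enumerates the even splits of the 12-node cluster, where 1 node per instance corresponds to Option 1 and anything larger corresponds to Option 2. It reuses the same rough assumptions as a weight-only estimate (96 GB per H20, about 671 GB of FP8 weights, overhead ignored), and it says nothing about latency or the cost of cross-node communication over IB, which has to be measured.

```python
# Enumerate ways to split 12 nodes x 8 GPUs into identical serving instances.
# nodes_per_instance == 1 -> Option 1 (one instance per node + load balancer)
# nodes_per_instance  > 1 -> Option 2 (each instance spans several nodes over IB)
# Assumptions: 96 GB per H20, ~671 GB of FP8 weights, runtime overhead ignored.
from dataclasses import dataclass

NUM_NODES = 12
GPUS_PER_NODE = 8
GPU_MEM_GB = 96
WEIGHTS_GB = 671


@dataclass
class Layout:
    nodes_per_instance: int
    instances: int
    gpus_per_instance: int
    kv_headroom_gb: int  # per instance, after loading one copy of the weights


def layouts(num_nodes: int = NUM_NODES) -> list[Layout]:
    out = []
    for n in range(1, num_nodes + 1):
        if num_nodes % n:  # keep only splits that use every node
            continue
        gpus = n * GPUS_PER_NODE
        out.append(Layout(n, num_nodes // n, gpus, gpus * GPU_MEM_GB - WEIGHTS_GB))
    return out


if __name__ == "__main__":
    for lay in layouts():
        total_kv = lay.instances * lay.kv_headroom_gb
        print(f"{lay.nodes_per_instance:>2} node(s)/instance: "
              f"{lay.instances:>2} instances, {lay.gpus_per_instance:>3} GPUs each, "
              f"{lay.kv_headroom_gb:>5} GB headroom each, {total_kv:>5} GB total")
```

The trade-off this makes visible: Option 1 stores 12 copies of the weights, leaving roughly 1.2 TB of cluster-wide headroom for KV cache, whereas 2-node instances store 6 copies and leave roughly 5.2 TB, at the price of inter-node traffic on every forward pass.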
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python collect_env.py`
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.