
[Performance]: Deployment advice for DeepSeek-R1-671B on a 12-node x 8-GPU H20 cluster: distributed inference vs. multi-instance load balancing #20110

Open
@PeifengRen

Description


Proposal to improve performance

Hello! Thank you for developing this inference engine. I am planning to deploy the DeepSeek-R1-671B model and have a question about choosing a deployment architecture. I would appreciate any advice or shared experience.

Environment:

  • Hardware: 12 server nodes.
  • Per-node configuration: 8 NVIDIA H20 GPUs per node.
  • Network: nodes interconnected via InfiniBand (IB).
  • Total resources: 96 H20 GPUs.

Goal: deploy the DeepSeek-R1-671B model as an inference service.

Options under consideration:
Option 1: one independent instance per node, fronted by a load balancer
Option 2: multi-node distributed inference (a single instance spanning nodes)
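For concreteness, the two options above might be launched roughly as follows. This is a hedged sketch, not a tested deployment: model weights in FP8 (~670 GB) only just fit in the 768 GB of HBM on one 8x H20 node, so Option 1 assumes FP8 weights and accepts a small KV-cache budget; Option 2 assumes a Ray cluster has already been started across the participating nodes, and the parallel sizes shown (TP=8 per node, PP=2 across two nodes) are illustrative choices, not recommendations.

```shell
# Option 1 (sketch): one vLLM instance per node, tensor parallelism across
# the node's 8 GPUs. Repeat on each of the 12 nodes, then put an external
# load balancer (e.g. nginx) in front of the 12 endpoints.
vllm serve deepseek-ai/DeepSeek-R1 \
  --tensor-parallel-size 8 \
  --trust-remote-code

# Option 2 (sketch): one instance spanning 2 nodes, TP=8 within each node
# and pipeline parallelism (PP=2) across nodes over IB, using the Ray
# distributed backend. Run `ray start --head` on one node and
# `ray start --address=<head-ip>:6379` on the other first, then on the head:
vllm serve deepseek-ai/DeepSeek-R1 \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 2 \
  --distributed-executor-backend ray \
  --trust-remote-code
```

The usual trade-off: Option 1 avoids cross-node communication entirely and scales throughput linearly with node count, while Option 2 frees HBM per GPU for a larger KV cache (longer contexts, bigger batches) at the cost of inter-node latency on the IB fabric.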

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata


Labels: performance (Performance-related issues)

Status: Backlog
