Support SGLang's expert-parallel-size (EP) #2002

### Reminder

- [x] I have read the above rules and searched the existing issues.

### Description

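kt_ep_wrapper currently does not support SGLang's --expert-parallel-size argument.
With TP=8 on 8×L20 (no NVLink), all-reduce over PCIe is the main bottleneck;
switching to TP=4 + EP=2 should noticeably reduce communication overhead.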

Current problem

With --tensor-parallel-size 4 --expert-parallel-size 2 set,
the process crashes during model loading:

    RuntimeError: The size of tensor a (192) must match the size of
    tensor b (384) at non-singleton dimension 0

gpu_experts_mask is initialized for the full set of 384 experts,
but after SGLang's EP partitioning each rank holds only 192 experts, so the sizes do not match.
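
The mismatch comes down to simple arithmetic. The sketch below is not kt_ep_wrapper code; it assumes experts are split evenly across EP ranks and uses hypothetical helper names (`experts_per_rank`, `check_mask`) to illustrate why a mask sized for all 384 experts cannot line up with a 192-expert local shard.

```python
def experts_per_rank(num_experts: int, ep_size: int) -> int:
    """Number of experts each EP rank holds, assuming an even split."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    return num_experts // ep_size


def check_mask(mask: list[bool], num_local_experts: int) -> None:
    """Raise if the mask length does not match the local expert count."""
    if len(mask) != num_local_experts:
        raise ValueError(
            f"mask has {len(mask)} entries but this rank holds "
            f"{num_local_experts} experts"
        )


NUM_EXPERTS = 384  # total expert count, taken from the error message
EP_SIZE = 2        # --expert-parallel-size 2

local = experts_per_rank(NUM_EXPERTS, EP_SIZE)  # local expert count per rank
global_mask = [False] * NUM_EXPERTS             # mask sized for all experts

try:
    check_mask(global_mask, local)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)  # same 192-vs-384 disagreement as the RuntimeError
```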

Expected behavior

kt_ep_wrapper should be aware of SGLang's EP rank and create gpu_experts_mask
using the local expert count after EP partitioning; --kt-num-gpu-experts should also adapt to the EP rank.
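
One possible shape for this behavior is sketched below. It is only an illustration under stated assumptions, not the actual kt_ep_wrapper or SGLang API: the names `LocalExperts`, `local_experts`, and `build_gpu_experts_mask` are hypothetical, experts are assumed to be split into equal contiguous blocks per EP rank, the GPU expert budget is assumed to be a per-rank count, and placing the first experts of the shard on GPU is a placeholder policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalExperts:
    start: int  # first global expert id owned by this EP rank
    count: int  # number of experts owned by this EP rank


def local_experts(num_experts: int, ep_size: int, ep_rank: int) -> LocalExperts:
    """Contiguous, even partition of experts across EP ranks (assumption)."""
    if not 0 <= ep_rank < ep_size:
        raise ValueError("ep_rank must be in [0, ep_size)")
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    count = num_experts // ep_size
    return LocalExperts(start=ep_rank * count, count=count)


def build_gpu_experts_mask(
    num_experts: int, ep_size: int, ep_rank: int, num_gpu_experts: int
) -> list[bool]:
    """Mask over this rank's local experts; True marks an expert kept on GPU.

    num_gpu_experts is treated as a per-rank budget (assumption) and is
    clamped to the local expert count so it can never exceed the shard.
    """
    shard = local_experts(num_experts, ep_size, ep_rank)
    n_gpu = max(0, min(num_gpu_experts, shard.count))
    # Placeholder policy: the first n_gpu local experts go to the GPU.
    return [i < n_gpu for i in range(shard.count)]


# Values from this issue: 384 experts, EP=2; the budget of 60 is illustrative.
shard = local_experts(384, ep_size=2, ep_rank=1)
mask = build_gpu_experts_mask(384, ep_size=2, ep_rank=1, num_gpu_experts=60)
```

With ep_size=1 the same functions reduce to the current full-size mask, so a change along these lines would not need a separate code path for the non-EP case.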

Environment

- Model: Kimi-K2.6 RAWINT4
- GPU: 8×L20 (SM89, no NVLink)
- CPU: dual-socket Xeon with AMX
- ktransformers: main branch as of May 2026

### Pull Request

_No response_
