@CUHKSZzxy CUHKSZzxy commented Oct 30, 2025

Modifications

  1. Expose DeepEP env var

With the default DeepEP buffer SM count (num_sms), multi-node runs on H200 fail with the assertion below. We therefore expose this value as an environment variable so users can configure it. A value that works on H200 is DEEPEP_BUFFER_NUM_SMS=16.

csrc/kernels/internode.cu:386, condition: ibgda_get_state()->num_rc_per_pe == num_channels or ibgda_get_state()->num_rc_per_pe >= num_sms

This is a known issue in DeepEP.
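
As a rough illustration of how such a variable might be consumed, the sketch below reads DEEPEP_BUFFER_NUM_SMS and falls back to a caller-supplied default when it is unset. The helper name `read_num_sms` and the fallback value used in the usage lines are hypothetical and are not taken from this PR or from DeepEP.

```python
import os


def read_num_sms(default: int, env_name: str = "DEEPEP_BUFFER_NUM_SMS") -> int:
    """Return the SM count from the environment, or `default` if unset.

    Hypothetical helper for illustration only; the code in this PR may
    read the variable differently.
    """
    raw = os.environ.get(env_name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on non-integer input
    if value <= 0:
        raise ValueError(f"{env_name} must be a positive integer, got {value}")
    return value


# Usage: the fallback 24 is a placeholder, not DeepEP's real default.
os.environ["DEEPEP_BUFFER_NUM_SMS"] = "16"
num_sms = read_num_sms(default=24)
```

Setting the variable in the launch environment (for example `export DEEPEP_BUFFER_NUM_SMS=16` before starting the server) would then override the fallback.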

  2. Fix DeepEP mode in CUDA graph

Flip the DeepEP mode between prefill and decode, and also clear the buffer (the DLBlas side does this when switching to low latency). Otherwise, a CUDA illegal memory access is triggered in DeepEP or in the DeepGEMM kernel that follows, as known in
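
The mode flip described in this item can be pictured with the minimal sketch below. Everything in it is an assumption made for illustration: the names `Mode`, `ModeTracker`, and `clear_buffer` are hypothetical, and the mapping of prefill to the normal mode and decode to the low-latency mode is assumed rather than stated in this PR. It only shows the ordering: clear the buffer when entering low latency, then record the new mode.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    NORMAL = "normal"            # assumed to be used for prefill
    LOW_LATENCY = "low_latency"  # assumed to be used for decode


class ModeTracker:
    """Hypothetical tracker; not the actual lmdeploy or DLBlas code."""

    def __init__(self, clear_buffer: Callable[[], None]):
        self.mode = Mode.NORMAL
        self._clear_buffer = clear_buffer

    def set_mode(self, is_decoding: bool) -> Mode:
        target = Mode.LOW_LATENCY if is_decoding else Mode.NORMAL
        if target is not self.mode:
            # Clear the buffer only when entering low-latency mode.
            if target is Mode.LOW_LATENCY:
                self._clear_buffer()
            self.mode = target
        return self.mode


# Usage: count how often the (stand-in) buffer clear runs.
clears = []
tracker = ModeTracker(clear_buffer=lambda: clears.append(1))
tracker.set_mode(is_decoding=True)   # prefill -> decode, clears once
tracker.set_mode(is_decoding=True)   # no mode change, no clear
tracker.set_mode(is_decoding=False)  # decode -> prefill, no clear
```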

  3. Upgrade DeepEP / DeepGEMM / DLBlas / FlashMLA
  • DeepEP -> v1.2.1
  • DeepGEMM -> v2.1.1.post3
  • DLBlas -> v0.0.6
  • FlashMLA -> commit 1408756 (no official release)
  4. Other modifications
  • Add some deep_gemm CUDA dependencies
  • Pin the torch version to avoid a build / runtime version mismatch (which leads to an undefined symbol error for deep_gemm)
  • Add vim
  • Add some comments

@CUHKSZzxy CUHKSZzxy changed the title from "Fix ep" to "Fix ep deployment issues" Oct 30, 2025
@CUHKSZzxy CUHKSZzxy marked this pull request as draft October 30, 2025 03:12
@windreamer windreamer self-requested a review October 30, 2025 03:47