Support CUDA Graph for internode dispatch normal kernel #438

yifeizhang-c · 2025-09-30T03:11:37Z

Support CUDA Graph for the internode dispatch kernels, using the same approach already applied to the intranode dispatch kernels.

yifeizhang-c · 2025-09-30T05:23:08Z

csrc/kernels/internode.cu

-            while (ld_volatile_global(moe_recv_rdma_counter_mapped) != -1);
-            *moe_recv_rdma_counter_mapped = sum;
+            if (num_worst_tokens == 0) {
+                while (ld_volatile_global(moe_recv_rdma_counter_mapped) != -1);


I would like to double-check the design here. Is the while (ld_volatile_global(...)) loop intended to handle cache coherency? That is, does the device side need to check that the host-side update of the value has already been written back before the device side makes its own update?
I ask because intranode dispatch does not have such logic.
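To make the question concrete, here is a minimal host-only sketch of one possible reading of the spin-wait. It is not taken from the PR. It assumes that the host resets the mapped counter to -1 before each round and then polls until the value changes, and that when num_worst_tokens is nonzero the host sizes its buffers from num_worst_tokens instead of reading the counter. The names device_side, host_side and counter_mapped are hypothetical, and a std::atomic<int> plus a std::thread stand in for host-mapped memory and the kernel.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Simulated "device" side, mirroring the guarded branch in the diff.
// counter_mapped is a hypothetical stand-in for moe_recv_rdma_counter_mapped.
void device_side(std::atomic<int>& counter_mapped, int sum, int num_worst_tokens) {
    if (num_worst_tokens == 0) {
        // Wait until the host's reset to -1 is visible. Writing earlier
        // could be overwritten by the reset, and the host would poll forever.
        while (counter_mapped.load(std::memory_order_acquire) != -1) {
            std::this_thread::yield();
        }
        counter_mapped.store(sum, std::memory_order_release);
    }
    // Assumed CUDA Graph path: no handshake with the host at all.
}

// Simulated host side. Returns the token count the host would use.
int host_side(int sum, int num_worst_tokens) {
    std::atomic<int> counter_mapped{0};  // stale value left from an earlier round
    std::thread device(device_side, std::ref(counter_mapped), sum, num_worst_tokens);

    int result = num_worst_tokens;  // assumed: worst-case size, no polling
    if (num_worst_tokens == 0) {
        counter_mapped.store(-1, std::memory_order_release);  // reset
        // Poll until the "device" publishes a real count.
        while ((result = counter_mapped.load(std::memory_order_acquire)) == -1) {
            std::this_thread::yield();
        }
    }
    device.join();
    return result;
}
```

Under these assumptions, host_side(42, 0) goes through the handshake and returns the published count, while host_side(42, 128) never touches the counter, which is the property a graph-captured launch would need because the host cannot poll in the middle of a replay.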

yifeizhang-c · 2025-10-28T09:08:08Z

@sphish Hi, can you help review this PR? Thanks!

yifeizhang-c force-pushed the enable-internode-cuda-graph branch 3 times, most recently from 610b076 to 6091f94 · October 28, 2025 09:07

yifeizhang-c force-pushed the enable-internode-cuda-graph branch from e8ebcaf to 6ad4396 · November 5, 2025 06:28

Enable CUDA Graph for internode dispatch

d5e6717

yifeizhang-c force-pushed the enable-internode-cuda-graph branch from 6ad4396 to d5e6717 · November 5, 2025 07:29

sphish approved these changes Nov 5, 2025

sphish merged commit 92fe2de into deepseek-ai:main Nov 5, 2025
1 check passed
