DeepEP, torch.compile and Fix Megatron Training Bug (#646)
Summary: 1.15x faster Megatron training and it actually trains now.
DeepEP
DeepEP enables faster expert-parallel (EP) communication. EP communication and the pre/post-processing work surrounding it take roughly as long as the expert MLP computation itself (at least on Qwen3 30B A3B), so improvements here matter. DeepEP gives us a ~1.05x speedup over Megatron's built-in dispatcher, and the gap may grow in multi-node settings.
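For reference, upstream Megatron-LM exposes DeepEP through its flexible token dispatcher. The flags below are a sketch of the upstream interface, not necessarily how this PR wires it up, and the exact flag names may vary by Megatron version:

```shell
# Sketch of upstream Megatron-LM flags for DeepEP (names may differ by version).
# DeepEP must be installed separately from the DeepSeek repository.
torchrun --nproc-per-node 8 pretrain_gpt.py \
  --moe-token-dispatcher-type flex \
  --moe-enable-deepep \
  ... # remaining model/training arguments
```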
torch.compile
We add torch.compile to the model layers and disable some regions that are not compatible. This gives a ~1.10x speedup on top of DeepEP. I did not test max-autotune or CUDA graphs here, just basic compilation.
Megatron Training
We noticed that Megatron failed to train in the simple yes-no-maybe example. This was caused by the parameter offload: Megatron expects parameter data tensors to keep the same storage, and our offload/reload created new tensors, so Megatron lost track of them during updates. We now use Megatron's own offload API, which handles this correctly.
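A toy illustration of the failure mode (not Megatron's actual internals): the optimizer holds views into parameter storage, so rebinding param.data to a freshly allocated tensor silently breaks the aliasing, while an in-place copy preserves it.

```python
import torch

# Simplified Megatron-style setup: an optimizer-side buffer that
# aliases the parameter's storage via a view.
param = torch.nn.Parameter(torch.ones(4))
flat_buffer = param.data.view(-1)  # "optimizer" holds this view

# BROKEN offload/reload: rebinding .data allocates new storage,
# so the optimizer's view no longer aliases the live parameter.
saved = param.data.clone()
param.data = saved.clone()   # reload into a *new* tensor
param.data.add_(1.0)         # update is invisible to flat_buffer

# CORRECT: copy back in place so existing views stay valid.
param2 = torch.nn.Parameter(torch.ones(4))
view2 = param2.data.view(-1)
saved2 = param2.data.clone()
param2.data.copy_(saved2)    # in-place reload, storage unchanged
param2.data.add_(1.0)        # update is visible through view2
```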
We also remove the optimizer offload, since the optimizer is loaded from disk at the start of each job anyway.
Megatron Provider Options
We expose environment variables for controlling Megatron parallelism. We will refactor the configuration system at some point so these can be set more naturally, but for now this is the minimal control plane.
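Usage would look something like the following. The variable names here are hypothetical placeholders for illustration only; check the provider code for the actual names exposed by this PR:

```shell
# Hypothetical variable names -- see the provider code for the real ones.
export MEGATRON_TENSOR_PARALLEL_SIZE=2
export MEGATRON_PIPELINE_PARALLEL_SIZE=1
export MEGATRON_EXPERT_PARALLEL_SIZE=8
```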