docs or demos to illustrate the key features #1061

Congrats on the release! Some of the features are so cool they feel like black magic.

Would it be possible to explain the key techniques behind those features, or provide a tutorial/demo so users can reproduce the claimed results?

<img width="870" height="553" alt="Image" src="https://github.com/user-attachments/assets/963fe7ab-6f2a-4dd3-b6ec-4e3076d45ae1" />


In particular, I am interested in this claim:

> Memory-efficient design: Train 200B MoE models on 64k sequence lengths without sequence parallelism through advanced memory optimization techniques

This sounds very challenging unless there is aggressive offloading and recomputation, and either of those would likely slow down iteration speed.
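To make the concern concrete, here is a rough back-of-envelope estimate of activation memory at 64k tokens. It is only a sketch under stated assumptions. It uses the 16-bit per-layer estimate of about 34·s·b·h bytes from Korthikanti et al. (2022), "Reducing Activation Recomputation in Large Transformer Models". That estimate assumes the s×s attention scores are never materialized (e.g. FlashAttention) and a dense MLP, so it ignores MoE routing and expert buffers. The model shape (`hidden=6144`, 60 layers) is hypothetical and is not taken from the release.

```python
# Back-of-envelope activation memory for long-sequence training.
# The model shape used below is HYPOTHETICAL, not the released model's config.
GIB = 1024 ** 3

def activation_bytes_per_layer(seq_len, batch, hidden, full_recompute=False):
    """Approximate activation bytes stored for one transformer layer.

    Without recomputation: ~34*s*b*h bytes with 16-bit activations, assuming
    the s x s attention scores are not stored (e.g. FlashAttention) and a
    dense 4h MLP. MoE expert buffers and routing state are ignored.
    """
    if full_recompute:
        # Full recomputation keeps only the 16-bit layer input (2 bytes/element).
        return 2 * seq_len * batch * hidden
    return 34 * seq_len * batch * hidden

def total_activation_gib(seq_len, batch, hidden, num_layers, full_recompute=False):
    """Sum the per-layer estimate over all layers and convert to GiB."""
    per_layer = activation_bytes_per_layer(seq_len, batch, hidden, full_recompute)
    return per_layer * num_layers / GIB

if __name__ == "__main__":
    # Hypothetical 200B-class MoE shape: hidden=6144, 60 layers, micro-batch of 1.
    for recompute in (False, True):
        gib = total_activation_gib(65536, 1, 6144, 60, full_recompute=recompute)
        print(f"full_recompute={recompute}: ~{gib:.1f} GiB of activations")
```

Under these assumptions this prints roughly 765 GiB without recomputation and 45 GiB with full recomputation, before counting weights, gradients, and optimizer state. That is why, without sequence parallelism, I would expect some combination of recomputation and offloading to be required.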
