docs or demos to illustrate the key features #1061

@qsh-zh

Congrats on the release! Some of the features are so cool they feel like black magic.

Would it be possible to explain the key techniques behind those features, or provide a tutorial/demo so users can reproduce the claimed results?

In particular, I am interested in this claim:

Memory-efficient design: Train 200B MoE models on 64k sequence lengths without sequence parallelism through advanced memory optimization techniques

That sounds very challenging unless it relies on aggressive offloading and recomputation, in which case it may suffer from slow iteration speed.
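For a rough sense of scale, here is a back-of-envelope sketch of the model-state memory alone, before any activations for 64k-token sequences. It assumes the common mixed-precision Adam layout of about 16 bytes per parameter (bf16 weights and gradients plus fp32 master weights, momentum, and variance); the 64-GPU count and the helper names are hypothetical and are not taken from the project.

```python
def model_state_bytes(num_params: int, bytes_per_param: int = 16) -> int:
    """Bytes for weights, gradients, and optimizer states (activations excluded).

    Assumption: 16 bytes/param = bf16 weights (2) + bf16 grads (2)
    + fp32 master weights, Adam momentum, and Adam variance (4 + 4 + 4).
    """
    return num_params * bytes_per_param


def per_gpu_gib(total_bytes: int, num_gpus: int) -> float:
    """Per-GPU share in GiB if the state is sharded evenly across GPUs."""
    return total_bytes / num_gpus / 2**30


total = model_state_bytes(200 * 10**9)  # 200B parameters
print(f"total model state: {total / 10**12:.1f} TB")
# Hypothetical cluster size, for illustration only.
print(f"per GPU on 64 GPUs: {per_gpu_gib(total, 64):.1f} GiB")
```

Even with perfect sharding, the model state leaves limited headroom for long-sequence activations under these assumptions, which is why I suspect offloading or recomputation is involved.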
