Add Mixture of Experts #479

@sdtblck

Description

From the paper *DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times*.

It should be a fairly simple addition, as the codebase they open-sourced is largely similar to ours (same base model, although we have diverged a bit since). A rough sketch of what the integration could look like is below.
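A minimal sketch, assuming DeepSpeed's `deepspeed.moe.layer.MoE` wrapper as described in that paper's release. The module and class names (`MLP`, `MoEBlock`), the layer sizes, and the exact hook into our transformer blocks are illustrative, not the final integration:

```python
# Sketch: swap a dense MLP block for a DeepSpeed MoE layer.
# Assumes deepspeed.moe.layer.MoE; names/sizes here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from deepspeed.moe.layer import MoE


class MLP(nn.Module):
    """Standard transformer feed-forward block (used as the per-expert module)."""

    def __init__(self, hidden_size: int, ffn_size: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, ffn_size)
        self.fc2 = nn.Linear(ffn_size, hidden_size)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))


class MoEBlock(nn.Module):
    """Wraps the MLP in a DeepSpeed MoE layer with top-1 gating."""

    def __init__(self, hidden_size: int, ffn_size: int, num_experts: int = 8):
        super().__init__()
        self.moe = MoE(
            hidden_size=hidden_size,
            expert=MLP(hidden_size, ffn_size),  # replicated num_experts times
            num_experts=num_experts,
            k=1,  # top-1 routing, as in the DeepSpeed-MoE NLG setup
        )

    def forward(self, hidden_states):
        # MoE.forward returns (output, aux_loss, expert_counts); the auxiliary
        # load-balancing loss has to be added to the LM loss during training.
        output, aux_loss, _ = self.moe(hidden_states)
        return output, aux_loss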
