Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"

timlee0212/SiDA-MoE


MOE

Collecting activations for large models

  1. Run `python main.py --model=xxx --sharding`. The script loads the pretrained weights from Hugging Face into our customized model and saves them in a sharded format at `./result/[DATABASE]/[MODEL]/ShardedCkpt`.
  2. Run `python main.py --model=xxx` to perform inference with the HF `load_and_dispatch` and collect the activations for later use.
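The activation-collection step above hinges on attaching hooks to the model's modules during inference. A minimal sketch of that idea, using PyTorch forward hooks on a stand-in model (the model, module selection, and dict layout here are illustrative assumptions, not the repo's actual code):

```python
# Sketch: collect per-module activations during a forward pass via hooks.
# In SiDA-MoE the hooks would target the MoE gating/expert modules of the
# pretrained HF model; here a tiny Sequential model stands in.
import torch
import torch.nn as nn

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Detach and move to CPU so stored tensors don't pin GPU memory
        # or keep the autograd graph alive.
        activations[name] = output.detach().cpu()
    return hook

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(torch.randn(2, 8))

# activations now maps module names to their captured outputs,
# e.g. activations['2'] has shape (2, 4).
```

After the pass, the collected tensors can be saved (e.g. with `torch.save`) for the sparsity analysis the paper builds on.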

TODO:

- [ ] Add disk offload function.
- [ ] Process the sharded format when the model size is larger than the main memory.
