CMoE

Implementation for the paper CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference.

Dependencies

conda create -n cmoe python=3.11
conda activate cmoe
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install datasets==2.21.0
pip install transformers==4.47.1
pip install accelerate==1.2.1
pip install sentencepiece==0.2.0
pip install protobuf==5.29.2
pip install matplotlib==3.10.0
pip install lap==0.5.12
pip install peft==0.14.0

Note: adjust the package versions as needed for your own environment.

Quick Start

Download the models from Hugging Face, then you can run run_cmoe.py. Pass the model path as MODEL_PATH.
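
For example, one way to fetch a base model and point the script at it (a sketch only; the repository ID and local directory below are placeholders, not requirements of this code):

huggingface-cli download meta-llama/Llama-2-7b-hf --local-dir ./models/llama-2-7b   # placeholder model; use the dense LLM you want to carve
MODEL_PATH=./models/llama-2-7b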

You can run the pre-defined testing script run.sh with:

bash run.sh

Or reset the hyperparameters to run a customized setting. For example, to run S2A2E16 (i.e., --nshared 2, --nactivated 2, --nexperts 16) with 2,048 fine-tuning samples on wikitext2:

python run_cmoe.py $MODEL_PATH wikitext2 \
    --nshared 2 \
    --nactivated 2 \
    --nexperts 16 \
    --nsamples 2048 \
    --extra-lr 0.001 --bias-speed 0.001 --new-eval

Evaluation

Our code automatically runs perplexity (PPL) evaluation. If you want to evaluate on downstream tasks, add the --eval-zero argument; that evaluation code is adapted from Wanda.
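
For example, to run the S2A2E16 setting above and also evaluate on zero-shot downstream tasks (a sketch reusing the same flags as before; only --eval-zero is added):

python run_cmoe.py $MODEL_PATH wikitext2 \
    --nshared 2 --nactivated 2 --nexperts 16 --nsamples 2048 \
    --extra-lr 0.001 --bias-speed 0.001 --new-eval --eval-zero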

Cite

If you find this work useful, please consider citing:

@article{pei2025cmoe,
  title={CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference},
  author={Pei, Zehua and Zou, Lancheng and Zhen, Hui-Ling and Yu, Xianzhi and Liu, Wulong and Pan, Sinno Jialin and Yuan, Mingxuan and Yu, Bei},
  journal={arXiv preprint arXiv:2502.04416},
  year={2025}
}
