Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"

timlee0212/SiDA-MoE


MOE

Collecting activations for large models

  1. Run `python main.py --model=xxx --sharding`. The script loads the pretrained weights from Hugging Face into our customized model and saves them in a sharded format at `./result/[DATABASE]/[MODEL]/ShardedCkpt`.
  2. Run `python main.py --model=xxx` to perform inference with the HF `load_and_dispatch` and collect the activations for later use.
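The activation-collection step above hinges on attaching hooks to the model's modules during inference. A minimal sketch of that idea, using PyTorch forward hooks on a stand-in model (the model, module selection, and dict layout here are illustrative assumptions, not the repo's actual code):

```python
# Sketch: collect per-module activations during a forward pass via hooks.
# In SiDA-MoE the hooks would target the MoE gating/expert modules of the
# pretrained HF model; here a tiny Sequential model stands in.
import torch
import torch.nn as nn

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Detach and move to CPU so stored tensors don't pin GPU memory
        # or keep the autograd graph alive.
        activations[name] = output.detach().cpu()
    return hook

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(torch.randn(2, 8))

# activations now maps module names to their captured outputs,
# e.g. activations['2'] has shape (2, 4).
```

After the pass, the collected tensors can be saved (e.g. with `torch.save`) for the sparsity analysis the paper builds on.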

TODO:

- [ ] Add disk offload function.
- [ ] Process the sharded format when the model size is larger than the main memory.
