Adding DeepSpeed Compression Composer #2105

Merged: 17 commits into master from staging_compression_library_v3 on Jul 19, 2022

yaozhewei
Contributor

Add DeepSpeed Compression Composer:

  1. Software support
  2. Tutorial
  3. Config explanation
  4. etc
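
To give a rough idea of the kind of configuration this PR documents, here is a minimal sketch that builds one sparse-pruning section as a Python dict and serializes it to JSON. The key names (`compression_training`, `sparse_pruning`, `shared_parameters`, `different_groups`, `schedule_offset`, `dense_ratio`) are recalled from the DeepSpeed compression documentation and should be treated as assumptions to check against config-json.md. The group name `sp1`, the module pattern, and the helper `build_compression_config` are hypothetical.

```python
import json


def build_compression_config(dense_ratio=0.5, schedule_offset=30):
    """Build a DeepSpeed-style config dict with a single sparse-pruning group.

    Key names are assumptions recalled from the docs, not verified against
    this PR; verify them in config-json.md before use.
    """
    if not 0.0 < dense_ratio <= 1.0:
        raise ValueError("dense_ratio must be in (0, 1]")
    return {
        "compression_training": {
            "sparse_pruning": {
                # Settings shared by every group of this method.
                "shared_parameters": {
                    "enabled": True,
                    "schedule_offset": schedule_offset,  # step at which pruning starts
                    "method": "l1",
                },
                # Per-group settings; "sp1" is a hypothetical group name.
                "different_groups": {
                    "sp1": {
                        "params": {"dense_ratio": dense_ratio},
                        "modules": ["attention.self"],  # hypothetical name pattern
                    }
                },
            }
        }
    }


config = build_compression_config()
config_json = json.dumps(config, indent=2)
```

Writing `config_json` to a file would give something that could be merged into a DeepSpeed JSON config, assuming the key names above are correct.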

conglongli and others added 5 commits July 15, 2022 02:25
* Staging compression library v1 (#314)

* prototype

* add sparse/row/head pruning

* add bert test examples, not testing yet

* rm moq

* add deepspeed based glue example to test compression

* add get/set attr

* tested replacement module

* Customized Linear Layer Accuracy Checked without any compression technique

* sparse pruning tested

* head pruning tested

* row pruning tested

* enable act dy quantization

* change l1 mask to buffer for better resume training

* add final model saving helper function, only for sparse pruning now

* tested sparse pruning resume training and final model saving

* row pruning resume training and final saving checked

* head pruning resuming training / final model saving

* rm bert from deepspeed

* restruct the code

* add mixed-precision quantization support

* add binary/ternary support

* add weight quantization FP16 assert

* add conv2d

* add compression function

* move config generation to deepspeed side, need elton to take a look

* add activation quantization support

* add sparse pruning support

* add row pruning

* add head pruning

* add channel pruning

* support matching patterns for module names

* update

* fix typo in fix_compression

* add compression scheduler, rm the offset scheduler from MoQ

* fix some errors in head pruning, support redundant cleaning (naive version)

* add dim-reduction redundant cleaning

* update linear layer

* make cnn example work

* add bn2d

* fix bias issue

* add static act quantization

* support mpu row/column parallel linear layer

* add skip_bias_add for mpu linear layers

* make mpu compress work, remove_redundent is not tested yet

* fix several small errors

* add conv1d to linear converter function

* add conv1d to linear converter function

* add conv1d to linear converter function

* make dy-act-quantization per-token or per-image

* cleaning part of the code; more is coming

* enable forward weight quantization which supports both FP32 and some tricky settings

* update readme

* Update README.md

* naming cleaning

* fix static activation loading issue

* update parameter

* Update utils.py

fix a typo

* fix typo

* fix typo

* replace expand_as with view

* Zheweiyao/compression library (#304)

* add forward weight quantization constraint

* add quantize_weight_in_forward warning: a lot of features are not supported

* offset 0 fixing

* add forward weight quantization constraint

* add quantize_weight_in_forward warning: a lot of features are not supported

* offset 0 fixing

* fix a small issue

* omit bias if the model does not have bias

* add contiguous to avoid memory issue

* add scale associated to weight, so people can quantize the weight after training

* add fix weight quantization, change name based on constant.py file

* disable eigen-based MoQ

* When a method is disabled (enable: false), we do not need to initialize its related parameters

* weight quantization cleaning

* fix get_quantize_enabled missing problem

* fix redundant cleaning issue, make sure we either get mask from related-module or we enable the method in config

* sort the redundant cleaning step, so we always do quantization, then sparse pruning, then others

* a lot of comment cleaning and args explanation

* add args in config-json.md

* fix format issue

* fix quantization offset step=1 with FP16 optimizer

* Zheweiyao/compression library from s1 (#305)

* add binary/ternary support for FP32 training; this is used to resolve FP16 unstable extreme compression training

* add embedding quantization support

* Xiaoxia/compression library v1 (#307)

* add layer reduction (Xiaoxia/Zhewei)

* fixing bug for sym activation and clean layer reduction (Xiaoxia)

* fixing compression initialization (Xiaoxia/Zhewei)

* fix format issue (#310)

* Xiaoxia/compression library v1 (#311)

* add layer reduction

* fixing bug for sym activation and clean layer reduction

* fixing compression initialization

* pre-commit...

* Zheweiyao/compression library from s1 (#312)

* fix format issue

* fix the accuracy mismatch after quantization cleaning

* fix clean_model bug and add layer_reduction configuration

Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* switch to deepspeed comm

* dummy tutorial

* improve config json

* Zheweiyao/compression library based on s2 (#315)

* change the name and merge layer reduction to init_compression

* add conv1d to linear test unit, fix errors introduced by merging student initialization to init_compression

* Update config-json.md

* fix for cifar10 channel pruning

* fix the block_eigenvalue is None bug

* fix the block_eigenvalue is None bug

* move compression-related constants and configs to compression

* tutorial and json config

Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: xiaoxiawu <yxiaoxiawu@microsoft.com>
Co-authored-by: xiaoxiawu <xiaoxiawu@microsoft.com>
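
Several of the commits above concern L1-based sparse pruning with a stored mask. The following is a minimal pure-Python sketch of that general idea, assuming a flat list of weights and a `dense_ratio` giving the fraction of entries to keep. It is not the code added in this PR; the helpers `l1_mask` and `apply_mask` are hypothetical names used only for illustration.

```python
def l1_mask(weights, dense_ratio):
    """Return a 0/1 mask that keeps the largest-magnitude entries.

    Illustrative only: operates on a flat Python list, not on tensors.
    """
    if not 0.0 < dense_ratio <= 1.0:
        raise ValueError("dense_ratio must be in (0, 1]")
    keep = max(1, round(len(weights) * dense_ratio))
    # Indices ordered by descending |w|; sorted() is stable, so ties keep
    # their original order.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    kept = set(order[:keep])
    return [1 if i in kept else 0 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out the pruned entries; the mask itself is kept separately."""
    return [w * m for w, m in zip(weights, mask)]


weights = [0.1, -2.0, 0.5, 3.0]
mask = l1_mask(weights, dense_ratio=0.5)
pruned = apply_mask(weights, mask)
```

Keeping the mask as separate state, rather than recomputing it from the already-pruned weights, is one way to make resumed training reproduce the same sparsity pattern, which appears to be the motivation behind the "change l1 mask to buffer" commit.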
@yaozhewei changed the title from "Staging compression library v3" to "Adding DeepSpeed Compression Composer" on Jul 18, 2022
@jeffra merged commit 0f4f2f9 into master on Jul 19, 2022
@jeffra deleted the staging_compression_library_v3 branch on July 19, 2022 21:03
6 participants