Decay Pruning Method (DPM) is a novel smooth, dynamic pruning approach that can be seamlessly integrated with various existing structured pruning methods, yielding significant improvements. Unlike traditional single-step pruning approaches that abruptly remove or zero out redundant structures, DPM employs a multi-step, smooth process that gradually decays these structures to zero for better information retention. Additionally, DPM incorporates a gradient-based self-rectifying procedure that identifies and corrects sub-optimal pruning decisions during the decay, enabling more precise and adaptive pruning.
DPM consists of two procedures:
- Smooth Pruning (SP): a multi-step pruning process that gradually decays the weights of redundant structures to zero over N steps while optimization continues. This minimizes drastic network changes and improves information retention during pruning.
- Self-Rectifying (SR): a gradient-driven procedure that assesses the resistance of decaying structures and reverts sub-optimal pruning decisions, enabling more adaptive and accurate pruning. A minimal sketch of both procedures follows this list.
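The snippet below is a minimal, self-contained sketch of how SP and SR could interact on a single tensor. All names in it (`decay_factor`, `resistance`, `rescue_threshold`) and the linear schedule are illustrative assumptions for exposition, not the released implementation; see the preprint for the exact formulation.

```python
# Illustrative sketch of DPM's two procedures on a single tensor.
# decay_factor, resistance, and rescue_threshold are our own placeholders,
# not the authors' released implementation.
import torch

def decay_factor(step: int, total_steps: int) -> float:
    """Smooth Pruning (SP): a linear schedule that decays a redundant
    structure to zero over `total_steps` instead of zeroing it at once."""
    return max(0.0, 1.0 - step / total_steps)

def resistance(weight: torch.Tensor, grad: torch.Tensor) -> float:
    """Self-Rectifying (SR): a gradient-based score of how strongly a decaying
    structure 'resists' removal (positive when the gradient pushes it to regrow)."""
    w, g = weight.flatten(), grad.flatten()
    return float(torch.dot(w, -g) / (w.norm() * g.norm() + 1e-12))

# Toy example: decay one "redundant" conv filter over N steps,
# rescuing it if its resistance exceeds a threshold.
torch.manual_seed(0)
filt = torch.randn(16, 3, 3, requires_grad=True)
original = filt.data.clone()
N, rescue_threshold = 5, 0.5

for step in range(1, N + 1):
    loss = (filt ** 2).sum()               # stand-in for the task loss
    loss.backward()
    if resistance(filt.data, filt.grad) > rescue_threshold:
        print(f"step {step}: pruning decision reverted by SR")
        filt.data = original.clone()       # restore the structure
        break
    filt.data = original * decay_factor(step, N)   # SP: gradual decay
    filt.grad.zero_()
```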
More technical details of DPM are available in our preprint: Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure.
- Add integration examples and tutorials for Depgraph and Gate-Decorator.
- Provide tutorials and user-friendly code for DPM integration.
We verified the effectiveness and generalizability of DPM by integrating it into three pruning frameworks, each under its original configuration: the recently proposed OTOv2 and Depgraph, as well as the classic Gate-Decorator. All code for these integration examples will be uploaded soon!
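Until then, the skeleton below illustrates one way a single-step structured pruner could be wrapped with DPM-style smooth decay. The importance criterion `select_redundant_channels` and the decay schedule are placeholders of our own; they do not reproduce the OTOv2, Depgraph, or Gate-Decorator APIs.

```python
# Hypothetical integration skeleton: replace a one-shot "zero out" step with
# DPM-style smooth decay. All helper names below are illustrative placeholders.
import torch
import torch.nn as nn

def select_redundant_channels(conv: nn.Conv2d, ratio: float = 0.5):
    """Placeholder importance criterion: pick the output channels with the
    smallest L1 norm (stand-in for the host framework's own scoring)."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    k = int(ratio * conv.out_channels)
    return torch.argsort(scores)[:k]

def smooth_prune_step(conv: nn.Conv2d, channels, step: int, total_steps: int):
    """Smooth Pruning: shrink the selected channels each step; the per-step
    factor reaches 0 at the final step, so the channels end at exactly zero."""
    factor = max(0.0, 1.0 - step / total_steps)
    with torch.no_grad():
        conv.weight[channels] *= factor
        if conv.bias is not None:
            conv.bias[channels] *= factor

# Toy usage on a single layer.
conv = nn.Conv2d(8, 16, 3)
redundant = select_redundant_channels(conv, ratio=0.25)
for step in range(1, 6):
    smooth_prune_step(conv, redundant, step, total_steps=5)
```

In an actual integration, `smooth_prune_step` would run inside the host framework's training loop, with SR monitoring the gradients of the decaying structures (as in the earlier sketch) before they are finally removed.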
Method | FLOPs | Params | Top-1 Acc. |
---|---|---|---|
Baseline | 100% | 100% | 93.2% |
EC [1] | 65.8% | 37.0% | 93.1% |
Hinge [2] | 60.9% | 20.0% | 93.6% |
SCP [3] | 33.8% | 7.0% | 93.8% |
OTOv2 [4] | 26.5% | 4.8% | 93.4% |
+SP (Ours) | 26.4% -0.1%↓ | 4.8% - | 93.6% +0.2%↑ |
+SR (Ours) | 25.8% -0.7%↓ | 4.8% - | 93.8% +0.4%↑ |
Notations:
- Note 1: We use ’+SP’ to denote the exclusive use of Smooth Pruning, and ’+SR’ when both Smooth Pruning and Self-Rectifying are applied.
- Note 2: The best pruning results are highlighted in bold.
- Note 3: The improvements from DPM are annotated next to each value with ↓/↑.
In this benchmark, DPM increases the accuracy of OTOv2 by 0.4% while further reducing FLOPs by 0.7%, achieving the best result among the compared methods, including SCP.
Method | FLOPs | Params | Top-1 Acc. |
---|---|---|---|
Baseline | 100% | 100% | 93.53% |
Hinge [2] | 50.0% | 48.73% | 93.69% |
SCP [3] | 51.5% | 48.47% | 93.23% |
ResRep [5] | 47.2% | - | 93.71% |
SANP [6] | 48.0% | - | 93.81% |
APIB [7] | 46.0% | 50.0% | 93.92% |
SFP [8] | 47.4% | - | 93.66% |
ASFP [9] | 47.4% | - | 93.32% |
Depgraph [10] | 46.86% | 52.9% | 93.84% |
+SP (Ours) | 46.32% -0.54%↓ | 49.69% -3.21%↓ | 93.96% +0.12%↑ |
+SR (Ours) | 45.80% -1.06%↓ | 47.22% -5.68%↓ | 94.13% +0.29%↑ |
Depgraph w/o SL [10] | 47.2% | 69.7% | 93.32% |
+SP (Ours) | 46.7% -0.5%↓ | 68.1% -1.6%↓ | 93.62% +0.3%↑ |
+SR (Ours) | 47.1% -0.1%↓ | 65.7% -4.0%↓ | 93.71% +0.39%↑ |
Notations:
- Note 1: Results not reported in the literature are marked with '-'.
- Note 2: "w/o SL" = "without sparse learning".
DPM noticeably enhances accuracy in both configurations and reduces parameters by over 4% relative to the original Depgraph. With the group pruner combined with sparse learning, DPM reaches a state-of-the-art accuracy of 94.13% while further reducing FLOPs by 1.06% and parameters by 5.68%, surpassing the previous best method APIB by 0.21% in accuracy with even higher model efficiency.
Method | FLOPs | Params | Top-1 Acc. |
---|---|---|---|
Gate-Decorator [11] | 9.86% | 1.98% | 91.50% |
+SP (Ours) | 9.88% +0.02%↑ | 1.97% -0.01%↓ | 91.58% +0.08%↑ |
+SR (Ours) | 9.79% -0.07%↓ | 1.95% -0.03%↓ | 91.74% +0.24%↑ |
When integrated with Gate-Decorator, DPM improves accuracy by 0.24%, reduces FLOPs by 0.07%, and decreases parameters by 0.03%.
@misc{yang2024decaypruningmethodsmooth,
title={Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure},
author={Minghao Yang and Linlin Gao and Pengyuan Li and Wenbo Li and Yihong Dong and Zhiying Cui},
year={2024},
eprint={2406.03879},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.03879},
}
[1] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning filters for efficient convnets,” in Proc. Int. Conf. Learn. Represent., 2017.
[2] Y. Li, S. Gu, C. Mayer, L. V. Gool, and R. Timofte, “Group sparsity: The hinge between filter pruning and decomposition for network compression,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 8015–8024.
[3] M. Kang and B. Han, “Operation-aware soft channel pruning using differentiable masks,” in Proc. Int. Conf. Mach. Learn., 2020, pp. 5122–5131.
[4] T. Chen, L. Liang, T. Ding, Z. Zhu, and I. Zharkov, “OTOv2: Automatic, generic, user-friendly,” in Proc. Int. Conf. Learn. Represent., 2023.
[5] X. Ding, T. Hao, J. Tan, J. Liu, J. Han, Y. Guo, and G. Ding, “ResRep: Lossless CNN pruning via decoupling remembering and forgetting,” in Proc. IEEE Int. Conf. Comput. Vis., 2021, pp. 4490–4500.
[6] S. Gao, Z. Zhang, Y. Zhang, F. Huang, and H. Huang, “Structural alignment for network pruning through partial regularization,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 17356–17366.
[7] S. Guo, L. Zhang, X. Zheng, Y. Wang, Y. Li, F. Chao, C. Wu, S. Zhang, and R. Ji, “Automatic network pruning via Hilbert-Schmidt independence criterion lasso under information bottleneck principle,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 17412–17423.
[8] Y. He, G. Kang, X. Dong, Y. Fu, and Y. Yang, “Soft filter pruning for accelerating deep convolutional neural networks,” in Proc. Int. Joint Conf. Artif. Intell., 2018.
[9] Y. He, X. Dong, G. Kang, Y. Fu, C. Yan, and Y. Yang, “Asymptotic soft filter pruning for deep convolutional neural networks,” arXiv preprint arXiv:1808.07471, 2019.
[10] G. Fang, X. Ma, M. Song, M. B. Mi, and X. Wang, “DepGraph: Towards any structural pruning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2023, pp. 16091–16101.
[11] Z. You, K. Yan, J. Ye, M. Ma, and P. Wang, “Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 2130–2141.