
Commit acdd4ca

add self distillation and example for resnet (#1473)
1 parent ffa9a39 commit acdd4ca

File tree

15 files changed: +1685 -0 lines changed


docs/distillation.md

Lines changed: 13 additions & 0 deletions
@@ -7,6 +7,8 @@ Distillation

    1.2. [Intermediate Layer Knowledge Distillation](#intermediate-layer-knowledge-distillation)

    1.3. [Self Distillation](#self-distillation)

2. [Distillation Support Matrix](#distillation-support-matrix)

3. [Get Started with Distillation API](#get-started-with-distillation-api)

4. [Examples](#examples)
@@ -35,12 +37,23 @@ $$L_{KD} = \sum\limits_i D(T_t^{n_i}(F_t^{n_i}), T_s^{m_i}(F_s^{m_i}))$$

Where $D$ is a distance measurement as before, $F_t^{n_i}$ is the output feature of the $n_i$-th layer of the teacher model, and $F_s^{m_i}$ is the output feature of the $m_i$-th layer of the student model. Since the dimensions of $F_t^{n_i}$ and $F_s^{m_i}$ are usually different, the transformations $T_t^{n_i}$ and $T_s^{m_i}$ are needed to match the dimensions of the two features. Specifically, a transformation can take forms such as the identity, a linear transformation, or a 1x1 convolution.
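To make the formula concrete, here is a minimal PyTorch sketch of one layer-pair term, not the Intel® Neural Compressor API: it assumes $D$ is mean squared error and $T_s^{m_i}$ is a 1x1 convolution that projects the student feature to the teacher's channel count, with $T_t^{n_i}$ the identity; the channel counts and feature-map shapes are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of one term of the intermediate-layer loss above:
# D is mean squared error, T_s is a 1x1 convolution that projects the
# student feature to the teacher's channel dimension, T_t is the identity.
# Channel counts and feature-map sizes are assumptions for illustration.

class FeatureKDLoss(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # T_s: match the student feature's dimensions to the teacher's
        self.transform = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # D(T_t(F_t), T_s(F_s)) with D = MSE; the teacher is frozen, so detach it
        return F.mse_loss(self.transform(student_feat), teacher_feat.detach())

# Example usage with made-up shapes:
criterion = FeatureKDLoss(student_channels=64, teacher_channels=256)
f_s = torch.randn(8, 64, 28, 28)   # student feature F_s^{m_i}
f_t = torch.randn(8, 256, 28, 28)  # teacher feature F_t^{n_i}
loss = criterion(f_s, f_t)         # one term of the sum over matched layer pairs
```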

### Self Distillation

Self-distillation is a one-stage training method in which the teacher and student models are trained together. It attaches several attention modules and shallow classifiers at different depths of the neural network and distills knowledge from the deepest classifier into the shallower classifiers. Unlike conventional knowledge distillation, where the knowledge of a teacher model is transferred to a separate student model, self-distillation can be viewed as knowledge transfer within the same model, from the deeper layers to the shallower layers.

The additional classifiers also allow the neural network to work in a dynamic manner (for example, exiting from a shallower classifier at inference time), which leads to significant acceleration.
<br>

<img src="./imgs/self-distillation.png" alt="Architecture" width=800 height=350>

Architecture from the paper [Self-Distillation: Towards Efficient and Compact Neural Networks](https://ieeexplore.ieee.org/document/9381661)
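To make the training objective concrete, the following is a minimal PyTorch sketch of a self-distillation loss rather than the Intel® Neural Compressor implementation: the deepest classifier is trained with ordinary cross entropy and its softened predictions supervise the shallower classifiers; the feature-level hint losses described in the papers are omitted, and the temperature, weighting, and number of exits are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Sketch of the self-distillation objective: the deepest classifier acts as
# the in-model teacher for the shallower auxiliary classifiers. The feature
# hint losses from the papers are omitted; temperature and weights are
# assumptions for illustration, not the values used by the example recipe.

def self_distillation_loss(logits_per_exit, labels, temperature=3.0, alpha=0.3):
    """logits_per_exit: logits from each attached classifier, deepest last."""
    deepest = logits_per_exit[-1]
    # The deepest classifier is trained with ordinary cross entropy
    loss = F.cross_entropy(deepest, labels)
    # Its softened predictions become the targets for the shallower exits
    soft_targets = F.softmax(deepest.detach() / temperature, dim=1)
    for logits in logits_per_exit[:-1]:
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                      soft_targets, reduction="batchmean") * temperature ** 2
        loss = loss + (1.0 - alpha) * ce + alpha * kd
    return loss

# Example with three attached classifiers (two shallow exits + the deepest):
exits = [torch.randn(8, 100) for _ in range(3)]   # batch of 8, 100 classes
labels = torch.randint(0, 100, (8,))
total_loss = self_distillation_loss(exits, labels)
```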
## Distillation Support Matrix

|Distillation Algorithm                           |PyTorch   |TensorFlow |
|------------------------------------------------|:--------:|:---------:|
|Knowledge Distillation                           |&#10004;  |&#10004;   |
|Intermediate Layer Knowledge Distillation        |&#10004;  |Will be supported|
|Self Distillation                                |&#10004;  |&#10006;   |

## Get Started with Distillation API

docs/imgs/self-distillation.png

311 KB

examples/README.md

Lines changed: 20 additions & 0 deletions
In the Intel® Neural Compressor validated example tables, an "Approach" column is added to the TensorFlow and PyTorch distillation sections, and a row is added for the new ResNet self-distillation example.

TensorFlow distillation examples:

| Student Model | Teacher Model | Domain | Approach | Examples |
|---------------|---------------|--------|----------|----------|
| MobileNet | DenseNet201 | Image Recognition | Knowledge Distillation | [pb](./tensorflow/image_recognition/tensorflow_models/distillation) |

PyTorch distillation examples:

| Student Model | Teacher Model | Domain | Approach | Examples |
|---------------|---------------|--------|----------|----------|
| CNN-2 | CNN-10 | Image Recognition | Knowledge Distillation | [eager](./pytorch/image_recognition/CNN-2/distillation/eager) |
| MobileNet V2-0.35 | WideResNet40-2 | Image Recognition | Knowledge Distillation | [eager](./pytorch/image_recognition/MobileNetV2-0.35/distillation/eager) |
| ResNet18\|ResNet34\|ResNet50\|ResNet101 | ResNet18\|ResNet34\|ResNet50\|ResNet101 | Image Recognition | Knowledge Distillation | [eager](./pytorch/image_recognition/torchvision_models/distillation/eager) |
| ResNet18\|ResNet34\|ResNet50\|ResNet101 | ResNet18\|ResNet34\|ResNet50\|ResNet101 | Image Recognition | Self Distillation | [eager](./pytorch/image_recognition/torchvision_models/self_distillation/eager) |
| VGG-8 | VGG-13 | Image Recognition | Knowledge Distillation | [eager](./pytorch/image_recognition/VGG-8/distillation/eager) |
| BlendCNN | BERT-Base | Natural Language Processing | Knowledge Distillation | [eager](./pytorch/nlp/blendcnn/distillation/eager) |
| DistilBERT | BERT-Base | Natural Language Processing | Knowledge Distillation | [eager](./pytorch/nlp/huggingface_models/question-answering/distillation/eager) |
| BiLSTM | RoBERTa-Base | Natural Language Processing | Knowledge Distillation | [eager](./pytorch/nlp/huggingface_models/text-classification/distillation/eager) |
| TinyBERT | BERT-Base | Natural Language Processing | Knowledge Distillation | [eager](./pytorch/nlp/huggingface_models/text-classification/distillation/eager) |
| BERT-3 | BERT-Base | Natural Language Processing | Knowledge Distillation | [eager](./pytorch/nlp/huggingface_models/text-classification/distillation/eager) |
| DistilRoBERTa | RoBERTa-Large | Natural Language Processing | Knowledge Distillation | [eager](./pytorch/nlp/huggingface_models/text-classification/distillation/eager) |
New file (README for the ResNet self-distillation example)

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@

Details **TBD**

### Prepare requirements
```shell
pip install -r requirements.txt
```

### Run self distillation
```shell
bash run_distillation.sh --topology=(resnet18|resnet34|resnet50|resnet101) --config=conf.yaml --output_model=path/to/output_model --dataset_location=path/to/dataset --use_cpu=(0|1)
```

### CIFAR100 benchmark
https://github.com/weiaicunzai/pytorch-cifar100

### Papers
[Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation](https://openaccess.thecvf.com/content_ICCV_2019/html/Zhang_Be_Your_Own_Teacher_Improve_the_Performance_of_Convolutional_Neural_ICCV_2019_paper.html)

[Self-Distillation: Towards Efficient and Compact Neural Networks](https://ieeexplore.ieee.org/document/9381661)

### Our results on CIFAR100
Top-1 accuracy (%):

| Model    | Baseline | Classifier1 | Classifier2 | Classifier3 | Classifier4 | Ensemble |
| :------: | :-------:| :---------: | :---------: | :---------: | :---------: | :------: |
| ResNet50 | 80.88    | 82.06       | 83.64       | 83.85       | 83.41       | 85.10    |
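The commit does not show how the Ensemble column is computed; a common choice in self-distillation work is to combine the outputs of all attached classifiers. The sketch below averages their softmax outputs, which is an assumption for illustration rather than the exact recipe behind these numbers.

```python
import torch
import torch.nn.functional as F

# Hypothetical ensemble: average the softmax outputs of all attached
# classifiers and take the arg-max. This is one common scheme, assumed
# here for illustration; it is not necessarily the recipe used for the
# numbers in the table above.

def ensemble_accuracy(logits_per_exit, labels):
    probs = torch.stack([F.softmax(l, dim=1) for l in logits_per_exit]).mean(dim=0)
    return (probs.argmax(dim=1) == labels).float().mean().item()

# Example with four classifiers, batch of 8, 100 classes:
exits = [torch.randn(8, 100) for _ in range(4)]
labels = torch.randint(0, 100, (8,))
top1 = ensemble_accuracy(exits, labels)
```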
