untitledunmastered1998/DistillationLab
Experiment environment
- Python 3.8.12
- PyTorch 1.10.1
dataset | #train samples | #test samples | #classes | resolution |
---|---|---|---|---|
CIFAR100 | 50000 | 10000 | 100 | low |
MNIST | 60000 | 10000 | 10 | low |
vggface2 | 2763078 | 548208 | 9131 | low |
ImageNet | 1281167 | 50000 | 1000 | high |
ImageNet_subset | 12610 | 5000 | 100 | high |
ImageNet32 | 1281167 | 50000 | 1000 | low |
ImageNet32_reduced | 384631 | 15000 | 300 | low |
Tiny-ImageNet | 100000 | 10000 | 200 | low |
Cars | 8144 | 8041 | 196 | high |
flowers102 | 2040 | 6149 | 102 | high |
stanford_dogs | 12601 | 8519 | 120 | high |
aircrafts | 6667 | 3333 | 100 | high |
Available teacher and student networks include:
'resnet32', 'ResNet18', 'ResNet34', 'ResNet50', 'ResNet101', 'ResNet152',
'mobilenet_v2',
'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0',
'squeezenet1_0', 'squeezenet1_1'
networks | parameters |
---|---|
resnet32 | |
ResNet18 | |
ResNet34 | |
ResNet50 | |
ResNet101 | |
ResNet152 | |
mobilenet_v2 | |
shufflenet_v2_x0_5 | |
shufflenet_v2_x1_0 | |
shufflenet_v2_x1_5 | |
shufflenet_v2_x2_0 | |
squeezenet1_0 | |
squeezenet1_1 | |
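
The parameter column above is still empty. As a minimal sketch of how it could be filled in, the helper below counts the learnable parameters of any `torch.nn.Module`. The name `count_parameters` and the toy network are hypothetical and only illustrate the idea; they are not taken from this repository.

```python
import torch.nn as nn


def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
    """Return the number of parameters in `model` (hypothetical helper).

    Buffers such as BatchNorm running statistics are not parameters
    and are therefore not counted.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


# Toy network used only for illustration, not one of the listed models.
toy_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),  # 3*16*3*3 weights + 16 biases = 448
    nn.BatchNorm2d(16),               # 16 scales + 16 shifts = 32
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 100),               # 16*100 weights + 100 biases = 1700
)

n_params = count_parameters(toy_net)
print(f"toy_net: {n_params} parameters ({n_params / 1e6:.4f} M)")
```

The same function can be applied to each network in the list once it has been instantiated with the number of classes of the target dataset, since the size of the classifier head changes the total.
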
① knowledge distillation [Distilling the Knowledge in a Neural Network] (https://arxiv.org/abs/1503.02531)
② L2
③ FitNets [FitNets: Hints for Thin Deep Nets] (https://arxiv.org/abs/1412.6550)
④ PKT [Learning Deep Representations with Probabilistic Knowledge Transfer] ECCV 2018 (https://arxiv.org/abs/1803.10837)
⑤ RKD [Relational Knowledge Distillation] CVPR 2019 (https://arxiv.org/abs/1904.05068)
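
To make method ① concrete, here is a minimal sketch of the soft-target loss described in the knowledge distillation paper: a KL divergence between temperature-softened teacher and student distributions, combined with the usual cross-entropy on the ground-truth labels. The function name `kd_loss` and the default values `temperature=4.0` and `alpha=0.9` are assumptions chosen for illustration, not the settings used in this repository.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, targets,
            temperature=4.0, alpha=0.9):
    """Sketch of a Hinton-style distillation loss (hypothetical defaults)."""
    # Soften both distributions with the same temperature.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)

    # KL(teacher || student), averaged over the batch. The T^2 factor keeps
    # the gradient magnitude comparable across temperatures.
    soft_term = F.kl_div(log_p_student, p_teacher,
                         reduction="batchmean") * temperature ** 2

    # Standard supervised loss on the hard labels.
    hard_term = F.cross_entropy(student_logits, targets)

    return alpha * soft_term + (1.0 - alpha) * hard_term


# Example with random logits for a 100-class problem such as CIFAR100.
torch.manual_seed(0)
student_out = torch.randn(8, 100)
teacher_out = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
print(kd_loss(student_out, teacher_out, labels).item())
```

In training, the teacher logits would typically be computed under `torch.no_grad()` so that only the student receives gradients.
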
Baseline models are trained with standard image classification procedures.
tricks | performance |
---|---|
baseline | |
+xavier init / kaiming init | |
+pretrained weights | |
+no bias decay | |
+label smoothing | |
+random erasing | |
+linear scaling learning rate | |
+cutout | |
+dropout | |
+cosine learning rate decay | |
+warm up stage | |
+mixup | |
+Zero γ | |
data augmentation | |
Learning Rate Schedule | |
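
Three of the tricks in the table (linear scaling learning rate, warm up stage, cosine learning rate decay) combine naturally into one schedule. The following is a sketch under stated assumptions: the function name `learning_rate`, the reference batch size of 256, and the base rate of 0.1 are illustrative choices, not values taken from this repository.

```python
import math


def learning_rate(step, total_steps, warmup_steps,
                  base_lr=0.1, batch_size=256, ref_batch_size=256):
    """Learning rate at `step` (0-indexed); all defaults are hypothetical."""
    # Linear scaling: the peak rate grows proportionally with the batch size.
    peak_lr = base_lr * batch_size / ref_batch_size

    # Warm up: ramp linearly from peak_lr / warmup_steps up to peak_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Cosine decay: anneal from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Print a few points of a 100-step schedule with 10 warm-up steps.
for s in (0, 9, 10, 55, 100):
    print(s, round(learning_rate(s, total_steps=100, warmup_steps=10), 4))
```

In a PyTorch training loop, the returned value can be written into each `param_group["lr"]` of the optimizer before every step.
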
teacher | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
---|---|---|---|---|---|
student | mobilenet_v2 | ||||
t_baseline | |||||
s_baseline | |||||
KD | |||||
FitNets | |||||
RKD | |||||
PKT | |||||
L2 | |||||
AT | |||||
overhaul | |||||
teacher | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
---|---|---|---|---|---|
student | mobilenet_v2 | shufflenet_v1 | squeezenet_v0 | shufflenet_v2 | WRN-16-2 |
t_baseline | |||||
s_baseline | |||||
KD | |||||
FitNets | |||||
RKD | |||||
PKT | |||||
L2 | |||||
AT | |||||
overhaul | |||||
teacher | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
---|---|---|---|---|---|
student | resnet8×4 | resnet32 | resnet18 | resnet34 | resnet50 |
t_baseline | |||||
s_baseline | |||||
KD | |||||
FitNets | |||||
RKD | |||||
PKT | |||||
L2 | |||||
AT | |||||
overhaul | |||||
datasets | student | KD | AT | L2 | FitNets | CRD | RKD | PKT | teacher |
---|---|---|---|---|---|---|---|---|---|
CIFAR100→STL-10 | |||||||||
CIFAR100→Tiny-ImageNet | |||||||||