MaskedKD (Masked Knowledge Distillation)

This is the official implementation of the paper "The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers" (ECCV 2024).


Seungwoo Son, Jegwang Ryu, Namhoon Lee, Jaeho Lee
Pohang University of Science and Technology (POSTECH)

Summary

Our method, MaskedKD, reduces the cost of supervision in knowledge distillation by masking the teacher ViT's input patches according to the student's attention, maintaining student accuracy while saving teacher computation.
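In pseudocode, the masking step works roughly as follows. This is a minimal sketch rather than the repository code: it assumes the student's [CLS]-to-patch attention, averaged over heads, is available as cls_attn, and it simply keeps the top num_keep teacher patch tokens (exact tensor shapes and the saliency computation in the actual implementation may differ).

import torch

def select_teacher_patches(patch_tokens, cls_attn, num_keep):
    """Keep only the patch tokens the student attends to most.

    patch_tokens: (B, N, D) teacher patch embeddings (no [CLS] token)
    cls_attn:     (B, N)    student [CLS]-to-patch attention, averaged over heads
    num_keep:     number of patch tokens to forward through the teacher
    """
    # Indices of the num_keep most-attended patches per image.
    idx = cls_attn.topk(num_keep, dim=1).indices                   # (B, num_keep)
    # Gather the corresponding teacher patch tokens.
    idx_exp = idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1))
    kept = torch.gather(patch_tokens, dim=1, index=idx_exp)        # (B, num_keep, D)
    return kept, idx

# Example: a ViT with 196 patches (224x224 input, patch size 16), keeping half of them.
B, N, D = 4, 196, 768
patch_tokens = torch.randn(B, N, D)
cls_attn = torch.rand(B, N).softmax(dim=1)
kept, idx = select_teacher_patches(patch_tokens, cls_attn, num_keep=98)
print(kept.shape)  # torch.Size([4, 98, 768])

The teacher then runs its forward pass only on the kept tokens, which is where the compute saving comes from.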

Installation

Since our code is implemented on top of the official DeiT and MAE codebases, simply follow MAE's installation guideline.

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The expected directory structure is:

│path/to/imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
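As a quick sanity check of this layout, the sketch below (not part of this repository) loads the two splits with torchvision's ImageFolder; the transforms here are placeholders, since the training script builds its own.

import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Minimal sketch: verify that the ImageNet folder layout loads correctly.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder('/path/to/imagenet/train', transform=transform)
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=transform)
print(len(train_set.classes), 'classes')  # expect 1000 for ImageNet-1K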

Experiments

Training on ImageNet-1K

To train a DeiT-Small student with a DeiT-Base teacher, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
--model deit_small_patch16_224  --teacher_model deit_base --epochs 300 \
--batch-size 128 --data-path /path/to/ILSVRC2012/ --distillation-type soft \
--distillation-alpha 0.5 --distillation-tau 1  --input-size 224 --maskedkd --len_num_keep 98 \
--output_dir /path/to/output_dir/
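For reference, --distillation-type soft with --distillation-alpha 0.5 and --distillation-tau 1 corresponds to a DeiT-style soft distillation loss of roughly the form below; this is a hedged sketch, not the exact code in main.py. Note that --len_num_keep 98 keeps 98 of the 196 patch tokens (about half) for the teacher's forward pass at --input-size 224.

import torch
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=1.0):
    """Blend cross-entropy on labels with KL to the teacher's softened outputs."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.log_softmax(teacher_logits / tau, dim=1),
        reduction='batchmean',
        log_target=True,
    ) * (tau * tau)
    return (1 - alpha) * ce + alpha * kd

# Toy usage with random logits and labels.
loss = soft_distillation_loss(torch.randn(4, 1000), torch.randn(4, 1000),
                              torch.randint(0, 1000, (4,)))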

Acknowledgment

This repo is based on DeiT, MAE and pytorch-image-models.
