This is the implementation of "SdAE: Self-distillated Masked Autoencoder" (ECCV 2022).

Requirements:
- Python 3.7
- Pytorch 1.7.0
- torchvision 0.8.1
- timm 0.3.2
- PyYAML
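For example, the dependencies can be installed with pip (for CUDA-specific PyTorch wheels, follow the official PyTorch install instructions instead):

```bash
pip install torch==1.7.0 torchvision==0.8.1 timm==0.3.2 pyyaml
```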
Detailed pre-training instructions are in PRETRAIN.md. Taking 300-epoch ViT-Base pre-training as an example:
```bash
python -m torch.distributed.launch --nproc_per_node 8 main_pretrain.py \
    --batch_size 96 --epochs 300 --model mae_vit_base_patch16 \
    --model_teacher vit_base_patch16 --data_path {DATA_DIR} --warmup_epochs 60 \
    --mask_ratio 0.75 --blr 2.666e-4 --ema_op per_epoch --ema_frequent 1 \
    --momentum_teacher 0.96 --momentum_teacher_final 0.99 \
    --drop_path 0.25 --shrink_num 147 --ncrop_loss 3
```
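In MAE-style codebases, `--blr` is a base learning rate that is typically scaled by the effective batch size, lr = blr × batch_size × nproc / 256; here that is 2.666e-4 × 96 × 8 / 256 ≈ 8e-4. The `--momentum_teacher` and `--momentum_teacher_final` flags control the EMA update of the self-distillation teacher. Below is a minimal, illustrative sketch of such an EMA update with a cosine momentum ramp, as commonly used in DINO-style self-distillation; the function names are hypothetical and the exact schedule in SdAE may differ:

```python
import math
import torch

def teacher_momentum(epoch, total_epochs, m_start=0.96, m_final=0.99):
    # Cosine ramp from --momentum_teacher to --momentum_teacher_final:
    # equals m_start at epoch 0 and m_final at the last epoch.
    return m_final + 0.5 * (m_start - m_final) * (1 + math.cos(math.pi * epoch / total_epochs))

@torch.no_grad()
def ema_update(teacher, student, m):
    # EMA update of teacher weights from student weights, applied once
    # per epoch (cf. --ema_op per_epoch --ema_frequent 1).
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(m).add_(p_s.detach(), alpha=1 - m)
```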
Fine-tuning instructions are in FINETUNE.md. Taking ViT-Base fine-tuning as an example:
```bash
python -m torch.distributed.launch --nproc_per_node 8 main_finetune.py --finetune {WEIGHT_DIR} \
    --batch_size 128 --epochs 100 --model vit_base_patch16 --dist_eval --data_path {DATA_DIR}
```
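For reference, `--finetune {WEIGHT_DIR}` in MAE-style codebases loads the pre-trained encoder weights before fine-tuning begins. A minimal sketch of that step, assuming an MAE-style checkpoint layout (the key names follow timm/MAE conventions and are not necessarily this repository's exact code):

```python
import torch

def load_pretrained(model, checkpoint_path):
    # Load the pre-training checkpoint on CPU and unwrap the state dict.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)
    # Drop classification-head weights whose shapes do not match, so the
    # head is re-initialized and trained from scratch during fine-tuning.
    for k in ("head.weight", "head.bias"):
        if k in state_dict and state_dict[k].shape != model.state_dict()[k].shape:
            del state_dict[k]
    msg = model.load_state_dict(state_dict, strict=False)
    print(msg.missing_keys)  # typically just the re-initialized head
    return model
```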
Please cite our paper if the code is helpful to your research:

```bibtex
@inproceedings{chen2022sdae,
  author    = {Chen, Yabo and Liu, Yuchen and Jiang, Dongsheng and Zhang, Xiaopeng and Dai, Wenrui and Xiong, Hongkai and Tian, Qi},
  title     = {SdAE: Self-distillated Masked Autoencoder},
  booktitle = {ECCV},
  year      = {2022}
}
```
The pre-training and fine-tuning weights are coming soon.