Mert Bulent Sariyildiz · Philippe Weinzaepfel · Thomas Lucas · Diane Larlus · Yannis Kalantidis
NAVER LABS Europe
ECCV 2024
To train UNIC models on ImageNet-1K (by distilling from the four teachers used in the paper), you need a few Python packages, the pretrained weights of the teacher models, and the ImageNet-1K dataset.
- Create a conda environment with all the necessary packages for training and evaluation:
env_name="unic"
conda create -n ${env_name}
conda activate ${env_name}
conda install pytorch=2.1.1 pytorch-cuda=12.1 torchvision \
timm transformers einops torchmetrics optuna \
tensorboard matplotlib pandas scikit-learn-intelex omegaconf \
-c pytorch -c nvidia -c conda-forge
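After the install completes, you can sanity-check the environment with a short snippet like the one below (run inside the activated environment); this is just a convenience check, not part of our scripts:

```python
# Quick sanity check for the freshly created environment.
import timm
import torch
import torchvision

print("torch:", torch.__version__)  # expected: 2.1.1
print("torchvision:", torchvision.__version__)
print("timm:", timm.__version__)
print("CUDA available:", torch.cuda.is_available())
```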
- Set the path of your conda installation in scripts/setup_env.sh, i.e. update the `conda_dir` variable. Your environment will then be used automatically by both the training and evaluation scripts.
- Download the teacher models we used in our work. We provide bash scripts to automate this process, under the scripts/teachers folder. To download all teachers at once, use scripts/teachers/_prepare_all.sh:
(cd scripts/teachers && ./_prepare_all.sh <path_to_download_directory>)
- Once the teacher checkpoints are downloaded, update the `TEACHER_CFG` variable in teachers/config.py to point to the correct paths.
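For illustration, `TEACHER_CFG` maps each teacher name to its checkpoint location, roughly along the lines of the sketch below; the field names and file names here are assumptions, so keep the schema that teachers/config.py actually defines:

```python
# Hypothetical sketch of TEACHER_CFG in teachers/config.py.
# Field and file names are illustrative; follow the schema already in the file.
TEACHER_CFG = {
    "dino_vitbase_16": {"ckpt_path": "/path/to/teachers/dino_vitbase16.pth"},
    "deit3_vitbase_16": {"ckpt_path": "/path/to/teachers/deit3_vitbase16.pth"},
    "ibot_vitbase_16": {"ckpt_path": "/path/to/teachers/ibot_vitbase16.pth"},
    "dbotft_vitbase_16": {"ckpt_path": "/path/to/teachers/dbotft_vitbase16.pth"},
}
```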
- Download the ImageNet-1K dataset (ILSVRC-2012). Check out the official website for details.
- Use the main_unic.py script to train UNIC models.
By default, it distills the following four teachers into a ViT-Base/16 student:
- DINO (`dino_vitbase_16`)
- DeiT-III (`deit3_vitbase_16`)
- iBOT (`ibot_vitbase_16`)
- dBOT fine-tuned on ImageNet-1K classification (`dbotft_vitbase_16`)

Make sure to download the teacher models beforehand (see the Teacher models section).
The architecture of the student encoder is compatible with DINOv2.
We trained our UNIC models on 4 GPUs, each with at least 32GB of memory. The default batch size is 128 per GPU; adjust it according to your GPU memory (the learning rate is scaled accordingly, as illustrated below).
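As an illustration of the scaling, assuming the common linear rule (the exact rule lives in main_unic.py and may differ, and `base_lr` here is a made-up value):

```python
# Illustrative linear learning-rate scaling with global batch size.
# base_lr and the reference batch size of 256 are assumptions,
# not values taken from main_unic.py.
base_lr = 1e-3
batch_size_per_gpu, n_gpus = 128, 4
global_batch_size = batch_size_per_gpu * n_gpus
scaled_lr = base_lr * global_batch_size / 256.0
print(scaled_lr)  # 0.002 for the default 4x128 setup
```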
- To train a UNIC model, use the following commands (available in scripts/train_unic.sh):
# - Initialize the conda environment
# - Set ${MASTER_ADDR}, ${MASTER_PORT}, ${N_GPUS} for distributed training
source ./scripts/setup_env.sh
dataset_dir="/path/to/imagenet-1k"
output_dir="/path/to/output_dir"
mkdir -p ${output_dir}
torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:0 --nnodes=1 --nproc_per_node=${N_GPUS} main_unic.py \
--data_dir=${dataset_dir} \
--output_dir=${output_dir} \
--seed=${RANDOM}
We provide a pretrained UNIC model with the ViT-Base/16 architecture, distilled from the four teachers mentioned above.
| Model | Teachers | Distillation Dataset | Distillation Resolution | Student Architecture | ImageNet-1K Classification (top-1) | ADE20K Segmentation (mIoU) | Model Checkpoint | Training Arguments |
|---|---|---|---|---|---|---|---|---|
| UNIC | DINO-B/16, iBOT-B/16, DeiT-III-B/16, dBOT-ft-B/16 | ImageNet-1K | 224 | ViT-Base/16 | 83.8 | 39.6 (Linear head link) | Link (870MB) | Link |
The relative performance of UNIC over the four teachers is shown below.
We also provide a pretrained UNIC-L model with the ViT-Large/14 architecture, distilled from DINOv2-G/14 and MetaCLIP-H/14 teachers.
| Model | Teachers | Distillation Dataset | Distillation Resolution | Student Architecture | ImageNet-1K k-NN (k=20) | ImageNet-1K Zero-shot | ADE20K Segmentation (mIoU) | Model Checkpoint | Training Arguments |
|---|---|---|---|---|---|---|---|---|---|
| UNIC-L | DINOv2-G/14, MetaCLIP-H/14 | ImageNet-1K | 224/336 | ViT-Large/14 | 85.6 | 81.4 | 48.3 (Linear head link) | Link (2.2GB) | Link |
A comparison of UNIC-L to the teachers and the recent AM-RADIO model is shown below.
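Before plugging either released checkpoint into the evaluations below, a minimal inspection sketch such as this can help verify the download; the `"model"` key and the exact parameter layout are assumptions, since they depend on how main_unic.py saves weights:

```python
import torch

# Minimal sketch for peeking into a released UNIC checkpoint.
# The optional nesting under a "model" key is an assumption.
ckpt = torch.load("/path/to/unic/checkpoint.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)
print(len(state_dict), "entries")
for name in sorted(state_dict)[:5]:
    value = state_dict[name]
    print(name, tuple(value.shape) if hasattr(value, "shape") else type(value))
```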
The evaluation protocol for transfer learning tasks involves two steps:
- Extracting features from the encoder of a pretrained UNIC model
- Training logistic regression classifiers on top of the extracted features
We use the implementation from t-ReX, which is available at https://github.com/naver/trex. For convenience, the evaluation code is copied into the eval_transfer folder of this repository.
First, download the transfer datasets following the instructions in the t-ReX repository. Once the download finishes, update the hardcoded dataset paths in eval_transfer/data/__init__.py. Then use the following command to evaluate a pretrained UNIC model on, e.g., the ImageNet-1K dataset (with labels):
source scripts/setup_env.sh
##########
# extract features
dataset="in1k"
image_size=224
pretrained="/path/to/unic/checkpoint.pth"
features_dir=$(dirname "${pretrained}")
features_dir=${features_dir}/transfer/features_${dataset}_${image_size}
if [ ! -f "${features_dir}/features_trainval.pth" ] || [ ! -f "${features_dir}/features_test.pth" ]; then
echo "Extracting features..."
python eval_transfer/main_ft_extract.py \
--output_dir="${features_dir}" \
--pretrained="${pretrained}" \
--dataset="${dataset}" \
--image_size="${image_size}"
fi
##########
# train logreg classifier using extracted features
features_norm="none"
clf_type="logreg_sklearn"
if [[ "${dataset}" == "in1k" ]] || [[ "${dataset}" == cog_* ]] || [[ "${dataset}" == inat* ]]; then
# for large datasets,
# we use SGD implemented in PyTorch and l2 normalize features
features_norm="l2"
clf_type="logreg_torch"
fi
echo ""
echo "Training classifier ..."
python -m sklearnex eval_transfer/main_clf.py --features_dir="${features_dir}" --features_norm=${features_norm} --clf_type=${clf_type}
See the `--dataset` argument in main_ft_extract.py for the list of available datasets.
First, download the ADE20K dataset from the official website.
We follow the evaluation protocol from DINOv2, which requires a few extra packages at specific versions (e.g. mmcv). You can install them using the commands below:
pip install openmim
mim install "mmcv-full==1.7.2"
mim install "mmengine==0.10.1"
pip install "mmsegmentation==0.30.0"
pip install ftfy
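A quick import check can then confirm that the pinned versions resolved (mmcv-full imports as `mmcv`, mmsegmentation as `mmseg`):

```python
# Verify that the segmentation dependencies match the pins above.
import mmcv
import mmseg

print("mmcv:", mmcv.__version__)    # expect 1.7.2
print("mmseg:", mmseg.__version__)  # expect 0.30.0
```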
If you encounter version conflicts between packages, we recommend creating a new conda environment, as described in the DINOv2 repository.
Then, use the following command to evaluate a pretrained UNIC model on the ADE20K semantic segmentation task (default hyper-parameters are set for 1 GPU):
source ./scripts/setup_env.sh
data_dir=/path/to/ADEChallengeData2016
pretrained="/path/to/unic/checkpoint.pth"
python eval_dense/eval_semseg.py --data_dir=${data_dir} --pretrained=${pretrained}
If you find this repository useful, please consider citing us:
@inproceedings{sariyildiz2024unic,
title={{UNIC}: Universal Classification Models via Multi-teacher Distillation},
author={Sariyildiz, Mert Bulent and Weinzaepfel, Philippe and Lucas, Thomas and Larlus, Diane and Kalantidis, Yannis},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024},
}