This repository is for training DINOv2 on downstream tasks. It is not for self-supervised learning.
- Data Parallel
- Class Balanced Loss
- Rare Class Sampling
- Optimizer selection
- Freeze/Unfreeze backbone
Install the required packages using requirements.txt.
Because of xFormers, the latest version of PyTorch is required. However, you can use a different combination of xFormers and PyTorch versions.
pip install -r requirements.txt
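To quickly confirm which versions you have installed (both packages expose `__version__`):

```python
import torch
import xformers

# Print the installed versions to confirm the torch/xformers pairing.
print("torch:", torch.__version__)
print("xformers:", xformers.__version__)
```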
The script requires a dataset formatted as below.
Data
├── ...
├── Class4
│ ├── Img1.png
│ ├── Img2.png
│ ├── ...
├── Class5
│ ├── Img1.png
│ ├── Img2.png
│ ├── ...
├── ...
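This one-folder-per-class layout matches torchvision's ImageFolder convention, so you can sanity-check a dataset independently of the training script (a verification sketch only; the training script uses its own loader driven by config.py):

```python
from torchvision import datasets, transforms

# Each subdirectory of the root becomes one class.
dataset = datasets.ImageFolder(
    root="/path/to/your/dataset",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),  # matches the default resize
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # default mean
                             std=[0.229, 0.224, 0.225]),   # default std
    ]),
)
print(dataset.classes)  # e.g. ['Class4', 'Class5', ...]
print(len(dataset))     # total image count
```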
Data preprocessing: please run the following script to generate class_stats.json.
python tools/preprocess.py /path/to/your/dataset
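The exact schema of class_stats.json is whatever tools/preprocess.py writes; conceptually it records per-class statistics such as image counts, which class-balanced loss and RCS rely on. A sketch of computing equivalent counts yourself (the per-class-count format here is my assumption, not the documented schema):

```python
import json
from pathlib import Path

root = Path("/path/to/your/dataset")

# Count the image files under each class directory.
stats = {
    cls.name: sum(1 for f in cls.iterdir() if f.is_file())
    for cls in root.iterdir() if cls.is_dir()
}

with open("class_stats.json", "w") as f:
    json.dump(stats, f, indent=2)
```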
You can launch the training code by using:
bash train.sh
You can set your training arguments in config.py.
There is a setting for Rare Class Sampling (RCS), a technique for long-tailed classification motivated by DAFormer.
It samples rare classes more often during training. However, there is a risk that the model sees some classes only rarely.
It is most suitable for multi-class classification.
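Conceptually, RCS draws training samples with probabilities that grow as a class gets rarer. A minimal sketch using PyTorch's WeightedRandomSampler with a DAFormer-style temperature softmax (the temperature value and the use of per-class counts from class_stats.json are illustrative assumptions, not the script's exact implementation):

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def make_rcs_sampler(labels, class_counts, temperature=0.01):
    """Build a sampler that picks rare classes more often.

    labels: class index per sample, shape (N,)
    class_counts: images per class, e.g. read from class_stats.json
    """
    freq = np.asarray(class_counts, dtype=np.float64)
    freq /= freq.sum()                              # class frequency f_c
    p_class = np.exp((1.0 - freq) / temperature)    # P(c) ∝ exp((1 - f_c) / T)
    p_class /= p_class.sum()
    # Weight each sample so its class is drawn with probability P(c).
    weights = p_class[labels] / freq[labels]
    return WeightedRandomSampler(torch.as_tensor(weights), num_samples=len(labels))

# Usage: DataLoader(dataset, batch_size=16, sampler=make_rcs_sampler(labels, counts))
```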
Training arguments
- batch_per_gpu (int): Number of samples per GPU in each forward step (default: 16).
- num_gpu (int): Number of GPUs used for training (default: 1).
- resize (tuple): The size to which input images are resized (default: (224, 224)).
- mean (list): Mean normalization values for each channel in RGB format (default: [0.485, 0.456, 0.406]).
- std (list): Standard deviation normalization values for each channel in RGB format (default: [0.229, 0.224, 0.225]).
- optimizer (dict): Optimizer settings.
  - type: Optimizer type (default: 'SGD').
  - params: Additional optimizer parameters, such as momentum (default: 0.9).
  - learning_rate: Learning rates for different parts of the model.
    - head_lr: Learning rate for the head (default: 1e-3).
    - backbone_lr: Learning rate for the backbone (default: 1e-6).
- scheduler (dict): Learning rate scheduler settings.
  - type: Scheduler type (default: 'linear').
  - params: Additional scheduler parameters, such as the warmup ratio (default: 0.03).
- do_eval (bool): Whether to perform evaluation during training (default: False).
- num_train_epoch (int): Number of epochs for training (default: 100).
- model (dict): Model architecture settings.
  - backbone: Backbone model type (default: 'dinov2_l').
  - head: Classification head type (default: 'single').
  - num_classes: Number of output classes (default: 3).
  - freeze_backbone: Whether to freeze the backbone during training (default: False).
- loss (dict): Loss function settings.
  - loss_type: Type of loss function (default: 'CE_loss').
  - beta: Beta parameter for class-balanced loss (default: None).
  - gamma: Gamma parameter for focal loss (default: None).
- dataset (dict): Dataset paths.
  - train: Training dataset settings.
    - data_root: Root directory of the training dataset.
  - eval: Evaluation dataset settings.
    - data_root: Root directory of the evaluation dataset.
- max_checkpoint (int): Maximum number of checkpoints to keep (default: 1).
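Assembled from the defaults above, a config.py could look like the following (the top-level variable name and exact nesting are assumptions inferred from the documented keys; check the shipped config.py for the authoritative structure):

```python
config = dict(
    batch_per_gpu=16,
    num_gpu=1,
    resize=(224, 224),
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225],
    optimizer=dict(
        type='SGD',
        params=dict(momentum=0.9),
        learning_rate=dict(head_lr=1e-3, backbone_lr=1e-6),
    ),
    scheduler=dict(type='linear', params=dict(warmup_ratio=0.03)),
    do_eval=False,
    num_train_epoch=100,
    model=dict(
        backbone='dinov2_l',
        head='single',
        num_classes=3,
        freeze_backbone=False,
    ),
    loss=dict(loss_type='CE_loss', beta=None, gamma=None),
    dataset=dict(
        train=dict(data_root='/path/to/train'),
        eval=dict(data_root='/path/to/eval'),
    ),
    max_checkpoint=1,
)
```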
Note: The backbone learning rate is often set to be much smaller than the head learning rate to prevent overfitting the pretrained layers.
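In PyTorch, this split is implemented with optimizer parameter groups. A minimal sketch assuming the model exposes backbone and head submodules (the attribute names and toy model are illustrative, not the repository's code):

```python
import torch
import torch.nn as nn

# Toy stand-in for any model exposing `backbone` and `head` submodules.
model = nn.Module()
model.backbone = nn.Linear(8, 8)
model.head = nn.Linear(8, 3)

# One parameter group per submodule, each with its own learning rate.
optimizer = torch.optim.SGD(
    [
        {"params": model.backbone.parameters(), "lr": 1e-6},  # backbone_lr
        {"params": model.head.parameters(), "lr": 1e-3},      # head_lr
    ],
    momentum=0.9,
)

# With freeze_backbone=True, the backbone is excluded from updates instead:
for p in model.backbone.parameters():
    p.requires_grad = False
```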
You can evaluate your model by using:
bash eval.sh
The evaluation also reports top-k accuracy.
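For reference, top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes. A minimal sketch (not the repository's exact evaluation code):

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
    """Fraction of samples whose true label is in the top-k predictions."""
    topk = logits.topk(k, dim=1).indices              # (N, k)
    hits = (topk == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

# Example: 4 samples, 10 classes.
logits = torch.randn(4, 10)
targets = torch.tensor([2, 7, 0, 9])
print(topk_accuracy(logits, targets, k=5))
```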
- Multi-label classification
- Segmentation
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
If you find this repository useful in your project, please consider giving a ⭐ and citing:
@misc{Dino-v2-Finetuning,
  author = {Yuwon Lee},
  title = {Dino-V2-Finetune},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/2U1/DINOv2-Finetune}
}
This project is based on DINOv2.