Implementation of DINO v2 for remote sensing with Hugging Face Transformers

chagmgang/dinov2-remote-sensing

Reimplementation of DINO v2 Self-Supervised Vision Transformers with Hugging Face 🤗


  • PyTorch implementation and pretrained models for DINO v2 in remote sensing.
  • See the official papers and GitHub repository for details. [arXiv #1] [arXiv #2] [Github]

Training

This project uses the DeepSpeed launcher for multi-GPU training:

deepspeed --include localhost:0,1,2,3... vit_train.py
deepspeed --include localhost:0,1,2,3... convvit_train.py

Training Dataset for Remote Sensing

| Dataset name | Number of samples | Dataset Paper |
|--------------|-------------------|---------------|
| Million-AID  | 990,666           | Link          |
| SkyScript    | 5,181,068         | Link          |
| Total        | 6,171,734         |               |

Pretrained Model on Huggingface

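| Model                 | Epoch | Total Params | Student Backbone Params | Student DINO Head Params | Student iBOT Head Params | Weight & Config | Logs |
|-----------------------|-------|--------------|-------------------------|--------------------------|--------------------------|-----------------|------|
| ViT-S/16-e25          | 25    | 132M         | 21M                     | 22M                      | 22M                      | Link            | logs |
| ViT-S/16-e100         | 25    |              |                         |                          |                          |                 |      |
| ViT-B/16-e25          | 25    | 264M         | 88M                     | 21M                      | 21M                      | Link            | logs |
| ConvViT-S-e25(DINOv1) | 25    | 88.5M        | 22.2M                   | 22M                      | x                        | Link            | logs |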

Evaluation

DINOv2 is evaluated with k-NN classification and linear probing. For each dataset, 90% of the data is randomly selected as the training set and the remaining 10% is used as the test set. The k-NN evaluation uses k=20. The evaluation datasets are listed in the table below. The split file lists are stored in linprob_data_lists.

| Dataset Name | Dataset Paper |
|--------------|---------------|
| RESISC       | Remote Sensing Image Scene Classification: Benchmark and State of the Art |
| Optimal 31   | Scene Classification With Recurrent Attention of VHR Remote Sensing Images |
| MLRSNet      | MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding |
| WHU-RS19     |               |
| EuroSAT      | EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification |
| UC Merced    | Bag-of-visual-words and spatial extensions for land-use classification |
| Cv-BrCT      | AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification |
| AiRound      | AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification |
| RSI-CB128    | RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data |
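
The 90/10 random split described above can be sketched as follows. This is a minimal illustration, not the repository's own script: the function name `split_train_test`, the fixed seed, and the sample paths are hypothetical, and nothing is assumed here about the actual format of the files in linprob_data_lists.

```python
import random


def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists.

    Hypothetical helper: the seed and rounding rule are illustrative choices.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample paths, used only to exercise the function.
samples = [f"class_{i % 5}/img_{i:04d}.jpg" for i in range(100)]
train_items, test_items = split_train_test(samples)
print(len(train_items), len(test_items))
```

Fixing the seed keeps the split reproducible, which matters when the same train and test lists are reused for both k-NN and linear probing.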

Linear Probing Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/linprob.py --model-path {model_registry} \
                              --data-root {data_root} \
                              --train-text {train_textfile} \
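                              --test-text {test_textfile}

| Model                 | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | Cv-BrCT | AiRound | RSI-CB128 |
|-----------------------|--------|------------|---------|----------|---------|-----------|---------|---------|-----------|
| ViT-S/16-e25          | 94.381 | 96.237     | 96.642  | 99.811   | 98.037  | 99.048    | 77.613  | 78.644  | 99.593    |
| ViT-B/16-e25          | 95.460 | 98.925     | 97.301  | 100.00   | 97.889  | 98.571    | 79.058  | 80.339  | 99.675    |
| ConvViT-S-e25(DINOv1) | 94.476 | 93.548     | 95.919  | 99.065   | 96.778  | 98.095    | 77.695  | 81.949  | 99.295    |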
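
As a rough illustration of what linear probing measures, the sketch below fits a softmax classifier on fixed feature vectors and reports test accuracy. It is a pure-Python toy on synthetic 2-D features standing in for frozen backbone embeddings. It is not the code in evaluation/linprob.py, and every name, the synthetic data, and the hyperparameters are assumptions made for the example.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def train_linear_probe(feats, labels, num_classes, lr=0.1, epochs=200):
    """Full-batch gradient descent on cross-entropy; the features are never updated."""
    dim, n = len(feats[0]), len(feats)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(feats, labels):
            probs = softmax(scores(W, b, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                gb[c] += err
                for j in range(dim):
                    gW[c][j] += err * x[j]
        for c in range(num_classes):
            b[c] -= lr * gb[c] / n
            for j in range(dim):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def accuracy(W, b, feats, labels):
    hits = 0
    for x, y in zip(feats, labels):
        s = scores(W, b, x)
        hits += max(range(len(s)), key=s.__getitem__) == y
    return 100.0 * hits / len(labels)


def make_clusters(rng, per_class):
    """Synthetic stand-in for frozen embeddings: three separated 2-D clusters."""
    centers = [(0.0, 3.0), (3.0, -2.0), (-3.0, -2.0)]
    feats, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            feats.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
            labels.append(label)
    return feats, labels


rng = random.Random(0)
train_x, train_y = make_clusters(rng, 30)
test_x, test_y = make_clusters(rng, 10)
W, b = train_linear_probe(train_x, train_y, num_classes=3)
test_acc = accuracy(W, b, test_x, test_y)
print(f"linear probe test accuracy: {test_acc:.3f}")
```

Because only the linear layer is trained, the resulting accuracy reflects how linearly separable the frozen features are rather than how well the backbone can be fine-tuned.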

KNN Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/knn.py --model-path {model_registry} \
                          --data-root {data_root} \
                          --train-text {train_textfile} \
                          --test-text {test_textfile}

| Model                 | RESISC | Optimal 31 | MLRSNet | WHU-RS19 | EuroSAT | UC Merced | Cv-BrCT | AiRound | RSI-CB128 |
|-----------------------|--------|------------|---------|----------|---------|-----------|---------|---------|-----------|
| ViT-S/16-e25          | 93.365 | 89.785     | 96.981  | 97.196   | 95.741  | 87.143    | 76.208  | 77.881  | 98.943    |
| ViT-B/16-e25          | 94.286 | 90.323     | 97.328  | 100.00   | 95.704  | 87.143    | 76.456  | 77.373  | 99.106    |
| ConvViT-S-e25(DINOv1) | 92.508 | 91.935     | 95.947  | 98.131   | 94.074  | 90.000    | 75.630  | 76.271  | 98.374    |
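
For intuition about the k=20 setting, here is a minimal sketch of k-NN classification over feature vectors. It assumes cosine similarity on L2-normalized features and a plain majority vote. Whether evaluation/knn.py uses this exact similarity or a weighted vote is not stated in this README, so treat the function names, the synthetic features, and the voting rule as assumptions.

```python
import math
import random
from collections import Counter


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def knn_predict(train_feats, train_labels, query, k=20):
    """Majority vote among the k training features most similar to `query`.

    `train_feats` must already be L2-normalized, so the dot product
    below equals cosine similarity.
    """
    q = l2_normalize(query)
    sims = [
        (sum(a * c for a, c in zip(q, feat)), label)
        for feat, label in zip(train_feats, train_labels)
    ]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]


def make_clusters(rng, per_class):
    """Synthetic stand-in for backbone embeddings: three clusters in different directions."""
    centers = [(0.0, 3.0), (3.0, -2.0), (-3.0, -2.0)]
    feats, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            feats.append([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)])
            labels.append(label)
    return feats, labels


rng = random.Random(0)
raw_train, train_y = make_clusters(rng, 30)
test_x, test_y = make_clusters(rng, 10)
train_x = [l2_normalize(f) for f in raw_train]

preds = [knn_predict(train_x, train_y, x, k=20) for x in test_x]
knn_acc = 100.0 * sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f"k-NN (k=20) test accuracy: {knn_acc:.3f}")
```

Unlike linear probing, this evaluation trains no parameters at all, so it depends only on the geometry of the feature space.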

Property Analysis
