Reimplementation of Self-Supervised Vision Transformers for DINO v2 with Huggingface 🤗

PyTorch implementation and pretrained models for DINO v2 in remote sensing.
See the official papers and GitHub repository for details. [arXiv #1] [arXiv #2] [Github]

Training

This project uses the DeepSpeed launcher for multi-GPU training:

deepspeed --include localhost:0,1,2,3... vit_train.py

deepspeed --include localhost:0,1,2,3... convvit_train.py
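The `--include` argument lists the GPU indices to use on a host, separated by commas. As a minimal sketch, the launch command for a given set of GPUs could be assembled as follows. The helper name `build_deepspeed_command` and its parameters are hypothetical and not part of this repository.

```python
import shlex


def build_deepspeed_command(script, gpu_ids, host="localhost"):
    """Return the argv list for a single-node DeepSpeed launch on the given GPUs.

    Hypothetical helper for illustration; only the `deepspeed --include host:ids script`
    shape is taken from the commands above.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    include = f"{host}:" + ",".join(str(i) for i in gpu_ids)
    return ["deepspeed", "--include", include, script]


# Example: launch ViT training on the first four GPUs of the local machine.
cmd = build_deepspeed_command("vit_train.py", range(4))
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where DeepSpeed is installed.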

Training Dataset for Remote Sensing

Dataset name	# of images	Dataset Paper
Million-AID	990,666	Link
SkyScript	5,181,068	Link
Total	6,171,734

Pretrained Model on Huggingface

Model	Epoch	Total Params	Student Backbone Params	Student DINO Head Params	Student iBOT Head Params	Weight & Config	Logs
ViT-S/16-e25	25	132M	21M	22M	22M	Link	logs
ViT-S/16-e100	100	132M	21M	22M	22M	Link	logs
ViT-B/16-e25	25	264M	88M	21M	21M	Link	logs
ViT-L/14-e25	25	837M	303M	57M	57M	Link	logs
ViT-L/14-e50	50	837M	303M	57M	57M	Link	logs
ConvViT-S-e25(DINOv1)	25	88.5M	22.2M	22M	x	Link	logs
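In every row, the Total Params value is close to twice the sum of the student columns. That would be consistent with a checkpoint holding a student and a teacher of the same architecture, but this is an inference from the numbers and is not stated by the source. The sketch below only checks the arithmetic; the function names are illustrative.

```python
# Parameter counts in millions, copied from the table above:
# (student backbone, student DINO head, student iBOT head, reported total).
# The ConvViT row has no iBOT head ("x" in the table), entered here as 0.
MODELS = {
    "ViT-S/16": (21, 22, 22, 132),
    "ViT-B/16": (88, 21, 21, 264),
    "ViT-L/14": (303, 57, 57, 837),
    "ConvViT-S (DINOv1)": (22.2, 22, 0, 88.5),
}


def student_plus_teacher(backbone, dino_head, ibot_head):
    """Assumed accounting: the teacher mirrors the student, so double the student sum."""
    return 2 * (backbone + dino_head + ibot_head)


def relative_gap(estimate, reported):
    """Relative difference between the estimate and the reported total."""
    return abs(estimate - reported) / reported


for name, (backbone, dino, ibot, reported) in MODELS.items():
    estimate = student_plus_teacher(backbone, dino, ibot)
    gap = relative_gap(estimate, reported)
    print(f"{name}: estimated {estimate:.1f}M, reported {reported}M, gap {gap:.1%}")
```

The remaining gap of up to about 2% is expected, because the table values are rounded to the nearest million.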

Evaluation

DINOv2 models are evaluated with k-NN classification and linear probing. For each dataset, 90% of the data is randomly selected as the training set and the remaining 10% as the test set. k-NN evaluation uses k=20. The evaluation datasets are listed in the table below. The split file lists are stored in linprob_data_lists.

Dataset Name	Dataset Paper
`RESISC`	Remote Sensing Image Scene Classification: Benchmark and State of the Art
`Optimal 31`	Scene Classification With Recurrent Attention of VHR Remote Sensing Images
`MLRSNet`	MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding
`WHU-RS19`
`EuroSAT`	EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
`UC Merced`	Bag-of-visual-words and spatial extensions for land-use classification
`Cv-BrCT`	AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification
`AiRound`	AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification
`RSI-CB128`	RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data
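The 90/10 random split described above can be sketched as follows. This is a minimal illustration, not the script used to produce linprob_data_lists: the seed, the file names, and the absence of per-class stratification are all assumptions.

```python
import random


def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle the items with a fixed seed and hold out `test_fraction` as the test set."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


# Hypothetical file names standing in for one dataset's images.
samples = [f"image_{i:03d}.jpg" for i in range(100)]
train, test = split_train_test(samples)
print(len(train), len(test))  # 90 10
```

Fixing the seed matters here because both evaluation scripts read the same stored lists, so the reported accuracies are comparable only if every model sees the same split.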

Linear Probing Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/linprob.py --model-path {model_registry} \
                              --data-root {data_root} \
                              --train-text {train_textfile} \
                              --test-text {test_textfile}

Model	RESISC	Optimal 31	MLRSNet	WHU-RS19	EuroSAT	UC Merced	Cv-BrCT	AiRound	RSI-CB128
ViT-S/16-e25	94.381	96.237	96.642	99.811	98.037	99.048	77.613	78.644	99.593
ViT-S/16-e100	94.381	95.161	96.349	100.000	97.704	99.048	76.910	79.407	99.539
ViT-B/16-e25	95.460	98.925	97.301	100.000	97.889	98.571	79.058	80.339	99.675
ViT-L/14-e25	96.603	96.774	98.161	100.000	98.704	99.048	80.132	82.627	99.729
ViT-L/14-e50	96.762	96.774	97.511	100.000	98.407	98.571	80.463	85.508	99.702
ConvViT-S-e25(DINOv1)	94.476	93.548	95.919	99.065	96.778	98.095	77.695	81.949	99.295
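Linear probing trains only a linear classifier on top of frozen backbone features. The sketch below shows the idea on toy 2-D vectors with a softmax classifier and full-batch gradient descent in plain Python. It is not the code in evaluation/linprob.py, whose optimizer and hyperparameters are not given here; the learning rate, epoch count, and toy features are assumptions.

```python
import math


def logits(weights, biases, x):
    """Linear scores, one per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit a softmax classifier on fixed feature vectors (the backbone is never updated)."""
    dim, n = len(features[0]), len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(logits(weights, biases, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                grad_b[c] += err / n
                for d in range(dim):
                    grad_w[c][d] += err * x[d] / n
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d]
    return weights, biases


def predict(weights, biases, x):
    z = logits(weights, biases, x)
    return max(range(len(z)), key=z.__getitem__)


# Toy stand-ins for frozen backbone features of two classes.
train_x = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
train_y = [0, 0, 1, 1]
test_x, test_y = [[0.8, 0.2], [0.2, 0.8]], [0, 1]

w, b = train_linear_probe(train_x, train_y, num_classes=2)
accuracy = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"test accuracy: {accuracy:.3f}")
```

In practice the feature vectors would come from the pretrained backbone and the classifier would be trained with a deep learning framework; the point of the sketch is only that the backbone weights stay fixed.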

KNN Evaluation

# train_textfile = linprob_data_lists/RESISC/train.txt
# test_textfile = linprob_data_lists/RESISC/test.txt

python3 evaluation/knn.py --model-path {model_registry} \
                          --data-root {data_root} \
                          --train-text {train_textfile} \
                          --test-text {test_textfile}

Model	RESISC	Optimal 31	MLRSNet	WHU-RS19	EuroSAT	UC Merced	Cv-BrCT	AiRound	RSI-CB128
Official ViT-S/14	87.778	85.484	91.820	99.065	92.074	91.429	73.936	74.068	96.504
Official ViT-B/14	90.571	89.247	91.948	96.262	90.667	92.857	74.721	75.847	96.585
ViT-S/16-e25	93.365	89.785	96.981	97.196	95.741	87.143	76.208	77.881	98.943
ViT-S/16-e100	93.746	94.624	97.081	97.196	96.222	86.667	75.960	76.695	98.808
ViT-B/16-e25	94.286	90.323	97.328	100.000	95.704	87.143	76.456	77.373	99.106
ViT-L/14-e25	93.778	91.398	97.392	99.065	96.963	88.095	79.430	80.085	99.133
ViT-L/14-e50	94.063	92.473	97.511	100.000	96.704	87.619	79.843	82.203	99.079
ConvViT-S-e25(DINOv1)	92.508	91.935	95.947	98.131	94.074	90.000	75.630	76.271	98.374
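k-NN evaluation assigns each test image the label that is most common among its k nearest training features. The sketch below uses cosine similarity and an unweighted majority vote; both choices are assumptions, since the similarity measure and voting scheme of evaluation/knn.py are not described here. The toy features and labels are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(query, train_features, train_labels, k=20):
    """Return the majority label among the k training features most similar to `query`."""
    ranked = sorted(
        range(len(train_features)),
        key=lambda i: cosine_similarity(query, train_features[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    # On a tie, most_common keeps the label that appeared first in ranked order.
    return votes.most_common(1)[0][0]


# Toy stand-ins for backbone features: three training samples per class.
train_features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
train_labels = [0, 0, 0, 1, 1, 1]

# k=3 for the toy data; the evaluation above uses k=20.
print(knn_predict([0.7, 0.3], train_features, train_labels, k=3))  # 0
```

Unlike linear probing, this evaluation has no trainable parameters, so it measures how well the raw feature space already separates the classes.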

Property Analysis

Feature Mapping - feature_mapping.ipynb
Sparse Feature Matching - vit-feature-matching.ipynb
Image Retrieval - index_search.ipynb
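One common way to do sparse feature matching between two images is to keep only the patch pairs that are mutual nearest neighbors in feature space. The sketch below illustrates that idea in plain Python. It is an assumption that this resembles what vit-feature-matching.ipynb does; the notebook may use a different criterion. The toy vectors stand in for patch features.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def mutual_nearest_neighbors(feats_a, feats_b):
    """Return index pairs (i, j) where a[i] and b[j] are each other's best cosine match."""
    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]
    sim = [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]
    best_b_for_a = [max(range(len(b)), key=row.__getitem__) for row in sim]
    best_a_for_b = [max(range(len(a)), key=lambda i: sim[i][j]) for j in range(len(b))]
    return [(i, j) for i, j in enumerate(best_b_for_a) if best_a_for_b[j] == i]


# Toy patch features for two images (hypothetical values).
patches_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patches_b = [[0.0, 2.0], [3.0, 0.1]]
print(mutual_nearest_neighbors(patches_a, patches_b))  # [(0, 1), (1, 0)]
```

The third patch of the first image is dropped because its best match in the second image prefers a different patch, which is what makes the resulting matches sparse.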

