Official PyTorch implementation of:
A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching. J.-H. Park, W.-J. Nam, and S.-W. Lee. Remote Sensing, 2020, Vol. 12, No. 3, p. 465. [Paper] [arXiv]
2025-12-31 (DeepAerialNet v2.0)
Feature Extraction
- ViT backbone: Added `vit-l/16` (ViT-Large/16 with DINOv3 sat493m pretrained weights) as a new backbone option
Feature Correlation
- Dual softmax correlation: Added dual softmax scaling to dot product correlation for improved feature matching (`v2` only)
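To illustrate the idea behind dual softmax, the sketch below applies a softmax along both the rows and the columns of a small dot-product correlation matrix and multiplies the two results. It uses only the standard library so it runs without PyTorch. The `temperature` value and the function names are assumptions made for illustration, not the settings or API of this repository.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(corr, temperature=0.1):
    """Multiply the row-wise and column-wise softmax of a correlation matrix.

    corr[i][j] is the dot product between source feature i and target feature j.
    The temperature is a hypothetical value chosen for this example.
    """
    scaled = [[c / temperature for c in row] for row in corr]
    row_sm = [softmax(row) for row in scaled]              # normalise over targets
    col_sm = [softmax(list(col)) for col in zip(*scaled)]  # normalise over sources
    return [
        [row_sm[i][j] * col_sm[j][i] for j in range(len(corr[0]))]
        for i in range(len(corr))
    ]


# Toy 3x3 correlation: source 2 is ambiguous between targets 1 and 2.
corr = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.1, 0.7, 0.6],
]
scores = dual_softmax(corr)
best = [max(range(len(row)), key=lambda j: row[j]) for row in scores]
print(best)
```

In this toy matrix, source 2 has a higher raw correlation with target 1 than with target 2, but target 1 is claimed more strongly by source 1. Because a score is high only when both directions agree, the combined score for source 2 shifts toward target 2.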
Feature Regression
- GroupNorm: Replaced BatchNorm with GroupNorm (g=8) for stable small-batch training (`v2` only)
Training
- Muon optimizer: Added Muon optimizer for efficient transformer training (`v2` only)
2025-12-26
- Migrated pretrained models and datasets to Hugging Face Hub
- Replaced `pretrainedmodels` with the `timm` library for modern backbone support
- Added `uv` support for fast environment setup
# Quick install with uv (recommended)
./install.sh
# Or manual installation
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv && source .venv/bin/activate
uv pip install -e ".[download]"

Models are hosted on Hugging Face Hub.
| Backbone | Version | PCK@0.05 | PCK@0.03 | PCK@0.01 | Download |
|---|---|---|---|---|---|
| ResNet101 | v1 | 93.8% | 82.5% | 35.1% | checkpoint_resnet101.pt |
| ResNeXt101 | v1 | 94.6% | 85.9% | 43.2% | checkpoint_resnext101.pt |
| DenseNet169 | v1 | 95.6% | 88.4% | 44.0% | checkpoint_densenet169.pt |
| SE-ResNeXt101 | v1 | 97.1% | 91.1% | 48.0% | checkpoint_seresnext101.pt |
| ViT-L/16 | v2 | 99.0% | 96.2% | 65.7% | checkpoint_vit-l16.pt |
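For readers unfamiliar with the metric, PCK@α (probability of correct keypoints) is commonly defined as the fraction of keypoints whose predicted position lies within α · max(height, width) of the ground truth. The sketch below follows that common definition. The function name, the normalisation by the longer image side, and the sample points are assumptions for illustration and may differ from the evaluation script in this repository.

```python
import math


def pck(pred_points, gt_points, image_size, alpha):
    """Fraction of keypoints with error <= alpha * max(height, width)."""
    height, width = image_size
    threshold = alpha * max(height, width)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred_points, gt_points)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt_points)


# Four hypothetical keypoints with pixel errors of 1, 5, 10 and 20.
gt = [(100.0, 100.0), (200.0, 50.0), (30.0, 220.0), (150.0, 150.0)]
pred = [(101.0, 100.0), (205.0, 50.0), (30.0, 230.0), (170.0, 150.0)]

for alpha in (0.05, 0.03, 0.01):
    print(f"PCK@{alpha}: {pck(pred, gt, (240, 240), alpha):.2f}")
```

A smaller α tolerates less error, which is why PCK@0.01 is consistently the lowest column in the table above.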
Download via Python:
from src.data.download import download_model
# Download specific model
download_model("vit-l/16") # ViT-L/16 (Best)
download_model("se_resnext101") # SE-ResNeXt101
download_model("resnet101") # ResNet101
# Or use huggingface_hub directly
from huggingface_hub import hf_hub_download
hf_hub_download(
repo_id="jaehyunnn/DeepAerialMatching",
filename="checkpoints/checkpoint_vit-l16.pt",
local_dir="."
)

Or via CLI:
python src/data/download.py --model vit-l/16 # Download ViT-L/16
python src/data/download.py --model se_resnext101 # Download SE-ResNeXt101

Datasets are also hosted on Hugging Face Hub.
| Dataset | Description | Download |
|---|---|---|
| evaluation_data | Evaluation benchmark (500 pairs) | evaluation_data.tar.gz |
| training_data | Training data (18K pairs with CSV) | training_data.tar.gz |
Note: The ViT-L/16 model was trained with an additional 27K private aerial image pairs on top of the public training dataset.
Download via Python:
from src.data.download import download_eval, download_train
download_eval() # Download evaluation dataset
download_train() # Download training dataset

Or via CLI:
python src/data/download.py --eval # Evaluation only
python src/data/download.py --train # Training only
python src/data/download.py # All datasets

# Demo
./scripts/demo.sh
./scripts/demo.sh --backbone resnet101 --model checkpoints/checkpoint_resnet101.pt
# Evaluation
./scripts/eval.sh
./scripts/eval.sh --backbone resnet101 --model checkpoints/checkpoint_resnet101.pt
# Training
./scripts/train.sh
./scripts/train.sh --backbone resnet101 --num-epochs 50
./scripts/train.sh --backbone vit-l/16 --freeze-backbone
# Generate transformation labels
python src/cli/generate_labels.py generate -n 10000 -t affine -o outputs/theta.csv
python src/cli/generate_labels.py generate -n 10000 -t homography -o outputs/homo.csv

| Option | Values | Description |
|---|---|---|
| `--backbone` | `resnet101`, `resnext101`, `se_resnext101`, `densenet169`, `vit-l/16` | Feature extraction backbone |
| `--freeze-backbone` | flag | Freeze backbone weights during training |
Generate random transformation labels for training data:
# Generate affine transformations (default: 10000 samples)
python src/cli/generate_labels.py generate -n 10000 -t affine -o outputs/affine.csv
# Generate homography transformations
python src/cli/generate_labels.py generate -n 10000 -t homography -o outputs/homo.csv
# Custom parameters
python src/cli/generate_labels.py generate \
-n 5000 \
-t affine \
--max-translation 0.3 \
--max-scale 0.4 \
--max-rotation 0.25 \
-o outputs/custom.csv
# Add rotation to existing transformations
python src/cli/generate_labels.py add-rotation \
-i outputs/affine.csv \
--max-rotation 0.5 \
-o outputs/rotated.csv
# Extract image pairs from directories
python src/cli/generate_labels.py extract-pairs \
-s datasets/source \
-t datasets/target \
-o outputs/pairs.csv

| Option | Default | Description |
|---|---|---|
| `-n`, `--num-samples` | 10000 | Number of samples to generate |
| `-t`, `--type` | affine | Transformation type (`affine` or `homography`) |
| `--max-translation` | 0.5 | Maximum translation ratio |
| `--max-scale` | 0.5 | Maximum scale ratio |
| `--max-rotation` | 0.5 | Maximum rotation (fraction of π) |
| `--max-perspective` | 0.1 | Maximum perspective distortion (homography only) |
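As a rough picture of what the affine options control, the sketch below draws random 2×3 affine matrices from uniform ranges bounded by the three maxima. This is not the code in `generate_labels.py`: the uniform sampling, the composition of rotation and isotropic scale, the row-major layout, and the `sample_affine` name are all assumptions made for illustration.

```python
import math
import random


def sample_affine(max_translation=0.5, max_scale=0.5, max_rotation=0.5, rng=random):
    """Draw one 2x3 affine matrix, flattened row-major to six values.

    Assumed parameterisation (hypothetical, not taken from the repository):
    rotation is a fraction of pi, scale is 1 +/- max_scale, and translation
    is a ratio of the normalised image extent.
    """
    angle = rng.uniform(-max_rotation, max_rotation) * math.pi  # fraction of pi -> radians
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    tx = rng.uniform(-max_translation, max_translation)
    ty = rng.uniform(-max_translation, max_translation)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        scale * cos_a, -scale * sin_a, tx,
        scale * sin_a, scale * cos_a, ty,
    ]


rng = random.Random(0)  # fixed seed so the labels are reproducible
labels = [sample_affine(rng=rng) for _ in range(5)]
for theta in labels:
    print(", ".join(f"{value:+.3f}" for value in theta))
```

With the default `--max-rotation` of 0.5, a scheme like this would cover rotations of up to ±90°, since the value is interpreted as a fraction of π.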
src/models/
├── backbone.py # FeatureExtraction (CNN/ViT backbones)
├── correlation.py # FeatureCorrelation
├── layers.py # FeatureL2Norm, FeatureRegression
├── aerial_net.py # AerialNetSingleStream, AerialNetTwoStream
└── loss.py # TransformedGridLoss
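The name `TransformedGridLoss` suggests the grid-based loss commonly used in geometric matching networks: a fixed grid of points is warped by both the predicted and the ground-truth transformation, and the loss is the mean squared distance between the two warped grids. The following standard-library sketch shows that idea for affine parameters. It is an assumption based on the module name, not a copy of `loss.py`, and the grid size and function names are hypothetical.

```python
def apply_affine(theta, point):
    """Apply a row-major 2x3 affine matrix to an (x, y) point."""
    a11, a12, tx, a21, a22, ty = theta
    x, y = point
    return (a11 * x + a12 * y + tx, a21 * x + a22 * y + ty)


def transformed_grid_loss(theta_pred, theta_gt, grid_size=20):
    """Mean squared distance between grids warped by two affine transforms."""
    # Regular grid over the normalised image square [-1, 1] x [-1, 1].
    steps = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    grid = [(x, y) for y in steps for x in steps]
    total = 0.0
    for point in grid:
        px, py = apply_affine(theta_pred, point)
        gx, gy = apply_affine(theta_gt, point)
        total += (px - gx) ** 2 + (py - gy) ** 2
    return total / len(grid)


identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
shifted = [1.0, 0.0, 0.1, 0.0, 1.0, 0.0]  # identity plus a 0.1 shift in x

print(transformed_grid_loss(identity, identity))
print(transformed_grid_loss(shifted, identity))
```

A loss of this kind measures error in image space rather than in parameter space, so a small parameter error that moves points far (for example, a rotation error) is penalised more than one that barely moves them.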
- Python >= 3.9
- PyTorch >= 2.5
- timm >= 1.0
@article{park2020aerial,
title={A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching},
author={Park, Jae-Hyun and Nam, Woo-Jeoung and Lee, Seong-Whan},
journal={Remote Sensing},
volume={12},
number={3},
pages={465},
year={2020},
publisher={MDPI}
}