Project based on the original U-Net paper by Olaf Ronneberger, Philipp Fischer and Thomas Brox (2015)
- Ikonos-2 multispectral images consist of a Blue, Green, Red, and Near-Infrared channel. Ikonos-2 images come at a spatial resolution of 0.8 meters and a radiometric resolution of 11 bits.
- The initial training phase includes samples from 10 sub-areas of an image of the greater Thessaloniki region, Greece, acquired in spring. This phase aims to give initial performance evaluations and an estimate of generalization capabilities on images of different distributions (e.g. acquired in other seasons), before the dataset distribution is expanded.
- Sample areas were delineated in QGIS, and samples were collected in the same way from industrial and urban environments. Further samples were taken from irregular background areas. The extracted rasters were processed into normalized tiles, separated into positive and negative samples, and stored in HDF5 format. About 1/6 of each sub-area was kept for validation.
- Data was normalized to the [0, 1] interval prior to storage by dividing by 2**11, the 11-bit radiometric range.
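A minimal sketch of the normalization and storage step (the tile arrays here are random stand-ins for the extracted samples, and the file name is illustrative):

```python
import numpy as np
import h5py

RADIOMETRIC_RANGE = 2 ** 11  # Ikonos-2 data are quantized to 11 bits

def normalize_tile(tile: np.ndarray) -> np.ndarray:
    """Scale raw 11-bit digital numbers to the [0, 1] interval."""
    return tile.astype(np.float32) / RADIOMETRIC_RANGE

# Illustrative stand-ins for extracted 4-channel 256 x 256 tiles.
positive_tiles = [np.random.randint(0, RADIOMETRIC_RANGE, (4, 256, 256), dtype=np.uint16)]
negative_tiles = [np.random.randint(0, RADIOMETRIC_RANGE, (4, 256, 256), dtype=np.uint16)]

# Positive and negative samples are stored in separate HDF5 datasets.
with h5py.File("samples.h5", "w") as f:
    f.create_dataset("positive", data=np.stack([normalize_tile(t) for t in positive_tiles]))
    f.create_dataset("negative", data=np.stack([normalize_tile(t) for t in negative_tiles]))
```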
Training mainly followed the recommendations of Ronneberger et al. (2015), though without the additional weighting of edge pixels suggested in the paper. Additional training ideas and methods, such as class balancing, were adopted from *Deep Learning with PyTorch* by Eli Stevens, Luca Antiga and Thomas Viehmann (2020).
- Adam was used with a high momentum term (beta1), following the high-momentum recommendation of Ronneberger et al. (2015); beta2 was kept at its default value. A sketch of this setup follows the list below.
- A tile size of 256 × 256 was chosen, since it was found to produce cleaner samples and allowed for a better separation of tiles into negative (label 0) and positive (label 1) examples.
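A minimal sketch of the optimizer setup (the exact beta1 value and learning rate are assumptions, not values taken from this repository):

```python
import torch

model = torch.nn.Conv2d(4, 2, kernel_size=3, padding=1)  # stand-in for the U-Net

# High beta1 (the momentum-like term), mirroring the high momentum (0.99)
# recommended by Ronneberger et al. (2015); beta2 stays at its default 0.999.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.99, 0.999))
```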
Augmentation includes affine transformations (translation, rotation, scaling and shear), noise, brightness and contrast adjustment, as well as elastic deformation. Elastic deformation was implemented according to the Microsoft paper *Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis* (Simard et al., 2003). Sketches of the elastic deformation and the 4-channel contrast adjustment follow the list below.
- Translations were applied randomly, up to 20% of the tile size along both the x and y axes.
- Rotation was unrestricted, up to 360 degrees.
- Scaling was performed within 75-150% of the original scale.
- Shear was applied randomly within a 70-degree range, using the single angular parameter of torchvision.transforms.functional.affine().
- Pixel noise was drawn from a normal distribution with standard deviation 0.02.
- Atmospheric noise was applied from a 32 × 32 mask with standard deviation 0.5, upsampled to tile dimensions; this augmentation attempts to simulate the effects of haze and absorption.
- Contrast adjustment was found to be particularly valuable for this task. Contrast was randomly adjusted between 70% and 150% of the original image, using a customised torchvision method that supports 4-channel images. The images were re-normalized to [0, 1] after adjustment.
- Brightness adjustments were applied within 80-120% of the original image brightness. Excessive pixel values were clipped to [0, 1] after adjustment.
- Elastic deformations proved to be as helpful in training as claimed in the U-Net paper. For optimal results, the Gaussian kernel used in the deformations appears to have to match the size of the network's convolution kernels.
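A minimal sketch of such an elastic deformation after Simard et al. (2003); the parameter values are illustrative and the repository's actual implementation may differ:

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import GaussianBlur

def elastic_deform(tile: torch.Tensor, alpha: float = 8.0, sigma: float = 4.0,
                   kernel_size: int = 3) -> torch.Tensor:
    """Elastic deformation after Simard et al. (2003).

    tile: (C, H, W) tensor in [0, 1]. alpha scales the displacement field,
    sigma is the Gaussian std; kernel_size can be matched to the network's
    convolution kernels, as noted above.
    """
    _, h, w = tile.shape
    blur = GaussianBlur(kernel_size, sigma=sigma)
    # Random displacement fields in [-1, 1], smoothed with the Gaussian kernel
    # and scaled into normalized grid coordinates.
    dx = blur(torch.rand(1, h, w) * 2 - 1)[0] * alpha / w
    dy = blur(torch.rand(1, h, w) * 2 - 1)[0] * alpha / h
    # Identity sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs + dx, ys + dy), dim=-1).unsqueeze(0)
    return F.grid_sample(tile.unsqueeze(0), grid, align_corners=True).squeeze(0)
```

And a minimal sketch of a 4-channel contrast adjustment with re-normalization, assuming the blend-toward-the-mean formulation that torchvision uses for 3-channel images; this is not necessarily the repository's exact method:

```python
import torch

def adjust_contrast_4ch(tile: torch.Tensor, factor: float) -> torch.Tensor:
    """Blend each pixel toward the per-band mean; works for any channel count."""
    mean = tile.mean(dim=(-2, -1), keepdim=True)
    out = mean + factor * (tile - mean)
    # Re-normalize to [0, 1] after the adjustment, as described above.
    out = out - out.amin(dim=(-2, -1), keepdim=True)
    return out / out.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-8)
```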
Weight Decay: L2 regularization was applied to the first two convolutional layers, due to excessive growth of individual filters. This is assumed to be occurring because of the NIR input channel, which can be exploited to explain the majority of negative samples.
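A minimal sketch of restricting L2 regularization to the first two convolutions via optimizer parameter groups (the model here is a stand-in, not the repository's architecture):

```python
import torch
from torch import nn

model = nn.Sequential(  # stand-in for the first U-Net block
    nn.Conv2d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
)

first_convs = list(model[0].parameters()) + list(model[2].parameters())
rest = [p for p in model.parameters() if all(p is not q for q in first_convs)]

# weight_decay acts as L2 regularization in (non-decoupled) Adam.
optimizer = torch.optim.Adam([
    {"params": first_convs, "weight_decay": 1e-4},
    {"params": rest, "weight_decay": 0.0},
], lr=1e-4)
```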
Dropout Layers: Two dropout layers were applied in the last downsampling block, as suggested in the literature. However, despite the extent of the augmentations the model kept overfitting, so additional dropout layers with a drop rate of 30% were added at each skip connection.
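As a rough sketch of the placement (the block structure is an assumption, not the repository's exact architecture), channel-wise dropout inside a downsampling block could look like this:

```python
from torch import nn

class DownBlock(nn.Module):
    """Stand-in double-convolution block with the 30% drop rate noted above."""
    def __init__(self, in_ch: int, out_ch: int, p: float = 0.3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p),  # drops whole feature maps, suited to conv layers
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p),
        )

    def forward(self, x):
        return self.block(x)
```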
*Note: still experimenting from time to time; this might not be in line with the current model.*
In case anyone is interested in training further:
`model_training.py`:

```
usage: Model Training [-h] [--epochs EPOCHS] [--batch-size BATCH_SIZE]
                      [--num-workers NUM_WORKERS] [--lr LR] [--report]
                      [--monitor] [--l2 L2 [L2 ...]] [--reload]
                      [--init-scale INIT_SCALE] [--checkpoint CHECKPOINT]
                      [--balance-ratio BALANCE_RATIO]
                      [--report-rate REPORT_RATE]
                      [--dropouts DROPOUTS [DROPOUTS ...]]
                      [--weights WEIGHTS [WEIGHTS ...]]
                      [--check-rate CHECK_RATE]

Training

optional arguments:
  -h, --help            show this help message and exit
  --epochs EPOCHS       Number of epochs for training
  --batch-size BATCH_SIZE
                        Batch size for training
  --num-workers NUM_WORKERS
                        Number of background processes for data loading
  --lr LR               Learning rate
  --report, -r          Store losses in memory and produce a report graph --
                        Constrained by memory size. Control with REPORT_RATE
                        to minimize logs accordingly
  --monitor, -m         Observe activations and predictions of a sample
  --l2 L2 [L2 ...]      L2 regularization parameters. Sequence of length 23.
  --reload              Load checkpoint and continue training
  --init-scale INIT_SCALE, -i INIT_SCALE
                        The factor to initially multiply input channels with:
                        in_channels*INIT_SCALE = out_channels -- Controls
                        overall U-Net feature length
  --checkpoint CHECKPOINT, -c CHECKPOINT
                        Path to saved checkpoint
  --balance-ratio BALANCE_RATIO, -b BALANCE_RATIO
                        For positive values roughly every n-th sample is
                        negative, the rest are positive. The opposite for
                        negative values.
  --report-rate REPORT_RATE
                        Epoch frequency to log losses for reporting.
                        Default: EPOCHS // 10
  --dropouts DROPOUTS [DROPOUTS ...], -d DROPOUTS [DROPOUTS ...]
                        Sequence of length 23. Dropout probabilities for each
                        CNN.
  --weights WEIGHTS [WEIGHTS ...], -w WEIGHTS [WEIGHTS ...]
                        Class weights for loss computation. Sequence of
                        length 2
  --check-rate CHECK_RATE
                        Write checkpoint every n epochs - For
                        Monitor/Checkpoint options. Default: EPOCHS // 10
```
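For example, a training run could be started like this (all flag values are illustrative, not recommended settings):

```bash
python model_training.py --epochs 100 --batch-size 8 --lr 1e-4 -b 3 --report
```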
Testing samples were drawn from a scene neighboring the training distribution, acquired on the same day.
- Urban:
- Industrial:
- Mostly Background -- Elevated Areas:
# TODO
Cement rooftops, which form the majority of the background test sample group, appear to be underrepresented in the training set. However, having classified this area before in my thesis, I know it well, and the error distribution looks very similar to the results I had obtained using OBIA with an SVM classifier. This, together with the fact that affine transformations, image flips and other spatial augmentations do not seem to have any effect on training, leads me to believe that there is a problem with the architecture: the model is mostly working with colors rather than spatial patterns. That is not what this architecture is supposed to do, as it has been proven to work remarkably well on 1-channel images. This probably happens because the input channels are intermixed immediately (similar to regular pixel-based classification) and the subsequent features are developed based on that.
A solution to this problem could be to isolate each input channel and develop features per channel in parallel during downsampling, merging them during upsampling. However, this would probably result in duplicate work and an unnecessarily large model, since four times the feature maps would have to be produced.
A better solution would be to add an "image synthesizer" 1×1 convolution layer near the input and let the network combine the channels linearly, to its preference, into a single one-channel image before feeding it to the rest of the network.
To be addressed in version 2.
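A minimal sketch of that idea (a hypothetical wrapper, not something currently in the repository):

```python
from torch import nn

class SynthesizedUNet(nn.Module):
    """Prepend a learnable 1x1 band-mixing layer to an existing U-Net."""
    def __init__(self, unet: nn.Module):
        super().__init__()
        # 1x1 convolution: a per-pixel linear combination of the 4 input
        # bands into a single synthesized channel.
        self.synthesizer = nn.Conv2d(4, 1, kernel_size=1, bias=False)
        self.unet = unet  # assumed to accept 1-channel input

    def forward(self, x):
        return self.unet(self.synthesizer(x))
```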
# TODO
Additionally, it would be ideal to label each rooftop according to its type. That would allow for a much more elaborate error analysis, but it is not something I'm eager to do right now for an experimental personal project. Potentially to be addressed in the future.
M.Eng. Spatial Planning & Development
iosif.doundoulakis@outlook.com