Commit 5d08efe

Merge pull request #77 from bkhanal-11/master
added files for unet
2 parents 50dafea + 105c9cb commit 5d08efe

File tree

8 files changed

+116
-0
lines changed

_data/authors.yml

+9
```diff
@@ -22,6 +22,15 @@ ashuta:
     icon: "fab fa-fw fa-github"
     url: "https://github.com/ashuta03"
 
+bishwash:
+  name: "Bishwash Khanal"
+  bio: ""
+  avatar: "/assets/images/userPhoto/no_photo.jpg"
+  links:
+    - label: "Github"
+      icon: "fab fa-fw fa-github"
+      url: "https://github.com/bkhanal-11"
+
 sajita:
   name: "Sajita"
   bio: ""
```
@@ -0,0 +1,107 @@
---
title: "Semantic Segmentation using U-Net"
excerpt_separator: "<!--more-->"
last_modified_at: 2022-10-11T14:36:02-05:00
categories:
  - Applied Machine Learning
tags:
  - U-Net
  - Semantic Segmentation
  - Dice Loss
header:
  image: /assets/images/unet/semantic.jpeg
  image_description: "A description of the image"
author: bishwash
---
## U-Net

U-Net is a U-shaped encoder-decoder network architecture consisting of four encoder blocks and four decoder blocks connected via a bridge. It is one of the most widely used architectures for semantic segmentation tasks. It was originally introduced by Olaf Ronneberger et al. in the publication "U-Net: Convolutional Networks for Biomedical Image Segmentation". It is a fully convolutional neural network designed to learn from fewer training samples.
<p align="center">
  <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/unet-arc.png" width="600" />
</p>

It has three main components: the encoder network, the decoder network, and the skip connections. The encoder network (contracting path) halves the spatial dimensions and doubles the number of feature channels at each encoder block, while the decoder network (expanding path) doubles the spatial dimensions and halves the number of feature channels. The skip connections connect the output of each encoder block with the input of the corresponding decoder block.
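The halving/doubling pattern above can be sketched as simple bookkeeping; for a 256x256 input and 64 starting channels (the values used in the original paper, assumed here for illustration), the four encoder levels produce:

```python
# Channel/resolution progression through a four-level U-Net encoder
# for a 256x256 input (an illustrative sketch, not code from the post).
size, channels = 256, 64
encoder = []
for level in range(4):
    encoder.append((channels, size))  # output of this encoder block
    size //= 2        # 2x2 max-pooling halves spatial dims
    channels *= 2     # the next block doubles feature channels
print(encoder)  # [(64, 256), (128, 128), (256, 64), (512, 32)]
```

The decoder then walks this list in reverse, doubling the resolution and halving the channels at each step.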
### Encoder Network

The encoder network acts as the feature extractor and learns an abstract representation of the input image through a sequence of encoder blocks. Each encoder block consists of two 3x3 convolutions, where each convolution is followed by a ReLU (Rectified Linear Unit) activation function. The ReLU function introduces non-linearity into the network, which helps it generalize better. The output of the ReLU acts as a skip connection for the corresponding decoder block.

Next follows a 2x2 max-pooling operation, which reduces the spatial dimensions of the feature maps by half. This reduces the computational cost of the subsequent layers.
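One encoder block could be sketched in PyTorch as below; the class and variable names are illustrative, and batch normalization is included per the note later in this post (the original paper does not use it):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Two 3x3 conv -> BN -> ReLU layers, followed by 2x2 max-pooling."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x):
        skip = self.conv(x)    # saved for the skip connection
        down = self.pool(skip) # spatial dims halved for the next block
        return skip, down

x = torch.randn(1, 3, 256, 256)
skip, down = EncoderBlock(3, 64)(x)
print(skip.shape, down.shape)
# torch.Size([1, 64, 256, 256]) torch.Size([1, 64, 128, 128])
```

Returning both tensors makes it easy to route `skip` to the matching decoder block while `down` feeds the next encoder level.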
### Skip Connection

The skip connections provide additional information that helps the decoder generate better semantic features. They also act as shortcut connections that let gradients flow directly to the earlier layers without degradation.

The bridge connects the encoder and decoder networks and completes the flow of information.
### Decoder Network

The decoder network takes the abstract representation and generates a semantic segmentation mask. Each decoder block starts with a 2x2 transpose convolution. Its output is then concatenated with the corresponding skip connection feature map from the encoder block. These skip connections provide features from earlier layers that are sometimes lost due to the depth of the network. The output of the last decoder block passes through a 1x1 convolution with sigmoid activation. The sigmoid activation function gives the segmentation mask representing the pixel-wise classification.

It is preferred to use batch normalization between the convolution layer and the ReLU activation function. It reduces internal covariate shift and makes the network more stable during training. Sometimes dropout is used after the ReLU. It forces the network to learn different representations by randomly dropping out some neurons, which makes the network less dependent on any particular neuron. This in turn helps the network generalize better and prevents overfitting.
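A matching decoder block sketch in PyTorch (again with illustrative names, not code from the original post) shows the upsample-concatenate-convolve sequence:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """2x2 transpose conv, concat with the skip feature map, then two 3x3 convs."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, out_channels,
                                     kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_channels * 2, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # spatial dims doubled, channels halved
        x = torch.cat([x, skip], dim=1)  # fuse features from the encoder
        return self.conv(x)

x = torch.randn(1, 128, 64, 64)      # feature map from the level below
skip = torch.randn(1, 64, 128, 128)  # matching encoder output
out = DecoderBlock(128, 64)(x, skip)
print(out.shape)  # torch.Size([1, 64, 128, 128])
```

The concatenation doubles the channel count, which is why the first 3x3 convolution takes `out_channels * 2` input channels.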
## Semantic Segmentation for Self-Driving Cars

This dataset provides camera images and labeled semantic segmentation masks captured via the CARLA self-driving car simulator. The data was generated as part of the Lyft Udacity Challenge. It can be used to train ML algorithms to identify the semantic segmentation of cars, roads, etc. in an image. The data consists of 5 sets of 1000 images and corresponding labels. There are 23 different labels, ranging from road, road lines, and sidewalk to buildings, pedestrians, and fences.

<p align="center">
  <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/example.png" width="600"/>
</p>
For our training, we select the first 13 labels. All images were resized to 256x256. The train-validation-test split was 0.6, 0.2, and 0.2. The learning rate was set to 0.001 for the Adam optimizer. The performance of the network was optimized with the help of the Dice loss, which is defined as

<p align="center">
  <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/diceloss.png" width="350"/>
</p>
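The exact Dice variant used in this post is only shown as an image above; a common smooth soft-Dice implementation (1 − 2|X∩Y| / (|X|+|Y|), with a smoothing term to avoid division by zero) could be sketched as:

```python
import torch

def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss, averaged over the batch.

    pred:   predicted probabilities (after sigmoid), shape (N, C, H, W)
    target: one-hot ground-truth masks, same shape
    """
    pred = pred.flatten(1)
    target = target.flatten(1)
    intersection = (pred * target).sum(dim=1)
    dice = (2.0 * intersection + smooth) / (
        pred.sum(dim=1) + target.sum(dim=1) + smooth
    )
    return 1.0 - dice.mean()

# A perfect prediction gives a loss of 0.
mask = torch.zeros(1, 1, 4, 4)
mask[..., :2, :] = 1.0
loss = dice_loss(mask, mask)
print(round(loss.item(), 4))  # 0.0
```

Because the Dice loss directly measures mask overlap, it tends to handle the class imbalance between small foreground objects and large background regions better than plain cross-entropy.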
The performance of the network over the first 25 epochs (out of 100) is as follows.

<p align="center">
  <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/loss.png" width="450"/>
</p>

Some predicted results for buildings, together with their ground truth, are as follows.

<p align="center">
  <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/results.png" width="600"/>
</p>
#### References

[1] [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/pdf/1505.04597.pdf)

[2] [What is UNET?](https://medium.com/analytics-vidhya/what-is-unet-157314c87634)

[3] [Semantic Segmentation for Self Driving Cars](https://www.kaggle.com/datasets/kumaresanmanickavelu/lyft-udacity-challenge)

[4] [Semantic segmentation of aerial imagery](https://www.kaggle.com/datasets/humansintheloop/semantic-segmentation-of-aerial-imagery)

assets/images/unet/diceloss.png (10.6 KB)
assets/images/unet/example.png (195 KB)
assets/images/unet/loss.png (42.2 KB)
assets/images/unet/results.png (401 KB)
assets/images/unet/semantic.jpeg (16.6 KB)
assets/images/unet/unet-arc.png (108 KB)
