| 1 | +--- |
| 2 | +title: "Semantic Segmentation using U-Net" |
| 3 | +excerpt_separator: "<!--more-->" |
| 4 | +last_modified_at: 2022-10-11T14:36:02-05:00 |
| 5 | +categories: |
| 6 | + - Applied Machine Learning |
| 7 | +tags: |
| 8 | +- U-Net |
| 9 | +- Semantic Segmentation |
| 10 | +- Dice Loss |
| 11 | +header: |
| 12 | + image: /assets/images/unet/semantic.jpeg |
| 13 | + image_description: "A description of the image" |
| 14 | +author: bishwash |
| 15 | +--- |
| 16 | + |
| 17 | +## U-Net |
| 18 | + |
| 19 | +U-Net is a U-shaped encoder-decoder network architecture consisting of four encoder blocks |
| 20 | +and four decoder blocks connected via a bridge. It is one of the most widely used approaches |
| 21 | +to semantic segmentation. It was introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in the paper |
| 22 | +"U-Net: Convolutional Networks for Biomedical Image Segmentation". It is a fully convolutional neural network |
| 23 | +designed to learn from relatively few training samples. |
| 24 | + |
| 25 | +<p align="center"> |
| 26 | + <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/unet-arc.png" width="600" /> |
| 27 | + |
| 28 | +</p> |
| 29 | + |
| 30 | +It has three main components: the encoder network, the decoder network, and skip connections. The encoder network |
| 31 | +(contracting path) halves the spatial dimensions and doubles the number of feature channels at each encoder block, |
| 32 | +while the decoder network (expanding path) doubles the spatial dimensions and halves the number of feature channels. The skip |
| 33 | +connections pass the output of each encoder block to the input of the corresponding decoder block. |
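
To make the halving and doubling concrete, here is a minimal sketch that traces the feature-map size and channel count through the network. The 256x256 input matches the image size used later in this post; the 64 starting channels and the use of padded convolutions (so that only pooling and transpose convolutions change the spatial size) are assumptions for illustration, and `unet_shapes` is a hypothetical helper, not part of any library.

```python
def unet_shapes(input_size=256, base_channels=64, depth=4):
    """Trace (height/width, channels) through a U-Net with `depth` blocks per path.

    Assumes padded convolutions, so only pooling and transpose
    convolutions change the spatial size.
    """
    encoder = []
    size, channels = input_size, base_channels
    for _ in range(depth):
        encoder.append((size, channels))  # output of this encoder block (skip source)
        size //= 2        # 2x2 max-pooling halves height and width
        channels *= 2     # the next block doubles the feature channels

    bridge = (size, channels)

    decoder = []
    for _ in range(depth):
        size *= 2         # 2x2 transpose convolution doubles height and width
        channels //= 2    # and halves the feature channels
        decoder.append((size, channels))

    return encoder, bridge, decoder


encoder, bridge, decoder = unet_shapes()
print("encoder:", encoder)
print("bridge: ", bridge)
print("decoder:", decoder)
```

Under these assumptions the decoder shapes mirror the encoder shapes in reverse order, which is what allows each skip connection to be concatenated with a decoder feature map of the same spatial size.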
| 34 | + |
| 35 | + |
| 36 | +### Encoder Network |
| 37 | + |
| 38 | +The encoder network acts as the feature extractor and learns an abstract representation of the input image through |
| 39 | +a sequence of encoder blocks. Each encoder block consists of two 3x3 convolutions, each followed by |
| 40 | +a ReLU (Rectified Linear Unit) activation function. The ReLU introduces non-linearity, which lets the network |
| 41 | +model relationships that a purely linear stack of convolutions could not. The output of the block's last ReLU, taken before pooling, serves as the skip connection for the corresponding |
| 42 | +decoder block. |
| 43 | + |
| 44 | +Next follows a 2x2 max-pooling, which reduces the spatial dimensions of the feature maps by half. This lowers the |
| 45 | +computational cost of the later layers, which operate on smaller feature maps; the pooling itself has no trainable parameters. |
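
As a small illustration of the pooling step, the sketch below applies 2x2 max-pooling with stride 2 to a single 4x4 feature map written as a nested list. It is plain Python for readability; `max_pool_2x2` is a hypothetical helper, and a real network would use a framework's pooling layer instead.

```python
def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 on a 2D list with even height and width."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # Keep only the largest value in each non-overlapping 2x2 window.
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

The 4x4 map becomes 2x2: each output value summarizes one window, so height and width are both halved.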
| 46 | + |
| 47 | +### Skip Connection |
| 48 | + |
| 49 | +The skip connections provide additional information that helps the decoder generate better semantic features. They also act |
| 50 | +as shortcut connections that let gradients flow more directly to the earlier layers, which eases training. |
| 51 | + |
| 52 | +The bridge connects the encoder and decoder networks and completes the flow of information. |
| 53 | + |
| 54 | +### Decoder Network |
| 55 | + |
| 56 | +The decoder network takes the abstract representation and generates a semantic segmentation mask. Each decoder block starts with a 2x2 transpose |
| 57 | +convolution, whose output is concatenated with the corresponding skip connection feature map from the encoder block. These skip |
| 58 | +connections provide features from earlier layers that are sometimes lost due to the depth of the network. The output of the last |
| 59 | +decoder block passes through a 1x1 convolution with sigmoid activation. The sigmoid activation function gives the segmentation mask representing the |
| 60 | +pixel-wise classification. |
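
The final step can be sketched for a single output channel: the sigmoid maps each pixel's raw score (logit) to a value between 0 and 1, and a threshold turns those values into a binary mask. The 0.5 threshold and the helper names `sigmoid` and `logits_to_mask` are assumptions for illustration; the post does not state how the sigmoid outputs were converted to labels.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_mask(logits, threshold=0.5):
    """Turn a 2D list of per-pixel logits into a 0/1 mask (threshold is an assumption)."""
    return [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in logits]


logits = [
    [-2.0, 0.3],
    [1.5, -0.1],
]
mask = logits_to_mask(logits)
print(mask)  # [[0, 1], [1, 0]]
```

With a 0.5 threshold, a pixel is marked as belonging to the class exactly when its logit is non-negative.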
| 61 | + |
| 62 | +It is preferred to use batch normalization between the convolution layer and the ReLU activation function. It is intended to reduce internal |
| 63 | +covariate shift and makes the network more stable during training. Sometimes dropout is used after the ReLU. It forces the network to learn |
| 64 | +different representations by dropping out randomly selected neurons, which makes the network less dependent on any particular neuron. |
| 65 | +This in turn helps the network generalize better and reduces overfitting. |
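
The ordering described above (convolution, then batch normalization, then ReLU) can be sketched on a handful of activations from one channel. This is a simplified training-time view that normalizes with the statistics of the values it is given. The scale `gamma`, shift `beta`, and `eps` are the usual batch normalization parameters, set here to assumed default values, and the helper names are hypothetical.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize values to zero mean and (almost) unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(values):
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


conv_output = [2.0, 4.0, 6.0, 8.0]   # stand-in for one channel's convolution outputs
normalized = batch_norm(conv_output)
activated = relu(normalized)
print(normalized)
print(activated)
```

After normalization the values are centered on zero, so the ReLU keeps only the activations that were above the mean of the batch.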
| 66 | + |
| 67 | + |
| 68 | +## Semantic Segmentation for Self Driving Cars |
| 69 | + |
| 70 | +This dataset provides images and labeled semantic segmentations captured with the CARLA self-driving car simulator. The data was |
| 71 | +generated as part of the Lyft Udacity Challenge. It can be used to train ML algorithms to produce semantic segmentations |
| 72 | +of cars, roads, etc. in an image. The data has 5 sets of 1000 images and corresponding labels. There are 23 different labels, ranging from |
| 73 | +road, roadlines, and sidewalk to building, pedestrians, and fences. |
| 74 | + |
| 75 | +<p align="center"> |
| 76 | + <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/example.png" width="600"/> |
| 77 | +</p> |
| 78 | + |
| 79 | +For training, we selected the first 13 labels and resized all images to 256x256. The train-validation-test split was |
| 80 | +0.6, 0.2 and 0.2. The learning rate was set to 0.001 for the Adam optimizer. The network was optimized with the |
| 81 | +Dice loss, which is defined as |
| 82 | + |
| 83 | +<p align="center"> |
| 84 | + <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/diceloss.png" width="350"/> |
| 85 | +</p> |
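
For readers who prefer code to a formula, here is a minimal sketch of a commonly used soft Dice loss for one class, computed on flattened lists of predicted probabilities and 0/1 ground-truth labels. The exact variant used for training is the one in the figure above; the smoothing term `smooth` and the function name `dice_loss` are assumptions for illustration.

```python
def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum(pred) + sum(target) + smooth)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (total + smooth)
    return 1.0 - dice


target = [1, 1, 0, 0]
perfect = dice_loss([1.0, 1.0, 0.0, 0.0], target)   # prediction matches the mask
disjoint = dice_loss([0.0, 0.0, 1.0, 1.0], target)  # prediction misses the mask entirely
print(perfect, disjoint)
```

The loss is 0 when the prediction overlaps the ground truth perfectly and grows as the overlap shrinks. The smoothing term keeps the ratio defined when both the prediction and the mask are empty.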
| 86 | + |
| 87 | +The performance of the network over the first 25 of 100 epochs is shown below. |
| 88 | + |
| 89 | +<p align="center"> |
| 90 | + <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/loss.png" width="450"/> |
| 91 | +</p> |
| 92 | + |
| 93 | +Some predicted results for buildings, together with their ground truth, are shown below. |
| 94 | + |
| 95 | +<p align="center"> |
| 96 | + <img src="{{ site.url }}{{ site.baseurl }}/assets/images/unet/results.png" width="600"/> |
| 97 | +</p> |
| 98 | + |
| 99 | +#### References |
| 100 | + |
| 101 | +[1] [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/pdf/1505.04597.pdf) |
| 102 | + |
| 103 | +[2] [What is UNET?](https://medium.com/analytics-vidhya/what-is-unet-157314c87634) |
| 104 | + |
| 105 | +[3] [Semantic Segmentation for Self Driving Cars](https://www.kaggle.com/datasets/kumaresanmanickavelu/lyft-udacity-challenge) |
| 106 | + |
| 107 | +[4] [Semantic segmentation of aerial imagery](https://www.kaggle.com/datasets/humansintheloop/semantic-segmentation-of-aerial-imagery) |