The entire super-resolution process can be represented as a deep convolutional neural network. The state-of-the-art models for super-resolution are based on GANs. This repository contains a CNN-based implementation that learns an end-to-end mapping between low- and high-resolution images. It takes a 64x64 image as input and outputs a 128x128 image.
The Linnaeus 5 dataset, which contains 6000 training images and 2000 test images, is used here. All images have a resolution of 256x256. For this model I have resized the images to 64x64 (which serve as the input data) and 128x128 (which serve as the ground truth for the respective inputs).
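The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming NumPy and simple block averaging for the downsampling; the repository may well use a different resizing routine (for example a library's bilinear or bicubic resize), and the names `downsample` and `make_pair` are hypothetical.

```python
import numpy as np

def downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Shrink an (H, W, C) image by averaging non-overlapping factor x factor blocks."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def make_pair(image: np.ndarray):
    """Return (input, ground truth) for one 256x256 source image."""
    low_res = downsample(image, 4)    # 256 -> 64, the model input
    high_res = downsample(image, 2)   # 256 -> 128, the ground truth
    return low_res, high_res

# Stand-in for one 256x256 RGB image from the dataset (random values in [0, 1)).
rng = np.random.default_rng(0)
source = rng.random((256, 256, 3))
low_res, high_res = make_pair(source)
print(low_res.shape, high_res.shape)  # (64, 64, 3) (128, 128, 3)
```

Both images are derived from the same 256x256 source, so each 64x64 input is paired with a 128x128 target that shows the same scene at twice the resolution.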
The CNN architecture is similar to the one described in 'Reconstructing Obfuscated Human Faces'.
Click here to view the model architecture.
In the un-optimized version, `MeanSquaredError` is used as the loss function. This corresponds to the pixel loss, i.e. the mean of the squared differences between the predicted and ground-truth pixel values.
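For reference, a common way to write the pixel loss is shown below. The notation is an assumption, not taken from the repository: $y$ is the ground-truth image, $\hat{y}$ is the model output, and $W \times H$ is the output size (128x128 here). For color images the sum also runs over the channels.

$$
\mathcal{L}_{pixel} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \left( y_{i,j} - \hat{y}_{i,j} \right)^2
$$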
In the optimized version, a linear combination of pixel loss and perceptual loss is used. Perceptual loss measures the difference between the feature maps that a pre-trained network, say VGGNet, produces for the predicted image and for the ground-truth image.
Here, the feature maps Φ are the activations of the 6th layer of a pre-trained VGGNet16 model. To view the architecture of the custom VGG model, click here.
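A common way to write the perceptual loss, given here as a sketch under assumed notation, is the following. $y$ is the ground-truth image, $\hat{y}$ is the model output, and $N$ is the number of elements in the feature map $\Phi(\cdot)$. The exact normalization used in the repository is not stated.

$$
\mathcal{L}_{perceptual} = \frac{1}{N} \left\lVert \Phi(y) - \Phi(\hat{y}) \right\rVert_2^2
$$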
The final loss function is a weighted sum of the pixel loss and the perceptual loss.
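One way to write such a combination is $\mathcal{L} = \lambda_{pixel} \mathcal{L}_{pixel} + \lambda_{perceptual} \mathcal{L}_{perceptual}$, where the weights $\lambda$ are placeholders (the values used in this repository are not stated here). The NumPy sketch below only shows the structure of that computation and is not the repository's training code. `toy_features` is a hypothetical stand-in for the pre-trained VGG activations, and the default weights are arbitrary.

```python
import numpy as np

def pixel_loss(y_true, y_pred):
    """Mean squared error over all pixels and channels."""
    return float(np.mean((y_true - y_pred) ** 2))

def toy_features(image):
    """Hypothetical stand-in for a pre-trained feature extractor (NOT VGG16).

    Returns horizontal and vertical neighbor differences, i.e. crude edge maps.
    """
    dx = image[:, 1:, :] - image[:, :-1, :]
    dy = image[1:, :, :] - image[:-1, :, :]
    return np.concatenate([dx.ravel(), dy.ravel()])

def perceptual_loss(y_true, y_pred, features=toy_features):
    """Mean squared difference between the feature maps of the two images."""
    return float(np.mean((features(y_true) - features(y_pred)) ** 2))

def combined_loss(y_true, y_pred, w_pixel=1.0, w_perceptual=1.0):
    """Linear combination of the two terms; the weights are illustrative."""
    return (w_pixel * pixel_loss(y_true, y_pred)
            + w_perceptual * perceptual_loss(y_true, y_pred))

rng = np.random.default_rng(0)
truth = rng.random((128, 128, 3))
blurry = np.full_like(truth, truth.mean())  # a flat image with no detail at all

print(combined_loss(truth, truth))  # 0.0 for a perfect prediction
print(combined_loss(truth, blurry) > pixel_loss(truth, blurry))  # True
```

In an actual training setup, `features` would be replaced by the frozen pre-trained network's activations, so that predictions lacking fine structure are penalized beyond their per-pixel error.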
Input (64x64) | Ground Truth | Predicted |
---|---|---|
These images show that the model performs well when it comes to smoothing out curves and edges.
However, the images are blurry and miss intricate details. This can be resolved by adding the perceptual loss to the pixel loss function, which forces the model to focus more on the detailed structures of the objects in the image.
Input (64x64) | Ground Truth | Predicted |
---|---|---|
With the perceptual loss taken into account, the model performs noticeably better. There is one drawback, though: the images show a checkerboard-like pattern, which is solely due to the perceptual loss. For a few images, this model reaches a PSNR (Peak Signal-to-Noise Ratio) of around 35-36 dB.
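PSNR is computed from the mean squared error between the prediction and the ground truth. The sketch below is a minimal version, assuming NumPy and pixel values scaled to [0, 1], so that the peak value is 1.0 (pass 255 for 8-bit images). The function name `psnr` is illustrative, and this is not necessarily how the repository evaluates it.

```python
import numpy as np

def psnr(y_true, y_pred, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    diff = np.asarray(y_true, dtype=np.float64) - np.asarray(y_pred, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

truth = np.zeros((128, 128, 3))
pred = np.full((128, 128, 3), 0.1)   # every pixel off by 0.1, so MSE = 0.01
print(round(psnr(truth, pred), 2))   # 20.0
```

Higher values mean the prediction is closer to the ground truth. With this scaling, a PSNR of 35 dB corresponds to an MSE of roughly 0.0003.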