This project was built as part of the RESCON event hosted by SRM Machine Intelligence Community, in which we were asked to implement a research paper. Our team Phoneix chose "PaletteNet: Image Recolorization with Given Color Palette" by Junho Cho, Sangdoo Yun, Kyoungmu Lee and Jin Young Choi.
You can download the paper from here
https://colab.research.google.com/drive/1DqwgiGBmf14kGhsdCTcOl_FzvYilagoD#scrollTo=7JflO10d9TWZ
* Go to the link and click Connect at the top right.
* Press the run button next to the first code block to run it. It will download the model files and weights.
* You can also run the selected block with CTRL + ENTER.
* Run all the blocks in a sequence. The last one will show you the re-colored image.
PaletteNet is a deep neural network that recolors an image according to a given target color palette. It takes two inputs: a source image to be recolored and a target palette. Human experts using commercial software take on average 18 minutes to recolor an image, while PaletteNet automatically produces plausible results in less than a second.
We created our own dataset, since this task requires both a source image and its corresponding palette. We scraped 1043 high-quality images from "https://www.design-seeds.com". This produced a raw dataset of 1.1 GB, which was stored on AWS S3.
- Cropping: As the source image and palette are attached, we first separate them and then resize the image to 384×286.
- Hue shift: To train the model, we created 20 variants of each image using the algorithm shown below.
RGB -> LAB and cache L
RGB -> HSV --hue shift--> H*SV -> LAB
Final hue-shifted image: LA*B* (the cached L of the original combined with the A*B* channels of the hue-shifted image)
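The hue-shift steps above can be sketched per pixel with only the Python standard library. This is a minimal illustration, not our actual pipeline code: the function names (`rgb_to_lab`, `hue_shift_pixel`, `hue_shift_image`), the D65 white point and the choice of shift values are all assumptions made for the example.

```python
import colorsys

# D65 reference white (assumed) for the XYZ -> LAB conversion.
_WHITE = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """Piecewise cube-root function used by the CIE LAB definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(r, g, b):
    """Convert one sRGB pixel (floats in [0, 1]) to CIE LAB."""
    r, g, b = (_srgb_to_linear(c) for c in (r, g, b))
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), _WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def hue_shift_pixel(rgb, shift):
    """Return (L, A*, B*) for one pixel.

    `shift` is a fraction of the hue circle (0.5 means 180 degrees).
    """
    lightness, _, _ = rgb_to_lab(*rgb)            # RGB -> LAB, cache L
    h, s, v = colorsys.rgb_to_hsv(*rgb)           # RGB -> HSV
    shifted_rgb = colorsys.hsv_to_rgb((h + shift) % 1.0, s, v)  # H*SV
    _, a_star, b_star = rgb_to_lab(*shifted_rgb)  # H*SV -> LAB, keep A*B*
    return lightness, a_star, b_star              # final pixel: L A* B*


def hue_shift_image(pixels, shift):
    """Apply the shift to a nested list of RGB tuples (rows of pixels)."""
    return [[hue_shift_pixel(p, shift) for p in row] for row in pixels]
```

Calling `hue_shift_image(pixels, shift)` once per shift value would yield one variant per call. For real images, a vectorized library routine would be far faster than this per-pixel loop.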
After pre-processing, the dataset grew to 20540 images and occupied 3.1 GB of space.
- Our model is divided into three parts: Feature Encoder, Recoloring Decoder and Discriminator.
- The Feature Encoder is made up of ResNet blocks that extract the content features into a tensor of size [512x25x16].
- Recoloring Decoder takes the target palette and content features as the input and outputs the final recolored image.
There are two phases of training:
- In the first phase, we train FE+RD using Euclidean loss.
- In the second phase, we use a discriminator to differentiate between original images (no hue shift) and generated images.
- We use the Adam optimizer with β1 = 0.5, lr = 0.0002 and batch size = 12.
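The training settings and the first-phase loss can be sketched as follows. This is an illustrative sketch only: `TrainConfig` and `euclidean_loss` are hypothetical names, and both applying the loss to the (a, b) channel pairs and averaging over pixels are assumptions rather than details taken from our training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the text (class name is hypothetical)."""
    lr: float = 0.0002          # Adam learning rate
    beta1: float = 0.5          # Adam beta1
    batch_size: int = 12
    dataset_size: int = 20540   # images after pre-processing

    def steps_per_epoch(self) -> int:
        # The last batch may be smaller, so round up.
        return math.ceil(self.dataset_size / self.batch_size)


def euclidean_loss(pred_ab, target_ab):
    """Mean squared Euclidean distance between (a, b) pairs.

    Assumption: the loss is averaged over pixels; the exact
    normalization used in training may differ.
    """
    if len(pred_ab) != len(target_ab) or not pred_ab:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        (pa - ta) ** 2 + (pb - tb) ** 2
        for (pa, pb), (ta, tb) in zip(pred_ab, target_ab)
    )
    return total / len(pred_ab)
```

With these values, `TrainConfig().steps_per_epoch()` gives 1712 optimizer steps per epoch, and `euclidean_loss([(0, 0), (3, 4)], [(0, 0), (0, 0)])` evaluates to 12.5.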
We visually compare the generated and expected 'a' and 'b' components.
- We were not provided with any sort of dataset.
- The paper did not describe the data preparation pipeline, only the algorithm.
- The training process was quite daunting because of the two separate training loops and the visualization tools we had to build.