This repository contains the code for the paper Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup, tested and trained on custom datasets. It is based on PyTorch.
The paper presents a novel technique for simplifying sketch drawings based on learning a series of convolution operators. An image of any dimensions can be fed into the network, and it outputs an image of the same dimensions as the input.
The architecture consists of three parts: the first acts as an encoder and spatially compresses the image; the second processes and extracts the essential lines from it; the third acts as a decoder that converts the small, simpler representation into a grayscale image of the same resolution as the input. All of this is done with convolutions. The down- and up-convolution architecture may seem similar to simple filter banks. However, it is important to realize that the number of channels is much larger where the resolution is lower, e.g., 1024 channels where the size is 1/8. This ensures that the information needed for clean lines is carried through the low-resolution part; the encoder-decoder architecture trains the network to choose which information to carry. Padding compensates for the kernel size, so that layers with a stride of 1 produce output the same size as their input, and pooling layers are replaced by convolutional layers with increased strides to lower the resolution from the previous layer.
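For orientation, here is a minimal PyTorch sketch of this down/flat/up-convolution structure. The layer counts, kernel sizes, and channel progression are illustrative assumptions, not the exact configuration from the paper or from model.py; only the overall pattern (strided down-convolutions, wide flat convolutions at low resolution, transposed up-convolutions, grayscale output) follows the description above.

```python
import torch
import torch.nn as nn

class SketchSimplifier(nn.Module):
    """Illustrative encoder/processor/decoder built only from convolutions.

    Channel counts here are examples; the paper grows the channel count as
    the resolution drops, up to 1024 channels at 1/8 resolution.
    """
    def __init__(self):
        super().__init__()
        # Encoder: stride-2 convolutions replace pooling and halve the
        # resolution; padding keeps each layer's output aligned with its input.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 48, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(48, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Processor: flat (stride-1) convolutions at 1/8 resolution extract
        # the essential lines; this is where the channel count is largest.
        self.processor = nn.Sequential(
            nn.Conv2d(256, 1024, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(1024, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: stride-2 transposed convolutions restore the input
        # resolution and emit a single-channel grayscale image in [0, 1].
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 48, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(48, 1, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.processor(self.encoder(x)))

# Because every layer is convolutional, the same weights apply to any input
# whose sides are multiples of 8, and the output matches the input size:
x = torch.randn(1, 1, 424, 424)
print(SketchSimplifier()(x).shape)  # torch.Size([1, 1, 424, 424])
```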
Clone the repository to your local machine.
git clone https://github.com/ishanrai05/rough-sketch-simplification-using-FCNN
Create and activate a virtual environment using Python 3
virtualenv -p python3 env
source env/bin/activate
Install the dependencies
pip install -r requirements.txt
You can also use a Google Colab notebook. In that case, just upload the notebook provided in the repository and you are good to go.
The authors have not provided a dataset for the paper, so I created my own. I have uploaded the dataset to Drive; the link can be found here. Feel free to use it.
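If you build your own data instead, the model expects paired rough (input) and clean (target) images. A minimal loader might look like the sketch below; the PairedSketchDataset name, the Input/Target folder names, and the 424x424 size are assumptions for illustration, not necessarily what CustomDataset.py and constants.py actually use.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedSketchDataset(Dataset):
    """Loads (rough, clean) image pairs from two parallel folders.

    The folder names 'Input' and 'Target' are assumptions based on the
    --root argument described below; adjust them to match your layout.
    """
    def __init__(self, root, size=(424, 424)):
        self.input_dir = os.path.join(root, 'Input')
        self.target_dir = os.path.join(root, 'Target')
        self.names = sorted(os.listdir(self.input_dir))
        self.transform = transforms.Compose([
            transforms.Resize(size),        # fixed training size (assumed)
            transforms.Grayscale(),         # single-channel input and target
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        # Pairs are matched by file name across the two folders.
        rough = Image.open(os.path.join(self.input_dir, self.names[idx]))
        clean = Image.open(os.path.join(self.target_dir, self.names[idx]))
        return self.transform(rough), self.transform(clean)
```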
To train the model, run
python main.py --train=True
optional arguments:
argument | default | description |
---|---|---|
-h, --help | | show help message and exit |
--use_cuda | False | device to train on; default is CPU |
--samples | False | see sample images |
--num_epochs | 10 | number of epochs to train for |
--train | True | train the model |
--root | '.' | root directory for the input and target images |
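For example, a training run on GPU for 150 epochs, with data under ./data (a placeholder path), would combine the flags above like this:

python main.py --train=True --use_cuda=True --num_epochs=150 --root='./data'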
The model takes about 63 minutes to train for 150 epochs on Google Colab with an Nvidia Tesla K80 GPU.
Epoch | Prediction |
---|---|
2 | |
60 | |
100 | |
140 | |
This repository contains the following files and folders:

- notebook: contains the Jupyter notebook version of the code.
- images: contains images.
- pred: contains prediction images.
- constants.py: image width and size used while training.
- CustomDataset.py: code for dataset generation.
- model.py: code for the model as described in the paper.
- predict.py: function to simplify an image using the model (see the usage sketch after this list).
- read_data.py: code to read images.
- visualize.py: code for visualizations.
- utils.py: contains helper functions.
- train.py: function to train models from scratch.
- main.py: contains the main code to run the model.
- requirements.txt: lists the dependencies for easy setup in virtual environments.
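As a rough illustration of what the prediction path in predict.py does, here is a hedged sketch. The checkpoint name model.pth, the image file names, and the SketchSimplifier class (from the architecture sketch earlier in this README) are all assumptions, not the repository's actual names.

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical usage; the real entry point lives in predict.py.
model = SketchSimplifier()  # SketchSimplifier is the illustrative class sketched above
model.load_state_dict(torch.load('model.pth', map_location='cpu'))  # assumed checkpoint name
model.eval()

to_tensor = transforms.Compose([transforms.Grayscale(), transforms.ToTensor()])
rough = to_tensor(Image.open('rough_sketch.png')).unsqueeze(0)  # 1 x 1 x H x W
# Note: pad or resize so H and W are multiples of 8, since the network
# downsamples by a factor of 8 before upsampling back.

with torch.no_grad():
    clean = model(rough).squeeze(0)

transforms.ToPILImage()(clean).save('clean_sketch.png')
```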