TensorFlow implementation of Fully Convolutional Networks for Semantic Segmentation (FCN-8 in particular), based on the code written by shekkizh and modified so it can be used with ease for any given task.
The model can be applied to the MIT Scene Parsing Challenge dataset straight away after cloning this repo. For a pretrained model check out this link; it is not very accurate, but it is the only one I could obtain given my limited computing resources.
- numpy
- scipy
- opencv (both 2.4.x and 3.x should work)
- Tested only with `tensorflow 1.1.0` and `python 2.7.12` on Ubuntu 16.04. I tried to make this `python 3` compatible, but I haven't checked yet whether it works.
As frequently pointed out in the issue tracker of FCN.tensorflow, there are some discrepancies between the Caffe and the TensorFlow implementations. Here are the main ones and how I handled them:
- Conv6 padding: in the original implementation there was no padding, which shrinks the tensor down to `[batch_size, 1, 1, 4096]`. That works well when the input images are 224x224, but for any other resolution it breaks the deconvolution phase. Since the padding used here does no harm and the results are still acceptable, I decided to leave it as it is (see the sketch after this list).
- Average pooling or max pooling: I just stuck to the original implementation, using max pooling.
- Final layer of VGG: I just stuck to the original implementation, using the ReLU'd layer.
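To make the first point concrete, here is a hedged TensorFlow 1.x sketch (variable names are illustrative, not the repo's actual ones). In the FCN conversion of VGG, fc6 becomes a 7x7 convolution; on the 7x7 feature map produced by a 224x224 input, `VALID` padding collapses the output to 1x1, while `SAME` padding preserves the spatial size and lets the upsampling path handle other resolutions.

```python
import tensorflow as tf

# pool5: output of the VGG encoder, e.g. 7x7x512 for a 224x224 input.
pool5 = tf.placeholder(tf.float32, [None, None, None, 512])
W6 = tf.get_variable('W6', shape=[7, 7, 512, 4096])

# Caffe-style, no padding: collapses to [batch, 1, 1, 4096], but only for 224x224 inputs.
conv6_valid = tf.nn.conv2d(pool5, W6, strides=[1, 1, 1, 1], padding='VALID')

# Padded version kept in this repo: spatial dimensions are preserved,
# so the deconvolution phase works for any input resolution.
conv6_same = tf.nn.conv2d(pool5, W6, strides=[1, 1, 1, 1], padding='SAME')
```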
The original implementation used VGG16, while shekkizh used VGG19. I leave the choice to you.
`example.py` should be self-explanatory for basic usage. Note that a trained model can also be run on arbitrarily sized images, which are padded accordingly to avoid information loss during pooling.
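The VGG encoder downsamples by a factor of 32 (five 2x2 poolings), so one way to support arbitrary sizes is to pad each input up to the next multiple of 32. The helper below is a minimal numpy sketch of that idea, not necessarily how this repo performs the padding.

```python
import numpy as np

def pad_to_multiple(image, multiple=32):
    """Zero-pad an HxWxC image so that H and W become multiples of `multiple`."""
    h, w = image.shape[:2]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    return np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode='constant')

# Example: a 500x375 RGB image becomes 512x384.
padded = pad_to_multiple(np.zeros((500, 375, 3), dtype=np.uint8))
print(padded.shape)  # (512, 384, 3)
```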
Things you can set while setting up the network and training phase:
- Number of classes
- Validation Set (if you want to keep track of its loss)
- Learning Rate
- Keep Probability (1 - Dropout) for some layers (illustrated right after this list)
- Training loss summary frequency
- Validation loss summary frequency
- Model saving frequency
- Maximum number of steps
- Choice of VGG19 or VGG16 (and possibly others, if implemented) as the encoder.
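One option worth spelling out: the keep probability is exactly what TensorFlow's dropout op consumes, so a keep probability of 0.85 corresponds to roughly 15% dropout on the affected layers. A minimal TF 1.x illustration (not the repo's code):

```python
import tensorflow as tf

# keep_prob is fed at run time: e.g. 0.85 while training, 1.0 at inference.
keep_prob = tf.placeholder(tf.float32)
fc = tf.random_normal([1, 4096])
# Zeroes each unit with probability 1 - keep_prob and scales the rest by 1/keep_prob.
fc_dropped = tf.nn.dropout(fc, keep_prob)

with tf.Session() as sess:
    sess.run(fc_dropped, feed_dict={keep_prob: 0.85})  # training
    sess.run(fc_dropped, feed_dict={keep_prob: 1.0})   # inference: dropout disabled
```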
Things you can tweak in the code (see the sketch after the list):
- Optimizer - default is Adam.
- Loss function
- Number of models to keep saved during training.
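For those tweaks, here is a hedged TF 1.x sketch with standalone stand-ins (not the repo's actual graph) showing where the loss, the optimizer and the checkpoint count typically live:

```python
import tensorflow as tf

W = tf.Variable(tf.random_normal([512, 151]))   # stand-in trainable weights
features = tf.random_normal([4, 512])           # stand-in features
logits = tf.matmul(features, W)
labels = tf.zeros([4], dtype=tf.int32)          # stand-in annotations

# Loss function: per-pixel sparse softmax cross-entropy is the usual choice.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Optimizer: Adam by default; momentum SGD would be a drop-in replacement.
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)
# train_op = tf.train.MomentumOptimizer(1e-3, momentum=0.9).minimize(loss)

# Number of models to keep saved during training.
saver = tf.train.Saver(max_to_keep=10)
```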
`get_ade_dataset.sh` is a simple script to download, verify and extract the whole dataset. It also changes the file/directory structure to make it more coherent and compatible with the `ADE_Dataset` class.
In `dataset_reader` there are two classes: `BatchDataset`, which is meant to be an abstract class, and `ADE_Dataset`, which is an example of how to specialize `BatchDataset` and is ready to be used for training. Basic usage of a subclass:
```python
dt = MyDataset(*args, **kwargs)
images, annotations, weights, names = dt.next_batch()
```
where `images`, `annotations` and `weights` are numpy arrays of shape `[batch_size, height, width, channels]` (3 channels for images and 1 for both annotations and weights). In `ADE_Dataset` the `weights` are not used, but for other tasks with different datasets they might be useful.
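To make the expected shapes concrete, here is a standalone mock of what a single `next_batch()` call returns according to the description above (values and dtypes are illustrative guesses):

```python
import numpy as np

batch_size, height, width = 2, 256, 256

images      = np.zeros((batch_size, height, width, 3), dtype=np.float32)  # RGB images
annotations = np.zeros((batch_size, height, width, 1), dtype=np.int32)    # per-pixel class ids
weights     = np.ones((batch_size, height, width, 1), dtype=np.float32)   # per-pixel weights
names       = ['element_%d' % i for i in range(batch_size)]               # element identifiers
```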
When subclassing `BatchDataset` there are a few things to keep in mind (a small sketch follows the list):
- the argument `names` is required to identify the dataset's elements, but it does not necessarily have to be passed by the user to the subclass, unless you want to be able to specify a subset of the dataset (as I did with `ADE_Dataset` when creating the validation set)
- the argument `image_op` is often required to perform cropping and resizing when handling batch sizes greater than 1, unless you have a homogeneous dataset
- `image_size` has to be specified only if `batch_size` is greater than 1
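As an example of the kind of `image_op` meant above, assuming it is simply a callable applied to each image (the exact signature `BatchDataset` expects is defined in `dataset_reader`):

```python
import cv2

# Hypothetical image_op: assumes BatchDataset applies it to each image so
# that all elements of a batch end up with the same size.
def crop_and_resize(image, size=256):
    h, w = image.shape[:2]
    side = min(h, w)
    # Central square crop followed by a resize to a fixed size.
    top, left = (h - side) // 2, (w - side) // 2
    cropped = image[top:top + side, left:left + side]
    return cv2.resize(cropped, (size, size), interpolation=cv2.INTER_LINEAR)
```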