Dense Prediction API Design, Including Segmentation and Fully Convolutional Networks
This issue is to develop an API design for dense prediction tasks such as segmentation, including Fully Convolutional Networks (FCN), and is based on the discussion in #5228 (comment). The goal is to ensure that Keras incorporates best practices for this sort of problem by default. Community input, volunteers, and implementations are very welcome. Preprocessing layers can be discussed in #6655.
Motivating Tasks and Datasets
- Pascal VOC 2012 Single Label Segmentation
- MSCOCO Multi Label Segmentation
- Referring expressions with RefCOCO: unambiguously refer to a particular person or object in an image
- Reinforcement Learning with OpenAI Gym
- mscoco.org/external has additional examples
Reference Materials
- Fully Convolutional Networks for Semantic Segmentation
- U-Net
- Multi-Scale Context Aggregation by Dilated Convolutions
- ResNet and ResNet v2
- Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
- Fully Convolutional DenseNets
- Daniil's blog (highly detailed, but written for TensorFlow)
Feature Requests
These are ideas rather than a finalized proposal, so input is welcome!
- Input data: support one or more images as input, plus supplemental data (e.g., an image + a vector)
- Augmentation of input data and dense labels
  - Example: in Pascal VOC, the image and its label must be zoomed and translated identically
- Input image dimensions should be able to vary
  - Ideally in height, width, and number of channels
- "2D" loss function support, such as single-label and multi-label results for each pixel in an image
- class_weight support for dense labels
  - Example: a single weight value for each class in an image segmentation task such as Pascal VOC 2012
- Sparse-to-dense prediction weight transfer
  - Conversion of ImageNet weights from pre-trained models for segmentation tasks
    - Keras-FCN example
  - Locking of batch normalization layers, often used during the transfer process
- Automatic sparse-to-dense model conversion (advanced)
  - Configuration at each downsampling stage
  - Remove pooling layers and apply an equivalent atrous dilation in the next convolution layer
  - Add an upsampling layer for each downsampling stage
- SegmentationTop layer?
  - Sigmoid single-class predictions
  - Spatial softmax + argmax multi-class predictions
  - Multi-label predictions (sigmoid?)
- "Upsample" layer?
  - Like the "Activation" layer, where reasonable upsampling approaches can be selected with a simple string parameter
- Example implementation training and testing on MSCOCO and Pascal VOC 2012 + extended Berkeley labels
  - (advanced) pretrain on COCO, then fine-tune on Pascal VOC
- Support for the COCO (pycocotools) JSON dataset format, which is used by several datasets
  - The format supports multi-label segmentation, keypoint data, image descriptions, and more
- TFRecord dataset support (probably TensorFlow only, perhaps only in the TensorFlow implementation of Keras)
- flow_from_directory and a segmentation data generator
  - Keras-FCN
  - Single-class label support
  - Multi-class label support
- Mean Intersection over Union (mIoU) utility (see Keras-FCN)
- Image and label masks
- Proper palette handling for PNG-based labels
- Sparse label format for multi-label data?
- Debugging utilities
  - Save predictions to file
- Iterative training of partial networks at varying strides, as described in the FCN paper (advanced; may not be necessary, given Keras-FCN's performance)
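For the augmentation request above (image and label zoomed and translated equally), the key constraint is that both arrays share a single random draw. A minimal NumPy sketch, assuming an (H, W, C) image, an (H, W) integer label map, and 255 as the ignore value as in Pascal VOC; `random_zoom_translate` is a hypothetical helper, not an existing Keras API:

```python
import numpy as np

def random_zoom_translate(image, label, rng, max_shift=8,
                          zoom_range=(0.8, 1.25), ignore_index=255):
    """Apply ONE random zoom + translation to both an image and its dense label.

    image: (H, W, C) array; label: (H, W) integer class map.
    Both are resampled with the same nearest-neighbour index map, so each
    output image pixel and output label pixel come from the same source pixel.
    Pixels that map outside the source get 0 (image) and ignore_index (label).
    """
    h, w = label.shape
    zoom = rng.uniform(*zoom_range)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # Inverse mapping: for each output pixel, locate its source pixel.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    src_y = np.rint((ys - cy - dy) / zoom + cy).astype(int)
    src_x = np.rint((xs - cx - dx) / zoom + cx).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    sy, sx = np.clip(src_y, 0, h - 1), np.clip(src_x, 0, w - 1)
    out_image = np.where(valid[..., None], image[sy, sx], 0)
    out_label = np.where(valid, label[sy, sx], ignore_index)
    return out_image, out_label
```

Usage: `img2, lab2 = random_zoom_translate(img, lab, np.random.default_rng(0))`. Nearest-neighbour sampling matters for the label, since bilinear interpolation would blend class indices into meaningless values; the image itself could use smoother interpolation as long as the geometric transform is identical.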
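The "2D" loss and dense class_weight requests above could both reduce to a per-pixel cross-entropy in which each pixel is weighted by the weight of its true class. A NumPy sketch, assuming softmax probabilities of shape (N, H, W, K), integer labels of shape (N, H, W), a `{class_id: weight}` dict in the style of the `class_weight` argument Keras already uses for classification, and 255 as the ignore value; `weighted_pixel_crossentropy` is a hypothetical name:

```python
import numpy as np

def weighted_pixel_crossentropy(probs, labels, class_weight=None, ignore_index=255):
    """Weighted mean of per-pixel categorical cross-entropy for dense labels.

    probs: (N, H, W, K) softmax outputs; labels: (N, H, W) integer class ids.
    class_weight: optional {class_id: weight}; one weight per class, applied
    to every pixel of that class. Pixels equal to ignore_index contribute nothing.
    """
    k = probs.shape[-1]
    weights = np.ones(k)
    if class_weight:
        for c, w in class_weight.items():
            weights[c] = w
    valid = labels != ignore_index
    safe_labels = np.where(valid, labels, 0)  # placeholder index for ignored pixels
    # Probability assigned to the true class at every pixel.
    p_true = np.take_along_axis(probs, safe_labels[..., None], axis=-1)[..., 0]
    pixel_loss = -np.log(np.clip(p_true, 1e-7, 1.0))
    pixel_w = weights[safe_labels] * valid  # zero weight for ignored pixels
    return float((pixel_loss * pixel_w).sum() / max(pixel_w.sum(), 1e-7))
```

Normalizing by the total weight is one design choice; dividing by the number of valid pixels instead means the weights also rescale the overall loss magnitude.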
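The atrous conversion idea above relies on an identity: if a stride-2 pooling layer is given stride 1 and the following convolution uses dilation rate 2, every second output of the converted layer equals the original output, and the positions in between are extra dense predictions. A 1-D NumPy check of that identity, assuming a 2-wide max pool; the helper names are illustrative only. Removing the pool entirely, rather than setting its stride to 1, is no longer exactly equivalent for max pooling.

```python
import numpy as np

def conv1d_valid(x, kernel, dilation=1):
    """'Valid' 1-D cross-correlation (what a Conv layer computes) with a dilation rate."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # input extent covered by one output
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

def max_pool1d(x, size=2, stride=2):
    """1-D max pooling without padding."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, stride)])

x = np.random.default_rng(0).normal(size=16)
w = np.array([0.2, -1.0, 0.5])

# Original "sparse" network: stride-2 pool, then an ordinary convolution.
sparse = conv1d_valid(max_pool1d(x, size=2, stride=2), w)
# Converted "dense" network: pool stride set to 1, next convolution dilated by 2.
dense = conv1d_valid(max_pool1d(x, size=2, stride=1), w, dilation=2)

assert np.allclose(dense[::2], sparse)  # every second dense output matches
print(len(sparse), len(dense))  # 6 11
```

Each additional stride-2 stage that is converted doubles the dilation needed in all later layers (2, then 4, and so on).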
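The SegmentationTop options listed above differ mainly in the final activation and in how its output is decoded into labels. A NumPy sketch of that decoding, assuming channels-last per-pixel logits; both helpers are hypothetical, and in a Keras model the activation would normally be the last layer while the argmax or threshold is applied at prediction time:

```python
import numpy as np

def sigmoid_top(logits, threshold=0.5):
    """Single-class (H, W) or multi-label (H, W, K) head.

    Every value gets an independent sigmoid, so in the multi-label case a
    pixel may be assigned to several classes at once.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, probs >= threshold

def softmax_top(logits):
    """Single-label multi-class head: softmax over the class axis, then argmax.

    logits: (H, W, K). Returns probabilities and an (H, W) class-index map.
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs, probs.argmax(axis=-1)
```

The softmax version forces exactly one class per pixel (Pascal VOC style), while the sigmoid version suits overlapping labels such as MSCOCO's multi-label masks.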
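A string-configured "Upsample" layer could dispatch on its argument the way `Activation("relu")` does. A NumPy sketch of that dispatch with two fixed (non-learned) methods, assuming channels-last (H, W, C) input and an integer factor; `upsample` and its method names are hypothetical, and learned options such as transposed convolution are left out because they need layer weights:

```python
import numpy as np

def _nearest(x, factor):
    # Repeat each pixel factor times along height and width.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def _bilinear(x, factor):
    h, w = x.shape[:2]
    # Sample positions in input coordinates, using half-pixel centres.
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

_UPSAMPLERS = {"nearest": _nearest, "bilinear": _bilinear}

def upsample(x, factor=2, method="bilinear"):
    """Upsample an (H, W, C) array by an integer factor using a named method."""
    if method not in _UPSAMPLERS:
        raise ValueError(f"Unknown upsampling method {method!r}; "
                         f"expected one of {sorted(_UPSAMPLERS)}")
    return _UPSAMPLERS[method](x, factor)
```

For reference, recent tf.keras versions of `UpSampling2D` already accept `interpolation="nearest"` or `"bilinear"`, which is close to this string-parameter style.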
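For the COCO format request above: COCO stores many segmentation masks as run-length encodings (RLE). A NumPy sketch for the uncompressed form, `{"size": [h, w], "counts": [...]}`, where runs alternate between 0 and 1 starting with 0 and walk the pixels in column-major order, plus stacking per-class masks into a multi-label target. Compressed RLE strings are better decoded with pycocotools' `mask` module. Both helper names are hypothetical, and `class_idx` assumes COCO's non-contiguous category ids have already been mapped to 0..K-1:

```python
import numpy as np

def decode_uncompressed_rle(rle):
    """Decode a COCO uncompressed RLE dict {"size": [h, w], "counts": [...]}.

    Runs alternate between 0 and 1, starting with 0, and walk the pixels in
    column-major (Fortran) order. Returns an (h, w) uint8 mask.
    """
    h, w = rle["size"]
    flat = np.zeros(h * w, dtype=np.uint8)
    pos, value = 0, 0
    for run in rle["counts"]:
        flat[pos:pos + run] = value
        pos += run
        value = 1 - value
    return flat.reshape((h, w), order="F")

def to_multilabel(masks_by_class, num_classes, shape):
    """Stack (class_idx, mask) pairs into an (H, W, num_classes) multi-label target.

    Overlapping instances of the same class are merged with a logical OR.
    """
    target = np.zeros(tuple(shape) + (num_classes,), dtype=np.uint8)
    for class_idx, mask in masks_by_class:
        target[..., class_idx] |= mask.astype(np.uint8)
    return target
```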
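On palette handling for PNG labels: Pascal VOC labels are palette ("P" mode) PNGs whose stored pixel values are already class indices, with 255 marking void/boundary pixels, so reading them with PIL and `np.array(...)` without converting to RGB yields the index map directly. If a label has been saved as RGB, the colours must be mapped back. A NumPy sketch of the bit-interleaved VOC colour map and a vectorized reverse lookup; both function names are hypothetical:

```python
import numpy as np

def voc_colormap(n=256):
    """Pascal VOC label palette: class i -> RGB colour built from the bits of i."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # Spread bits 0, 1, 2 of c into the high bits of r, g, b.
            r |= (c & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

def rgb_to_index(rgb, cmap, unknown=-1):
    """Map an (H, W, 3) colour label back to class indices; unmatched colours -> unknown."""
    def pack(c):
        c = c.astype(np.int64)
        return (c[..., 0] << 16) | (c[..., 1] << 8) | c[..., 2]
    keys = pack(cmap)
    order = np.argsort(keys)
    sorted_keys = keys[order]
    packed = pack(rgb)
    pos = np.clip(np.searchsorted(sorted_keys, packed), 0, len(keys) - 1)
    found = sorted_keys[pos] == packed
    return np.where(found, order[pos], unknown)
```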
Existing Keras Utilities with Compatible Licenses
- keras-contrib has:
  - A DenseNetFCN implementation
  - MSCOCO
  - Pascal VOC 2012 + extended Berkeley labels
  - A couple of upsampling approaches
  - Additional datasets: COCO + VOC 2012 (keras-contrib#47)
- Keras-FCN
  - I've been working on this one; it is the current basis for these design suggestions
- segmentation_keras
  - Includes an example using Caffe weight conversion utilities
  - Fairly clean
- enet-keras
  - Includes work toward MSCOCO support
- https://github.com/azavea/raster-vision/
  - Is it Apache v2 compatible? I think so, if Keras is now part of TensorFlow
- https://github.com/JihongJu/keras-fcn
Questions
- Is something as clear as "30 seconds to Keras" possible for segmentation?
- Is anything above missing, redundant, or out of date compared to the state of the art?
- Should the current ImageDataGenerator be extended or is a separate class like Keras-FCN's SegDataGenerator clearer?
- Should there be a guide of some sort?
- What will make for useful training progress and debugging data? (sparse mIoU? something else?)
- What is needed to handle large datasets quickly and efficiently? (should this be out of scope?)
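On the question of extending ImageDataGenerator versus a separate class (and the flow_from_directory / segmentation data generator request), the main job of a separate generator is to keep each image and its label together through shuffling and augmentation. A minimal in-memory sketch, assuming lists of equally sized arrays and an optional `transform(image, label, rng)` callable such as a joint augmentation function; `segmentation_batches` is hypothetical and is not Keras-FCN's `SegDataGenerator`:

```python
import numpy as np

def segmentation_batches(images, labels, batch_size, rng, transform=None):
    """Yield (image_batch, label_batch) pairs forever, reshuffling each epoch.

    images: sequence of (H, W, C) arrays; labels: matching (H, W) class maps.
    transform(image, label, rng), if given, is applied to each pair jointly so
    random augmentation stays aligned between input and label.
    """
    n = len(images)
    while True:
        order = rng.permutation(n)
        # Drop the last partial batch so every batch has the same size.
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            pairs = [(images[i], labels[i]) for i in idx]
            if transform is not None:
                pairs = [transform(x, y, rng) for x, y in pairs]
            yield np.stack([x for x, _ in pairs]), np.stack([y for _, y in pairs])
```

Because `np.stack` needs equal shapes, variable-size inputs would require a batch size of 1, padding, or cropping to a common size.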
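For the mIoU utility and the training-progress question, one common approach accumulates a confusion matrix over the whole evaluation set and computes per-class IoU only at the end, rather than averaging per-image scores. A NumPy sketch, assuming integer label and prediction maps with 255 as the ignore value; the helper names are hypothetical (recent tf.keras also ships a `MeanIoU` metric):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes, ignore_index=255):
    """Counts[t, p] = number of pixels with true class t predicted as p."""
    valid = y_true != ignore_index
    t = y_true[valid].astype(np.int64)
    p = y_pred[valid].astype(np.int64)
    return np.bincount(t * num_classes + p,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); classes absent from both are NaN and skipped."""
    tp = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = np.full(len(tp), np.nan)
    present = union > 0
    iou[present] = tp[present] / union[present]
    return float(np.nanmean(iou)), iou

# Typical use: conf = sum(confusion_matrix(y, p, K) for y, p in batches); mean_iou(conf)
```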