Merge pull request #482 from shelhamer/rcnn-detector-example
Make R-CNN the Caffe detection example
shelhamer committed Jun 11, 2014
2 parents 63c7429 + 9882d47 commit a7e397a
Showing 7 changed files with 801 additions and 294 deletions.
2 changes: 2 additions & 0 deletions docs/getting_pretrained_models.md
@@ -24,4 +24,6 @@ This page will be updated as more models become available.
- The best validation performance during training was iteration 358,000 with
validation accuracy 57.258% and loss 1.83948.

**R-CNN (ILSVRC13)**: The pure Caffe instantiation of the [R-CNN](https://github.com/rbgirshick/rcnn) model for ILSVRC13 detection. Download the model (230.8MB) by running `examples/imagenet/get_caffe_rcnn_imagenet_model.sh` from the Caffe root directory. This model was made by transplanting the R-CNN SVM classifiers into a `fc-rcnn` classification layer, provided here as an off-the-shelf Caffe detector. Try the [detection example](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb) to see it in action. For the full details, refer to the R-CNN site. *N.B. For research purposes, make use of the official R-CNN package and not this example.*

Additionally, you will probably eventually need some auxiliary data (mean image, synset list, etc.): run `data/ilsvrc12/get_ilsvrc_aux.sh` from the root directory to obtain it.
750 changes: 467 additions & 283 deletions examples/detection.ipynb

Large diffs are not rendered by default.

27 changes: 27 additions & 0 deletions examples/imagenet/get_caffe_rcnn_imagenet_model.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env sh
# This script downloads the Caffe R-CNN ImageNet model
# for ILSVRC13 detection.

MODEL=caffe_rcnn_imagenet_model
CHECKSUM=42c1556d2d47a9128c4a90e0a9c5341c

if [ -f $MODEL ]; then
  echo "Model already exists. Checking md5..."
  os=`uname -s`
  if [ "$os" = "Linux" ]; then
    checksum=`md5sum $MODEL | awk '{ print $1 }'`
  elif [ "$os" = "Darwin" ]; then
    checksum=`cat $MODEL | md5`
  fi
  if [ "$checksum" = "$CHECKSUM" ]; then
    echo "Model checksum is correct. No need to download."
    exit 0
  else
    echo "Model checksum is incorrect. Need to download again."
  fi
fi

echo "Downloading..."

wget --no-check-certificate https://www.dropbox.com/s/0i3etlgmsmgf5ei/$MODEL
echo "Done. Please run this command again to verify that checksum = $CHECKSUM."
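The OS-specific md5 branch above (`md5sum` on Linux, `md5` on OS X) can be done portably with Python's `hashlib`; `md5_of_file` is an illustrative sketch, not part of this change, and the commented-out filename is the one the script downloads.

```python
import hashlib

def md5_of_file(path, chunk_size=8192):
    """Compute the md5 hex digest of a file, reading in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Compare against the expected checksum before trusting the download:
# ok = md5_of_file('caffe_rcnn_imagenet_model') == '42c1556d2d47a9128c4a90e0a9c5341c'
```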
207 changes: 207 additions & 0 deletions examples/imagenet/rcnn_imagenet_deploy.prototxt
@@ -0,0 +1,207 @@
name: "R-CNN-ilsvrc13"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "norm1"
  type: LRN
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "norm2"
  type: LRN
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "conv4"
  type: CONVOLUTION
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layers {
  name: "relu4"
  type: RELU
  bottom: "conv4"
  top: "conv4"
}
layers {
  name: "conv5"
  type: CONVOLUTION
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layers {
  name: "relu5"
  type: RELU
  bottom: "conv5"
  top: "conv5"
}
layers {
  name: "pool5"
  type: POOLING
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "fc6"
  type: INNER_PRODUCT
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layers {
  name: "relu6"
  type: RELU
  bottom: "fc6"
  top: "fc6"
}
layers {
  name: "drop6"
  type: DROPOUT
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layers {
  name: "relu7"
  type: RELU
  bottom: "fc7"
  top: "fc7"
}
layers {
  name: "drop7"
  type: DROPOUT
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
# R-CNN classification layer made from R-CNN ILSVRC13 SVMs.
layers {
  name: "fc-rcnn"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc-rcnn"
  inner_product_param {
    num_output: 200
  }
}
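As a sanity check on the layer parameters above, the spatial sizes can be traced with the standard conv/pool output formula; `out_dim` is an illustrative helper, not part of the diff (Caffe's pooling rounds up rather than down, but every division here is exact, so the floor convention agrees).

```python
def out_dim(in_dim, kernel, stride=1, pad=0):
    """Spatial output size of a conv/pool layer (floor convention)."""
    return (in_dim + 2 * pad - kernel) // stride + 1

d = 227                       # input_dim
d = out_dim(d, 11, stride=4)  # conv1 -> 55
d = out_dim(d, 3, stride=2)   # pool1 -> 27
d = out_dim(d, 5, pad=2)      # conv2 -> 27
d = out_dim(d, 3, stride=2)   # pool2 -> 13
d = out_dim(d, 3, pad=1)      # conv3 -> 13
d = out_dim(d, 3, pad=1)      # conv4 -> 13
d = out_dim(d, 3, pad=1)      # conv5 -> 13
d = out_dim(d, 3, stride=2)   # pool5 -> 6
print(d)  # 6: fc6 sees a 256 x 6 x 6 blob
```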
Binary file added examples/images/fish-bike.jpg
96 changes: 88 additions & 8 deletions python/caffe/detector.py
@@ -12,10 +12,6 @@
The selective_search_ijcv_with_python code required for the selective search
proposal mode is available at
https://github.com/sergeyk/selective_search_ijcv_with_python
TODO
- R-CNN crop mode / crop with context.
- Bundle with R-CNN model for example.
"""
import numpy as np
import os
@@ -29,11 +25,14 @@ class Detector(caffe.Net):
    selective search proposals.
    """
    def __init__(self, model_file, pretrained_file, gpu=False, mean_file=None,
                 input_scale=None, channel_swap=None):
                 input_scale=None, channel_swap=None, context_pad=None):
        """
        Take
        gpu, mean_file, input_scale, channel_swap: convenience params for
            setting mode, mean, input scale, and channel order.
        context_pad: amount of surrounding context to take s.t. a `context_pad`
            sized border of pixels in the network input image is context, as in
            R-CNN feature extraction.
        """
        caffe.Net.__init__(self, model_file, pretrained_file)
        self.set_phase_test()
@@ -50,6 +49,8 @@ def __init__(self, model_file, pretrained_file, gpu=False, mean_file=None,
        if channel_swap:
            self.set_channel_swap(self.inputs[0], channel_swap)

        self.configure_crop(context_pad)


    def detect_windows(self, images_windows):
        """
@@ -58,6 +59,7 @@ def detect_windows(self, images_windows):
        Take
        images_windows: (image filename, window list) iterable.
        context_crop: size of context border to crop in pixels.

        Give
        detections: list of {filename: image filename, window: crop coordinates,
@@ -68,8 +70,7 @@ def detect_windows(self, images_windows):
        for image_fname, windows in images_windows:
            image = caffe.io.load_image(image_fname).astype(np.float32)
            for window in windows:
                window_inputs.append(image[window[0]:window[2],
                                           window[1]:window[3]])
                window_inputs.append(self.crop(image, window))

        # Run through the net (warping windows to input dimensions).
        caffe_in = np.asarray([self.preprocess(self.inputs[0], window_in)
@@ -106,6 +107,85 @@ def detect_selective_search(self, image_fnames):
        import selective_search_ijcv_with_python as selective_search
        # Make absolute paths so MATLAB can find the files.
        image_fnames = [os.path.abspath(f) for f in image_fnames]
        windows_list = selective_search.get_windows(image_fnames)
        windows_list = selective_search.get_windows(
            image_fnames,
            cmd='selective_search_rcnn'
        )
        # Run windowed detection on the selective search list.
        return self.detect_windows(zip(image_fnames, windows_list))
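The window gathering that `detect_windows` performs (before context cropping) can be sketched standalone; `extract_windows` and the dummy image are illustrative, not part of the Caffe API.

```python
import numpy as np

def extract_windows(image, windows):
    """Gather (ymin, xmin, ymax, xmax) crops from one image, as
    detect_windows does before warping them to the network input."""
    return [image[y0:y1, x0:x1] for (y0, x0, y1, x1) in windows]

image = np.zeros((480, 640, 3), dtype=np.float32)
crops = extract_windows(image, [(0, 0, 100, 150), (10, 20, 50, 60)])
print([c.shape for c in crops])  # [(100, 150, 3), (40, 40, 3)]
```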


    def crop(self, im, window):
        """
        Crop a window from the image for detection. Include surrounding context
        according to the `context_pad` configuration.

        Take
        im: H x W x K image ndarray to crop.
        window: bounding box coordinates as ymin, xmin, ymax, xmax.

        Give
        crop: cropped window.
        """
        # Crop window from the image.
        crop = im[window[0]:window[2], window[1]:window[3]]

        if self.context_pad:
            box = window.copy()
            crop_size = self.blobs[self.inputs[0]].width  # assumes square
            scale = crop_size / (1. * crop_size - self.context_pad * 2)
            # Crop a box + surrounding context.
            half_h = (box[2] - box[0] + 1) / 2.
            half_w = (box[3] - box[1] + 1) / 2.
            center = (box[0] + half_h, box[1] + half_w)
            scaled_dims = scale * np.array((-half_h, -half_w, half_h, half_w))
            box = np.round(np.tile(center, 2) + scaled_dims)
            full_h = box[2] - box[0] + 1
            full_w = box[3] - box[1] + 1
            scale_h = crop_size / full_h
            scale_w = crop_size / full_w
            pad_y = round(max(0, -box[0]) * scale_h)  # amount out-of-bounds
            pad_x = round(max(0, -box[1]) * scale_w)

            # Clip box to image dimensions.
            im_h, im_w = im.shape[:2]
            box = np.clip(box, 0., [im_h, im_w, im_h, im_w])
            clip_h = box[2] - box[0] + 1
            clip_w = box[3] - box[1] + 1
            assert(clip_h > 0 and clip_w > 0)
            crop_h = round(clip_h * scale_h)
            crop_w = round(clip_w * scale_w)
            if pad_y + crop_h > crop_size:
                crop_h = crop_size - pad_y
            if pad_x + crop_w > crop_size:
                crop_w = crop_size - pad_x

            # Collect with context padding and place in input
            # with mean padding.
            context_crop = im[box[0]:box[2], box[1]:box[3]]
            context_crop = caffe.io.resize_image(context_crop, (crop_h, crop_w))
            crop = self.crop_mean.copy()
            crop[pad_y:(pad_y + crop_h), pad_x:(pad_x + crop_w)] = context_crop

        return crop
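The box-expansion arithmetic at the heart of `crop()` can be exercised in isolation: scale the window about its center so that, after warping to the network input size, a `context_pad`-wide border is context. `expand_box` is a hypothetical extraction of just that geometry (no clipping or mean padding), using the R-CNN defaults of a 227-pixel input and 16 pixels of context.

```python
import numpy as np

def expand_box(window, crop_size=227, context_pad=16):
    """Scale a (ymin, xmin, ymax, xmax) box about its center so that,
    once warped to crop_size, a context_pad border is context."""
    scale = crop_size / (1. * crop_size - context_pad * 2)
    half_h = (window[2] - window[0] + 1) / 2.
    half_w = (window[3] - window[1] + 1) / 2.
    center = (window[0] + half_h, window[1] + half_w)
    scaled_dims = scale * np.array((-half_h, -half_w, half_h, half_w))
    return np.round(np.tile(center, 2) + scaled_dims)

# A 100x100 box grows by ~16% in each dimension: values 92, 92, 208, 208.
print(expand_box(np.array([100, 100, 199, 199])))
```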


    def configure_crop(self, context_pad):
        """
        Configure amount of context for cropping.
        If context is included, make the special input mean for context padding.

        Take
        context_pad: amount of context for cropping.
        """
        self.context_pad = context_pad
        if self.context_pad:
            input_scale = self.input_scale.get(self.inputs[0])
            channel_order = self.channel_swap.get(self.inputs[0])
            # Padding context crops needs the mean in unprocessed input space.
            self.crop_mean = self.mean[self.inputs[0]].copy()
            self.crop_mean = self.crop_mean.transpose((1, 2, 0))
            channel_order_inverse = [channel_order.index(i)
                                     for i in range(self.crop_mean.shape[2])]
            self.crop_mean = self.crop_mean[:, :, channel_order_inverse]
            self.crop_mean /= input_scale
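The `channel_order_inverse` computation above undoes the channel swap so the stored mean returns to unprocessed input order; `invert_order` is an illustrative helper, not Caffe API. Note the common BGR swap (2, 1, 0) is its own inverse, while an asymmetric permutation is not.

```python
def invert_order(channel_order):
    """Invert a channel permutation, e.g. the BGR swap (2, 1, 0)."""
    return [channel_order.index(i) for i in range(len(channel_order))]

print(invert_order((2, 1, 0)))  # [2, 1, 0]
print(invert_order((2, 0, 1)))  # [1, 2, 0]
```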
