Make a matrix output and ground truth example (segmentation, sliding window detection, etc.) #1698

shelhamer · 2015-01-08T16:21:35Z

Caffe is perfectly happy with models that make matrix outputs and learn from matrix ground truths for problems where the output and truth have spatial dimensions, e.g. reconstruction / de-noising, pixelwise semantic segmentation, sliding window detection, and so forth. The forward and backward passes for these models follow directly from the definitions, and Caffe has always been capable of computing them.

However, there isn't yet a bundled example and exactly how to accomplish this is confusing to many new users.
#189 is already solved technically by on-the-fly reshaping #594, instance-wise losses like SOFTMAX_LOSS, EUCLIDEAN_LOSS, SIGMOID_CROSS_ENTROPY_LOSS and so on, and proper data preparation. At the same time, this isn't immediately obvious from the documentation and examples so a walkthrough would do a lot of good.
#308 is technically redundant and does not mesh with the Caffe code but it was put to use in a standalone way and it's good that the code was made available to accompany the tech report.
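
To make the shapes concrete, here is a minimal NumPy sketch of an instance-wise softmax loss applied at every pixel. It is an illustration under stated assumptions, not Caffe's implementation: the function name `pixelwise_softmax_loss` is hypothetical, scores are assumed to be N x C x H x W, and labels are assumed to be integer class indices of shape N x 1 x H x W.

```python
import numpy as np

def pixelwise_softmax_loss(scores, labels):
    """Mean cross-entropy over every pixel (illustrative, not Caffe code).

    scores: float array of shape (N, C, H, W), one score per class per pixel.
    labels: integer array of shape (N, 1, H, W), one class index per pixel.
    """
    labels = np.asarray(labels).astype(np.int64)
    # Subtract the per-pixel maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    # Log-softmax over the channel (class) axis.
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class at each pixel.
    picked = np.take_along_axis(log_probs, labels, axis=1)
    return float(-picked.mean())

# Toy example: two classes with equal scores everywhere.
scores = np.zeros((2, 2, 3, 3))
labels = np.zeros((2, 1, 3, 3), dtype=np.int64)
print(round(pixelwise_softmax_loss(scores, labels), 4))  # 0.6931, i.e. log(2)
```

The only difference from the usual classification loss is that the class axis is reduced independently at each spatial position, which is why the ground truth can be a matrix.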

ivendrov · 2015-01-15T17:19:54Z

I would find this very useful. Echoing #197, a reference OverFeat-like model in Caffe would be fantastic.

shelhamer · 2015-01-16T05:45:44Z

To help in the meantime, check out this code sample for generating an LMDB in Python with custom data:

import caffe
import lmdb
import numpy as np
from PIL import Image

# inputs: an iterable of image file paths (not defined in the original snippet)
in_db = lmdb.open('image-lmdb', map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs):
        # load image:
        # - as np.uint8 {0, ..., 255}
        # - in BGR (switch from RGB)
        # - in Channel x Height x Width order (switch from H x W x C)
        im = np.array(Image.open(in_)) # or load whatever ndarray you need
        im = im[:,:,::-1]
        im = im.transpose((2,0,1))
        im_dat = caffe.io.array_to_datum(im)
        # zero-padded key, encoded to bytes so it also works under Python 3
        in_txn.put('{:0>10d}'.format(in_idx).encode('ascii'), im_dat.SerializeToString())
in_db.close()

While this code makes an image DB, you can likewise make the ground truth DB from any scalar / vector / matrix data by calling caffe.io.array_to_datum. Each datum is a 3-D Channel x Height x Width array, so data with fewer dimensions needs to be padded with singleton dimensions. Note that the indices are zero padded to preserve their order: LMDB sorts the keys lexicographically, so bare integers as strings would be out of order.
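
The effect of zero padding on key order can be checked without LMDB at all. The sketch below assumes only that keys are compared byte by byte, which Python's sorting of `bytes` mirrors. The index list is made up for illustration.

```python
# Compare key orderings under byte-wise lexicographic sorting.
indices = [0, 1, 2, 9, 10, 11, 100]

# Bare integers as strings: '10' sorts before '2'.
bare_keys = sorted(str(i).encode('ascii') for i in indices)
# Zero-padded to a fixed width of 10 digits, as in the snippet above.
padded_keys = sorted('{:0>10d}'.format(i).encode('ascii') for i in indices)

bare_order = [int(k) for k in bare_keys]
padded_order = [int(k) for k in padded_keys]

print(bare_order)    # [0, 1, 10, 100, 11, 2, 9]
print(padded_order)  # [0, 1, 2, 9, 10, 11, 100]
```

Keeping the order matters most when the inputs and the ground truth live in separate DBs, since the two are then read in key order and must stay aligned.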

bhack · 2015-03-06T18:20:10Z

/cc @mtamburrano

demondan · 2015-03-16T07:58:59Z

@shelhamer Sorry, I'm new to Caffe and I can't fully understand. Could you tell me the steps for dealing with matrix ground truth?

ctrevino · 2015-06-05T10:00:26Z

@shelhamer you sent this link through the caffe-users group, but it is still a little confusing. As far as I understand, 'inputs' should contain the locations of all the images, right?

There is a missing bracket in your code; it should be:

im = im.transpose((2,0,1))

zizhaozhang · 2015-07-06T17:38:41Z

Hi,
How does the speed of this Python code compare with the convert_imageset tool provided by Caffe?

Linzert · 2015-08-01T05:14:08Z

@shelhamer Hi,
caffe.io.array_to_datum can only take three-dimensional data as input, but the ground truth is two-dimensional. What should I do to solve this problem?

shelhamer · 2015-08-03T00:14:55Z

@Linzert make a singleton dimension for the channels so the groundtruth is 1 x H x W. Please ask usage questions on the caffe-users group.
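
A minimal sketch of that reshaping step, assuming the ground truth is already a 2-D NumPy array. The helper name `to_chw` and the toy mask size are hypothetical.

```python
import numpy as np

def to_chw(label):
    """Return a 2-D (H, W) label map as a 3-D (1, H, W) array."""
    label = np.asarray(label)
    if label.ndim != 2:
        raise ValueError('expected a 2-D label map, got %d-D' % label.ndim)
    # Prepend a singleton channel axis; the pixel values are unchanged.
    return label[np.newaxis, :, :]

label = np.zeros((4, 6), dtype=np.uint8)  # toy mask with H=4, W=6
label_chw = to_chw(label)
print(label_chw.shape)  # (1, 4, 6)
```

The resulting array has the three dimensions that caffe.io.array_to_datum expects.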

Linzert · 2015-08-03T01:33:12Z

@shelhamer OK! I got it. Thanks very much.

inferrna · 2015-09-30T03:10:54Z

@ctrevino @demondan
Based on this example https://github.com/BVLC/caffe/blob/master/examples/01-learning-lenet.ipynb you just need to change the data loading step like this:

label = L.Data(batch_size=99, backend=P.Data.LMDB, source='train_label', transform_param=dict(scale=1./255), ntop=1)
data = L.Data(batch_size=99, backend=P.Data.LMDB, source='train_data', transform_param=dict(scale=1./255), ntop=1)

liyangliu · 2015-11-29T08:33:32Z

Hello, @shelhamer

Sorry to bother you. I am trying to fine-tune your fcn_alexnet on my own dataset, but when I use the GPU I encounter the following problem:

I1128 22:55:19.044678 358 parallel.cpp:395] GPUs pairs 0:1, 2:3, 0:2
I1128 22:55:19.319453 358 data_layer.cpp:44] output data size: 20,3,451,451
I1128 22:55:19.419453 358 data_layer.cpp:44] output data size: 20,1,451,451
I1128 22:55:22.330047 358 data_layer.cpp:44] output data size: 20,3,451,451
I1128 22:55:22.433482 358 data_layer.cpp:44] output data size: 20,1,451,451
I1128 22:55:25.011587 358 parallel.cpp:238] GPU 2 does not have p2p access to GPU 0
I1128 22:55:25.315258 358 data_layer.cpp:44] output data size: 20,3,451,451
I1128 22:55:25.440186 358 data_layer.cpp:44] output data size: 20,1,451,451
I1128 22:55:28.140528 358 parallel.cpp:423] Starting Optimization
I1128 22:55:28.141278 358 solver.cpp:293] Solving FCN-AlexNet-FD
I1128 22:55:28.141311 358 solver.cpp:294] Learning Rate Policy: fixed
I1128 22:55:28.141938 358 solver.cpp:346] Iteration 0, Testing net (#0)
F1128 22:55:28.424314 358 math_functions.cu:123] Check failed: status == CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR
*** Check failure stack trace: ***
@ 0x7fa222b9212d google::LogMessage::Fail()
@ 0x7fa222b93fcd google::LogMessage::SendToLog()
@ 0x7fa222b91d48 google::LogMessage::Flush()
@ 0x7fa222b9482e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fa2232c211a caffe::caffe_gpu_asum<>()
@ 0x7fa2232c4172 caffe::SoftmaxWithLossLayer<>::Forward_gpu()
@ 0x7fa2231d6db2 caffe::Net<>::ForwardFromTo()
@ 0x7fa2231d6ed7 caffe::Net<>::ForwardPrefilled()
@ 0x7fa2231c1ba6 caffe::Solver<>::Test()
@ 0x7fa2231c23be caffe::Solver<>::TestAll()
@ 0x7fa2231c250d caffe::Solver<>::Step()
@ 0x7fa2231c2f15 caffe::Solver<>::Solve()
@ 0x7fa2231ce35e caffe::P2PSync<>::run()
@ 0x40906e train()
@ 0x40671b main
@ 0x7fa222096a40 (unknown)
@ 0x406eb9 _start

I transformed my data to LMDB with your Python code from GitHub and use it as the source.

print("Creating Training Data LMDB File ..... ")
in_db = lmdb.open('../data/FDDB/8_2/TrainFDDB_Data_lmdb',map_size=map_size)
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs_data_train):
        print in_idx
        im = np.array(Image.open(in_)) # or load whatever ndarray you need
        Dtype = im.dtype
        if len(im.shape) == 2:
            (row, col) = im.shape
            im3 = np.zeros([row, col, 3], Dtype)
            for i in range(3):
                im3[:, :, i] = im
            im = im3
        im = im[:,:,::-1]
        im = Image.fromarray(im)
        im = im.resize([Rheight, Rwidth], Image.ANTIALIAS)
        im = np.array(im,Dtype)
        im = im.transpose((2,0,1))
        im_dat = caffe.io.array_to_datum(im)
        in_txn.put('{:0>10d}'.format(in_idx),im_dat.SerializeToString())
in_db.close()

print("Creating Training Label LMDB File ..... ")
in_db = lmdb.open('../data/FDDB/8_2/TrainFDDB_Label_lmdb',map_size=map_size)
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs_label_train):
        print in_idx
        # in_label = in_[:-40]+'SegmentationClass/'+in_[-15:-3]+'png'
        Dtype = 'uint8'
        L = np.array(Image.open(in_), Dtype) # or load whatever ndarray you need
        Limg = Image.fromarray(L)
        Limg = Limg.resize([LabelHeight, LabelWidth],Image.NEAREST) # To resize the Label file to the required size
        L = np.array(Limg,Dtype)
        # L2 = np.zeros([LabelHeight, LabelWidth, 2], Dtype)
        # L2[:, :, 0] = L
        # L2[:, :, 1] = 1 - L
        L = L.reshape(L.shape[0],L.shape[1],1)
        L = L.transpose((2,0,1))
        L_dat = caffe.io.array_to_datum(L)
        in_txn.put('{:0>10d}'.format(in_idx),L_dat.SerializeToString())
in_db.close()

layer {
  name: "data"
  type: "Data"
  top: "data"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_value: 104.00699
    mean_value: 116.66877
    mean_value: 122.67892
  }
  data_param {
    source: "data/Train_Data_lmdb"
    batch_size: 20
    backend: LMDB
  }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "data/Train_Label_lmdb"
    batch_size: 20
    backend: LMDB
  }
}

The same goes for the test phase.

I created four LMDBs: train data, test data, train labels, and test labels.
My label is a matrix (each element is 0 or 1 (uint8), indicating yes or no), saved as PNG images,
and the last convolution layer and deconvolution layer of my net have num_output = 2.

When I use CPU mode it works, but it is too slow.
Could you please be so kind as to help me a little?
Thanks very much.

I have been looking for the answer for a long time and can't figure it out.
I really don't want to bother you, but I have no idea how to leave a message for you on GitHub, sorry.

Thank you.

shelhamer · 2015-12-02T01:22:27Z

@liyangliu

CUBLAS_STATUS_SUCCESS (11 vs. 0) CUBLAS_STATUS_MAPPING_ERROR is usually an out-of-memory error in disguise. Please follow up on caffe-users to learn more.

https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

Please do not post usage, installation, or modeling questions, or other requests for help to Issues.
Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.
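
As a rough sanity check on the out-of-memory explanation above, the size of a blob can be estimated from its shape. The sketch below assumes 4-byte single-precision values and counts only the input blobs reported in the log (20,3,451,451 and 20,1,451,451). Every intermediate layer output, gradient, and parameter adds to this, so the real footprint is much larger. The helper name `blob_bytes` is hypothetical.

```python
def blob_bytes(shape, bytes_per_value=4):
    """Bytes needed to store one array of the given shape (float32 assumed)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value

data_bytes = blob_bytes((20, 3, 451, 451))
label_bytes = blob_bytes((20, 1, 451, 451))

print(data_bytes)   # 48816240 bytes, about 46.6 MiB
print(label_bytes)  # 16272080 bytes, about 15.5 MiB
```

Because the estimate scales linearly with the batch size, lowering batch_size is a simple first experiment when memory is the suspected cause.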

varunagrawal · 2016-03-23T02:02:45Z

To help anyone in the future with this, here is an example notebook:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/pascal-multilabel-with-datalayer.ipynb

jpsquare · 2016-04-14T16:24:01Z

@varunagrawal the notebook example code gives me an error at these lines (In [2]):

sys.path.append("pycaffe/layers") # the datalayers we will use are in this directory.
sys.path.append("pycaffe") # the tools file is in this folder
import tools #this contains some tools that we need

I cannot find the mentioned 'tools' file in the master branch.

shelhamer · 2016-04-14T19:02:56Z

Please see the fcn.berkeleyvision.org repo for master compatible FCNs, solver configurations, and scripts for learning, inference, and scoring. This includes how to define the net and python layers for loading inputs and labels which are both 3-D.

Closing, as I think the multilabel example #3471 and an FCN example as requested in #3890 will satisfy this issue. Please post on the caffe-users mailing list with further questions about modeling and usage for labels >1-D.

ChiZhangRIT · 2016-09-12T16:45:28Z

@ctrevino What does 'inputs' look like? How do I put all the images into 'inputs'?
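
One plausible reading, given that the snippet passes each element to Image.open, is that 'inputs' is simply a list of image file paths. A minimal sketch under that assumption follows. The helper `list_images`, the directory name 'images', and the '*.jpg' pattern are all hypothetical.

```python
import glob
import os

def list_images(image_dir, pattern='*.jpg'):
    """Return the sorted paths in image_dir whose names match pattern."""
    return sorted(glob.glob(os.path.join(image_dir, pattern)))

# Hypothetical directory; the result is an empty list if it does not exist.
inputs = list_images('images')
```

Sorting gives a deterministic order, which helps when a second list of label files has to line up index for index with the image list.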

AlexTS1980 · 2017-08-17T10:20:34Z

For the label LMDB creation, just load each label with caffe.io.load_image("img.png", False), which will return an image (mask) of size H x W x 1. You'll just have to transpose it to 1 x H x W after that. Also, this function loads images as RGB, not BGR.

shelhamer added the documentation label Jan 8, 2015

This was referenced Jan 16, 2015

Is there any tutorial about pycaffe? #1708

Closed

Explain off-the-shelf SGD example and add regression #1777

Open

erictzeng mentioned this issue Feb 14, 2015

how to create lmdb with jpg image files #1846

Closed

shelhamer mentioned this issue Mar 6, 2015

Vectors as labels #2047

Closed

This was referenced Mar 12, 2015

DenseNet feature pyramid computation #308

Closed

Using Caffe for RGB-D dataset with depth map output #955

Closed

bhack mentioned this issue Mar 29, 2015

Multi label support #1380

Closed

tmbo mentioned this issue Jun 1, 2015

HDF5 output normanrz/face-vid#8

Closed

bhack mentioned this issue Jun 23, 2015

Multi label Data and MultiLabel Accuracy #523

Closed

lukeyeager mentioned this issue Jul 14, 2015

Add support for multiple and/or floating point labels NVIDIA/DIGITS#97

Closed

This was referenced Jul 23, 2015

failure when running make_imagenet_mean.sh #2062

Closed

Check failed: mdb_status == 0 (2 vs. 0) No such file or directory #2780

Closed

raingo mentioned this issue Aug 26, 2015

A MultiGPU bug with multiple input layers #2977

Closed

terrychenism mentioned this issue Nov 8, 2015

train for fcn by caffe-windows-cudnn terrychenism/caffe-windows-cudnn#15

Closed

futurely mentioned this issue Nov 19, 2015

How to use mxnet for image segmentation training? apache/mxnet#337

Closed

chriss2401 mentioned this issue Mar 10, 2016

How to do regression? #512

Closed

shelhamer closed this as completed Apr 14, 2016
