bring imagenet docs back to reality
- fix paths
- replace shell command blocks with scripts
- file ipython notebooks in examples
- proofread
shelhamer committed Feb 26, 2014
1 parent cd6df9e commit 9b2aca4
Showing 6 changed files with 43 additions and 1,037 deletions.
28 changes: 9 additions & 19 deletions docs/imagenet_pretrained.md
@@ -6,33 +6,23 @@ title: Caffe
Running Pretrained ImageNet
===========================

[View this page as an IPython Notebook](http://nbviewer.ipython.org/url/caffe.berkeleyvision.org/imagenet_pretrained_files/imagenet_pretrained.ipynb)

For easier use of pretrained models, we provide a wrapper specifically written
for the case of ImageNet, so one can take an image and directly compute features
or predictions from them. Both Python and Matlab wrappers are provided. We will
describe the use of the Python wrapper here, and the Matlab wrapper usage is
very similar.

We assume that you have successfully compiled Caffe and set the correct
`PYTHONPATH`. If not, please refer to the [installation
instructions](installation.html). You will use our pre-trained imagenet model,
which you can
[download here](https://www.dropbox.com/s/n3jups0gr7uj0dv/caffe_reference_imagenet_model)
(232.57MB). Note that this pre-trained model is licensed for academic research /
non-commercial use only.
[View this page as an IPython Notebook](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/imagenet_pretrained.ipynb)

For easier use of pretrained models, we provide a wrapper specifically written for the case of ImageNet, so one can take an image and directly compute features or predictions from them. Both Python and Matlab wrappers are provided. We will describe the use of the Python wrapper here, and the Matlab wrapper usage is very similar.

We assume that you have successfully compiled Caffe and set the correct `PYTHONPATH`. If not, please refer to the [installation instructions](installation.html). You will use our pre-trained ImageNet model, which you can download (232.57MB) by running `models/get_caffe_reference_imagenet_model.sh`. Note that this pre-trained model is licensed for academic research / non-commercial use only.

Ready? Let's start.


    from caffe import imagenet
    from matplotlib import pyplot

    # Set the right path to your model file, pretrained model,
    # and the image you would like to classify.
    MODEL_FILE = 'examples/imagenet_deploy.prototxt'
    PRETRAINED = '/home/jiayq/Downloads/caffe_reference_imagenet_model'
    IMAGE_FILE = '/home/jiayq/lena.png'
    MODEL_FILE = 'models/imagenet.prototxt'
    PRETRAINED = 'models/caffe_reference_imagenet_model'
    IMAGE_FILE = '/path/to/lena.png'

Loading a network is easy. `imagenet.ImagenetClassifier` wraps everything. By default, the classifier will crop the center and corners of an image, as well as their mirrored versions, to create a batch of ten inputs for the network.
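To make the prediction step concrete, here is a minimal sketch in the spirit of the notebook linked above. The class and method names (`ImagenetClassifier`, `caffenet`, `set_phase_test`, `set_mode_cpu`, `predict`) follow the wrapper as described here; treat them as assumptions and check them against your checkout if your version differs.

    from caffe import imagenet

    MODEL_FILE = 'models/imagenet.prototxt'
    PRETRAINED = 'models/caffe_reference_imagenet_model'
    IMAGE_FILE = '/path/to/lena.png'

    # Load the wrapper; by default it classifies a ten-image batch
    # (center and corner crops plus their mirrored versions).
    net = imagenet.ImagenetClassifier(MODEL_FILE, PRETRAINED)
    net.caffenet.set_phase_test()  # deterministic test-phase behavior
    net.caffenet.set_mode_cpu()    # or set_mode_gpu() if you built with CUDA

    prediction = net.predict(IMAGE_FILE)  # averaged ILSVRC12 class scores
    print('predicted class index: %d' % prediction.argmax())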
270 changes: 0 additions & 270 deletions docs/imagenet_pretrained_files/imagenet_pretrained.ipynb

This file was deleted.

57 changes: 29 additions & 28 deletions docs/imagenet.md → docs/imagenet_training.md
@@ -7,11 +7,11 @@ Yangqing's Recipe on Brewing ImageNet
=====================================

"All your braincells are belong to us."
- Starbucks
- Caffeine

We are going to describe a reference implementation for the approach first proposed by Krizhevsky, Sutskever, and Hinton in their [NIPS 2012 paper](http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf). Since training the whole model takes quite some time and energy, we also provide a model, trained in the same way as we describe here, to help fight global warming. If you would like to simply use the pretrained model, check out the [Pretrained ImageNet](imagenet_pretrained.html) page.
We are going to describe a reference implementation for the approach first proposed by Krizhevsky, Sutskever, and Hinton in their [NIPS 2012 paper](http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf). Since training the whole model takes some time and energy, we provide a model, trained in the same way as we describe here, to help fight global warming. If you would like to simply use the pretrained model, check out the [Pretrained ImageNet](imagenet_pretrained.html) page. *Note that the pretrained model is for academic research / non-commercial use only*.

To clarify, by ImageNet we actually mean the ILSVRC challenge, but you can easily train on the whole imagenet as well, just more disk space, and a little longer training time.
To clarify, by ImageNet we actually mean the ILSVRC12 challenge, but you can easily train on the whole of ImageNet as well; it just needs more disk space and a little more training time.

(If you don't get the quote, visit [Yann LeCun's fun page](http://yann.lecun.com/ex/fun/).)

@@ -23,42 +23,43 @@ We assume that you already have downloaded the ImageNet training data and validation data

    /path/to/imagenet/train/n01440764/n01440764_10026.JPEG
    /path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG

You will first need to create a text file listing all the files as well as their labels. An example could be found in the caffe repo at `python/caffe/imagenet/ilsvrc_2012_train.txt` and `ilsvrc_2012_val.txt`. Note that in those two files we used a different indexing from the ILSVRC devkit: we sorted the synset names in their ASCII order, and then labeled them from 0 to 999.
You will first need to prepare some auxiliary data for training. This data can be downloaded by running:

    cd $CAFFE_ROOT/data/ilsvrc12/
    ./get_ilsvrc12_aux.sh

The training and validation inputs are described in `train.txt` and `val.txt`, text files listing all the images and their labels. Note that we use a different indexing for labels than the ILSVRC devkit: we sort the synset names in their ASCII order, and then label them from 0 to 999. See `synset_words.txt` for the synset/name mapping.
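As a quick illustration of this labeling convention, the sketch below rebuilds the synset-to-label mapping in Python. It assumes each line of `synset_words.txt` starts with a synset ID such as `n01440764` followed by the human-readable name; adjust the path to wherever you ran the download script.

    # Rebuild the label mapping: sort synset IDs in ASCII order, then
    # number them 0-999 (the convention used by train.txt/val.txt).
    with open('data/ilsvrc12/synset_words.txt') as f:
        synsets = [line.split(None, 1)[0] for line in f]
    labels = {s: i for i, s in enumerate(sorted(synsets))}
    print(labels['n01440764'])  # 0 -- the ASCII-lowest synset ID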

You will also need to resize the images to 256x256: we do not do this explicitly because in a cluster environment one may benefit from resizing images in parallel, using MapReduce. For example, Yangqing used his lightweight [mincepie](https://github.com/Yangqing/mincepie) package to do MapReduce on the Berkeley cluster. If you prefer things to be simple and straightforward, you can also use shell commands, something like:

    for name in /path/to/imagenet/val/*.JPEG; do
        convert -resize '256x256!' "$name" "$name"
    done

Now, you can simply create a leveldb using commands as follows:
Go to `$CAFFE_ROOT/examples/imagenet/` for the rest of this guide.

    GLOG_logtostderr=1 examples/convert_imageset.bin \
        /path/to/imagenet/train/ \
        python/caffe/imagenet/ilsvrc_2012_train.txt \
        /path/to/imagenet-train-leveldb 1

Note that `/path/to/imagenet-train-leveldb` should not exist before this execution. It will be created by the script. `GLOG_logtostderr=1` simply dumps more information for you to inspect, and you can safely ignore it.
Take a look at `create_imagenet.sh`. Set the paths to the train and val dirs as needed. Now simply create the leveldbs with `./create_imagenet.sh`. Note that `imagenet_train_leveldb` and `imagenet_val_leveldb` should not exist before this execution; they will be created by the script. `GLOG_logtostderr=1` simply dumps more information for you to inspect, and you can safely ignore it.

Compute Image Mean
------------------

The Model requires us to subtract the image mean from each image, so we have to compute the mean. `examples/demo_compute_image_mean.cpp` implements that - it is also a good example to familiarize yourself on how to manipulate the multiple components, such as protocol buffers, leveldbs, and logging, if you are not familiar with it. Anyway, the mean computation can be carried out as:
The model requires us to subtract the image mean from each image, so we have to compute the mean. `tools/compute_image_mean.cpp` implements that - it is also a good example to familiarize yourself with how to manipulate the multiple components, such as protocol buffers, leveldbs, and logging, if you are not familiar with them. Anyway, the mean computation can be carried out as:

    examples/demo_compute_image_mean.bin /path/to/imagenet-train-leveldb /path/to/mean.binaryproto
    ./make_imagenet_mean.sh

where `/path/to/mean.binaryproto` will be created by the program.
which will make `data/ilsvrc12/imagenet_mean.binaryproto`.
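If you want to sanity-check the result, the binaryproto can be parsed with Caffe's generated protobuf bindings. This is a sketch under the assumption that the Python bindings are built and on your `PYTHONPATH`, and that the generated module is importable as `caffe.proto.caffe_pb2`:

    from caffe.proto import caffe_pb2

    # The mean is stored as a single BlobProto: 1 x channels x height x width.
    blob = caffe_pb2.BlobProto()
    with open('data/ilsvrc12/imagenet_mean.binaryproto', 'rb') as f:
        blob.ParseFromString(f.read())
    print('%d x %d x %d x %d' % (blob.num, blob.channels, blob.height, blob.width))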

Network Definition
------------------
The network definition follows strictly the one in Krizhevsky et al. You can find the detailed definition at `examples/imagenet.prototxt`. Note that to run it, you will most likely need to change the paths in the data layer - change the following lines

source: "/home/jiayq/Data/ILSVRC12/train-leveldb"
meanfile: "/home/jiayq/Data/ILSVRC12/image_mean.binaryproto"
The network definition follows strictly the one in Krizhevsky et al. You can find the detailed definition at `examples/imagenet/imagenet_train.prototxt`. Note the paths in the data layer: if you have not followed the exact paths in this guide, you will need to change the following lines:

source: "ilvsrc12_train_leveldb"
meanfile: "../../data/ilsvrc12/imagenet_mean.binaryproto"

to point to your own leveldb and image mean. Likewise, do the same for `examples/imagenet_val.prototxt`.
to point to your own leveldb and image mean. Likewise, do the same for `examples/imagenet/imagenet_val.prototxt`.

If you look carefully at `imagenet.prototxt` and `imagenet_val.prototxt`, you will notice that they are largely the same, with the only difference being the data layer sources, and the last layer: in training, we will be using a `softmax_loss` layer to compute the loss function and to initialize the backpropagation, while in validation we will be using an `accuracy` layer to inspect how well we do in terms of accuracy.
If you look carefully at `imagenet_train.prototxt` and `imagenet_val.prototxt`, you will notice that they are largely the same, with the only difference being the data layer sources, and the last layer: in training, we will be using a `softmax_loss` layer to compute the loss function and to initialize the backpropagation, while in validation we will be using an `accuracy` layer to inspect how well we do in terms of accuracy.

We will also lay out a protocol buffer for running the solver. Let's make a few plans:
* We will run in batches of 256, and run a total of 450,000 iterations (about 90 epochs; see the quick check after this list).
@@ -68,19 +69,19 @@
* The network will be trained with momentum 0.9 and a weight decay of 0.0005.
* For every 10,000 iterations, we will take a snapshot of the current status.
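As a sanity check on those numbers, here is the arithmetic relating batches, iterations, and epochs (the training-set size of 1,281,167 images is the standard ILSVRC12 figure):

    # Roughly 1.28M images / 256 per batch ~= 5,005 iterations per epoch,
    # so 90 epochs ~= 450,000 iterations.
    images = 1281167
    batch_size = 256
    iters_per_epoch = images / float(batch_size)  # ~5004.6
    print(iters_per_epoch * 90)                   # ~450,000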

Sounds good? This is implemented in `examples/imagenet_solver.prototxt`. Again, you will need to change the first two lines:
Sound good? This is implemented in `examples/imagenet/imagenet_solver.prototxt`. Again, you will need to change the first two lines:

train_net: "examples/imagenet.prototxt"
test_net: "examples/imagenet_val.prototxt"
train_net: "imagenet_train.prototxt"
test_net: "imagenet_val.prototxt"

to point to the actual path.
to point to the actual path if you have changed them.

Training ImageNet
-----------------

Ready? Let's train.

    GLOG_logtostderr=1 examples/train_net.bin examples/imagenet_solver.prototxt
    ./train_imagenet.sh

Sit back and enjoy! On my K20 machine, every 20 iterations take about 36 seconds to run, so effectively about 7 ms per image for the full forward-backward pass. About 2.5 ms of this is on forward, and the rest is backward. If you are interested in dissecting the computation time, you can look at `examples/net_speed_benchmark.cpp`, but it was written purely for debugging purposes, so you may need to figure a few things out yourself.
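Those timing figures are mutually consistent; here is a quick check:

    # 36 s per 20 iterations = 1.8 s per iteration; at batch size 256
    # that is about 7 ms per image for forward + backward combined.
    secs_per_iter = 36.0 / 20
    ms_per_image = secs_per_iter / 256 * 1000
    print(ms_per_image)  # ~7.0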

@@ -89,13 +90,13 @@ Resume Training?

We all experience times when the power goes out, or we feel like rewarding ourselves a little by playing Battlefield (does anyone still remember Quake?). Since we are snapshotting intermediate results during training, we will be able to resume from snapshots. This can be done as easily as:

    GLOG_logtostderr=1 examples/train_net.bin examples/imagenet_solver.prototxt caffe_imagenet_train_10000.solverstate
    ./resume_training.sh

where `caffe_imagenet_train_1000.solverstate` is the solver state snapshot that stores all necessary information to recover the exact solver state (including the parameters, momentum history, etc).
where in the script `caffe_imagenet_train_10000.solverstate` is the solver state snapshot that stores all the necessary information to recover the exact solver state (including the parameters, momentum history, etc.).

Parting Words
-------------

Hope you liked this recipe. Many researchers have gone further since the ILSVRC 2012 challenge, changing the network architecture and/or finetuning the various parameters in the network. The recent ILSVRC 2013 challenge suggests that there are quite some room for improvement. **Caffe allows one to explore different network choices more easily, by simply writing different prototxt files** - isn't that exciting?
Hope you liked this recipe! Many researchers have gone further since the ILSVRC 2012 challenge, changing the network architecture and/or fine-tuning the various parameters in the network. The recent ILSVRC 2013 challenge suggests that there is quite some room for improvement. **Caffe allows one to explore different network choices more easily, by simply writing different prototxt files** - isn't that exciting?

And since now you have a trained network, check out how to use it: [Running Pretrained ImageNet](imagenet_pretrained.html). This time we will use Python, but if you have wrappers for other languages, please kindly send me a pull request!
And since now you have a trained network, check out how to use it: [Running Pretrained ImageNet](imagenet_pretrained.html). This time we will use Python, but if you have wrappers for other languages, please kindly send a pull request!
2 changes: 1 addition & 1 deletion docs/index.md
@@ -33,7 +33,7 @@ Quick Links
* [Presentation](https://docs.google.com/presentation/d/1lzyXMRQFlOYE2Jy0lCNaqltpcCIKuRzKJxQ7vCuPRc8/edit?usp=sharing): Presentation on Caffe at the UC Berkeley Vision Group meeting.
* [Installation](installation.html): Instructions on installing Caffe (tested on Ubuntu 12.04, but works on Red Hat, OS X, etc.).
* [MNIST Demo](mnist.html): example of end-to-end training and testing on the MNIST data.
* [Training ImageNet](imagenet.html): tutorial on end-to-end training of an ImageNet classifier.
* [Training ImageNet](imagenet_training.html): tutorial on end-to-end training of an ImageNet classifier.
* [Running Pretrained ImageNet](imagenet_pretrained.html): simply runs in Python!
* [Running Detection](imagenet_detection.html): run a pretrained model as a detector.
