Added Performance and Hardware Tips
sguada committed Apr 2, 2014
1 parent c24a033 commit f3bf7cf
Showing 2 changed files with 44 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/index.md
@@ -16,10 +16,10 @@ Caffe aims to provide computer vision scientists and practitioners with a **clea
For example, network structure is easily specified in separate config files, with no mess of hard-coded parameters in the code.

At the same time, Caffe fits industry needs, with blazing fast C++/CUDA code for GPU computation.
-Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **20 million images per day** on a single Tesla K20 machine \*.
+Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **40 million images per day** with a single NVIDIA K40 or Titan card (20 million images per day on a single NVIDIA Tesla K20 card)\*. Currently, Caffe can process 192 images per second during training and 500 images per second during testing (using a K40 or Titan)\*.

Caffe also provides **seamless switching between CPU and GPU**, which allows one to train models with fast GPUs and then deploy them on non-GPU clusters with one line of code: `Caffe::set_mode(Caffe::CPU)`.
-Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode.
+Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode. In GPU mode, computing predictions on an image takes only 2 ms when images are processed in batch mode.

## Documentation

42 changes: 42 additions & 0 deletions docs/performance_hardware.md
@@ -0,0 +1,42 @@
---
layout: default
title: Caffe
---

# Performance and Hardware Tips

To measure the performance of different NVIDIA cards, we use the reference ImageNet model provided in Caffe.

## NVIDIA K40 \*

### With ECC on

- K40, ECC on, maximum clock speed: 26.7 s per 20 training iterations (256 × 20 images); 101 s per validation test (50,000 images)
- K40, ECC on, default clock speed: 31 s per 20 training iterations (256 × 20 images); 117 s per validation test (50,000 images)

### With ECC off

- K40, ECC off, maximum clock speed: 26.5 s per 20 training iterations (256 × 20 images); 100 s per validation test (50,000 images)
- K40, ECC off, default clock speed: 31 s per 20 training iterations (256 × 20 images); 118 s per validation test (50,000 images)

### K40 Performance tip

To get maximum performance from the NVIDIA K40, one can raise the clock speeds and disable ECC (at your own risk).

Turn off ECC, then reboot for the change to take effect:

    sudo nvidia-smi -e 0

Enable persistence mode:

    sudo nvidia-smi -pm 1

Then set the application clock speeds:

    sudo nvidia-smi -i 0 -ac 3004,875


## NVIDIA Titan \*

- Titan: 26.26 s per 20 training iterations (256 × 20 images); 100 s per validation test (50,000 images)

## NVIDIA K20 \*

- K20: 36.0 s per 20 training iterations (256 × 20 images); 133 s per validation test (50,000 images)
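
The timings in the sections above can be turned into throughput figures with a little arithmetic. The following Python snippet is a rough sketch and is not part of Caffe; the names `TIMINGS`, `throughput`, and `summarize` are illustrative only. It assumes the batch size of 256, the 20 training iterations, and the 50,000-image validation set stated above, and it treats the entry listed under the K20 heading as the K20 measurement.

```python
# Timings copied from the benchmark sections above, as
# (seconds per 20 training iterations, seconds per validation test).
TIMINGS = {
    "K40, ECC on, max clocks": (26.7, 101.0),
    "K40, ECC on, default clocks": (31.0, 117.0),
    "K40, ECC off, max clocks": (26.5, 100.0),
    "K40, ECC off, default clocks": (31.0, 118.0),
    "Titan": (26.26, 100.0),
    "K20": (36.0, 133.0),  # entry listed under the K20 heading
}

BATCH_SIZE = 256            # images per training iteration
ITERATIONS = 20             # training iterations per timing
VALIDATION_IMAGES = 50000   # images per validation test
SECONDS_PER_DAY = 24 * 60 * 60


def throughput(train_secs, test_secs):
    """Return (training images/s, test images/s) for one pair of timings."""
    train_rate = BATCH_SIZE * ITERATIONS / train_secs
    test_rate = VALIDATION_IMAGES / test_secs
    return train_rate, test_rate


def summarize(timings=TIMINGS):
    """Derive per-second, per-image, and per-day figures for each card."""
    rows = {}
    for name, (train_secs, test_secs) in timings.items():
        train_rate, test_rate = throughput(train_secs, test_secs)
        rows[name] = {
            "train_images_per_sec": train_rate,
            "test_images_per_sec": test_rate,
            "test_ms_per_image": 1000.0 / test_rate,
            "test_images_per_day": test_rate * SECONDS_PER_DAY,
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(
            f"{name}: "
            f"{row['train_images_per_sec']:.1f} img/s train, "
            f"{row['test_images_per_sec']:.1f} img/s test, "
            f"{row['test_ms_per_image']:.2f} ms/img, "
            f"{row['test_images_per_day'] / 1e6:.1f} M img/day"
        )
```

For the K40 with ECC off and maximum clocks, this gives about 193 images per second in training and 500 images per second in testing, which is 2 ms per image and roughly 43 million images per day. These are close to the 192 images per second, 500 images per second, 2 ms, and "more than 40 million images per day" quoted in `docs/index.md`.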

\* BVLC members are very grateful to NVIDIA for providing several GPU cards for conducting this research.
