Added Performance and Hardware Tips

sijinli · Apr 2, 2014 · f3bf7cf
1 parent c24a033

Showing 2 changed files with 44 additions and 2 deletions.
diff --git a/docs/index.md b/docs/index.md
@@ -16,10 +16,10 @@ Caffe aims to provide computer vision scientists and practitioners with a **clea
 For example, network structure is easily specified in separate config files, with no mess of hard-coded parameters in the code.
 
 At the same time, Caffe fits industry needs, with blazing fast C++/CUDA code for GPU computation.
-Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **20 million images per day** on a single Tesla K20 machine \*.
+Caffe is currently the fastest GPU CNN implementation publicly available, and is able to process more than **40 million images per day** with a single NVIDIA K40 or Titan card (20 million images per day with a single Tesla K20)\*. With a K40 or Titan, Caffe currently processes 192 images per second during training and 500 images per second during testing\*.
 
 Caffe also provides **seamless switching between CPU and GPU**, which allows one to train models with fast GPUs and then deploy them on non-GPU clusters with one line of code: `Caffe::set_mode(Caffe::CPU)`.
-Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode.
+Even in CPU mode, computing predictions on an image takes only 20 ms when images are processed in batch mode; in GPU mode, it takes only 2 ms per image.
 
 ## Documentation
 

diff --git a/docs/performance_hardware.md b/docs/performance_hardware.md
@@ -0,0 +1,42 @@
+---
+layout: default
+title: Caffe
+---
+
+# Performance and Hardware Tips
+
+To measure the performance of different NVIDIA cards, we use the reference ImageNet model provided with Caffe.
+
+## NVIDIA K40 \*
+
+### With ECC on
+
+- Maximum clock speed: 26.7 s per 20 training iterations (20 × 256 = 5,120 images), 101 s per validation pass (50,000 images)
+- Default clock speed: 31 s per 20 training iterations (20 × 256 = 5,120 images), 117 s per validation pass (50,000 images)
+
+### With ECC off
+
+- Maximum clock speed: 26.5 s per 20 training iterations (20 × 256 = 5,120 images), 100 s per validation pass (50,000 images)
+- Default clock speed: 31 s per 20 training iterations (20 × 256 = 5,120 images), 118 s per validation pass (50,000 images)
+
+### K40 Performance tip
+
+To get the maximum performance out of a K40, one can adjust its clock speed and disable ECC (at your own risk).
+
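+1. Turn off ECC, then reboot for the change to take effect:
+   `sudo nvidia-smi -e 0`
+2. Enable persistence mode:
+   `sudo nvidia-smi -pm 1`
+3. Set the application clocks for GPU 0 (memory clock 3004 MHz, graphics clock 875 MHz):
+   `sudo nvidia-smi -i 0 -ac 3004,875`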
+
+
+## NVIDIA Titan \*
+
+26.26 s per 20 training iterations (20 × 256 = 5,120 images), 100 s per validation pass (50,000 images)
+
+## NVIDIA K20 \*
+
+36.0 s per 20 training iterations (20 × 256 = 5,120 images), 133 s per validation pass (50,000 images)
+
+\* BVLC members are very grateful to NVIDIA for providing several GPU cards for conducting this research.