Merge branch 'master' of github.com:albanie/mcnSENets

albanie · Sep 8, 2017 · ce7dcd2
2 parents 1bef5fc + fd9467b
commit ce7dcd2
Showing 15 changed files with 184 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -18,9 +18,35 @@ Each of the Squeeze-and-Excitation networks released by the authors has been imp
 
 [SE Networks](http://www.robots.ox.ac.uk/~albanie/models.html#se-models)
 
-The `run_se_benchmarks.m` script will evaluate each of these models on the ImageNet validation set. It will download the models automatically if you have not already done so (note that these evaluations require a copy of the imagenet data).  
+The `run_se_benchmarks.m` script will evaluate each of these models on the ImageNet validation set. It will download the models automatically if you have not already done so (note that these evaluations require a copy of the ImageNet data). The results of the evaluations are given below. Note that there are minor differences from the original scores (listed under `official`) due to variations in preprocessing (full details of the evaluation can be found [here](http://www.robots.ox.ac.uk/~albanie/models.html#se-models)):
 
-The result of this evaluation for each pretrained model can also be found [here](http://www.robots.ox.ac.uk/~albanie/models.html#se-models). 
+
+| model                     | top-1 error (official) | top-5 error (official) |
+|---------------------------|------------------------|------------------------|
+| SE-ResNet-50-mcn          | 22.30 (22.37)          | 6.30 (6.36)            |
+| SE-ResNet-101-mcn         | 21.59 (21.75)          | 5.81 (5.72)            |
+| SE-ResNet-152-mcn         | 21.38 (21.34)          | 5.60 (5.54)            |
+| SE-BN-Inception-mcn       | 24.16 (23.62)          | 7.35 (7.04)            |
+| SE-ResNeXt-50-32x4d-mcn   | 21.01 (20.97)          | 5.58 (5.54)            |
+| SE-ResNeXt-101-32x4d-mcn  | 19.73 (19.81)          | 4.98 (4.96)            |
+| SENet-mcn                 | 18.67 (18.68)          | 4.50 (4.47)            |
+
+There may be some difference in how the Inception network should be preprocessed relative to the others (this model exhibits a noticeable degradation). To give some idea of the relative computational burden of each model, estimates are provided below:
+
+
+| model | input size | param memory | feature memory | flops |
+|-------|------------|--------------|----------------|-------|
+| [SE-ResNet-50](reports/SE-ResNet-50.md) | 224 x 224 | 107 MB | 103 MB | 4 GFLOPs                |
+| [SE-ResNet-101](reports/SE-ResNet-101.md) | 224 x 224 | 189 MB | 155 MB | 8 GFLOPs              |
+| [SE-ResNet-152](reports/SE-ResNet-152.md) | 224 x 224 | 255 MB | 220 MB | 11 GFLOPs             |
+| [SE-BN-Inception](reports/SE-BN-Inception.md) | 224 x 224 | 46 MB | 43 MB | 2 GFLOPs            |
+| [SE-ResNeXt-50-32x4d](reports/SE-ResNeXt-50-32x4d.md) | 224 x 224 | 105 MB | 132 MB | 4 GFLOPs  |
+| [SE-ResNeXt-101-32x4d](reports/SE-ResNeXt-101-32x4d.md) | 224 x 224 | 187 MB | 197 MB | 8 GFLOPs|
+| [SENet](reports/SENet.md) | 224 x 224 | 440 MB | 347 MB | 21 GFLOPs                             |
+
+
+Each estimate corresponds to computing a single-element batch. This table was generated
+with [convnet-burden](https://github.com/albanie/convnet-burden) - the repo has a list of the assumptions used to produce the estimates. Clicking on the model name should give a more detailed breakdown.
 
 
 ### Installation
@@ -34,4 +60,10 @@ vl_contrib('setup', 'mcnSENets') ;
 vl_contrib('test', 'mcnSENets') ; % optional
 ```
 
-**Note:** The ordering of the imagenet labels differs from the standard ordering commonly found in caffe, pytorch etc.  These are remapped automically in the evaluation code.  The mapping between the synsets indices can be found [here](misc/label_map.txt).
+### Dependencies
+
+This code uses the **autonn** wrapper for MatConvNet, which can also be installed with `vl_contrib` (instructions [here](https://github.com/vlfeat/autonn)).
+
+### Notes
+
+The ordering of the ImageNet labels differs from the standard ordering commonly found in Caffe, PyTorch etc. These are remapped automatically in the evaluation code. The mapping between the synset indices can be found [here](misc/label_map.txt).
diff --git a/reports/SE-BN-Inception.md b/reports/SE-BN-Inception.md
@@ -0,0 +1,17 @@
+### Report for SE-BN-Inception
+Model params 46 MB 
+
+Estimates for a single full pass of the model at input size 224 x 224:
+
+* Memory required for features: 43 MB 
+* Flops: 2 GFLOPs 
+
+Estimates are given below of the burden of computing the `inception_5b_scale` features in the network for different input sizes using a batch size of 128: 
+
+| input size | feature size | feature memory | flops | 
+|------------|--------------|----------------|-------| 
+| 224 x 224 | 7 x 7 x 1024 | 5 GB | 262 GFLOPs |
+
+A rough outline of where in the network memory is allocated to parameters and features and where the greatest computational cost lies is shown below.  The x-axis does not show labels (it becomes hard to read for networks containing hundreds of layers) - it should be interpreted as depicting increasing depth from left to right.  The goal is simply to give some idea of the overall profile of the model: 
+
+![SE-BN-Inception profile](figs/SE-BN-Inception.png)
diff --git a/reports/SE-ResNeXt-101-32x4d.md b/reports/SE-ResNeXt-101-32x4d.md
@@ -0,0 +1,22 @@
+### Report for SE-ResNeXt-101-32x4d
+Model params 187 MB 
+
+Estimates for a single full pass of the model at input size 224 x 224:
+
+* Memory required for features: 197 MB 
+* Flops: 8 GFLOPs 
+
+Estimates are given below of the burden of computing the `conv5_3` features in the network for different input sizes using a batch size of 128: 
+
+| input size | feature size | feature memory | flops | 
+|------------|--------------|----------------|-------| 
+| 112 x 112 | 4 x 4 x 2048 | 6 GB | 264 GFLOPs |
+| 224 x 224 | 7 x 7 x 2048 | 25 GB | 1 TFLOPs |
+| 336 x 336 | 11 x 11 x 2048 | 56 GB | 2 TFLOPs |
+| 448 x 448 | 14 x 14 x 2048 | 98 GB | 4 TFLOPs |
+| 560 x 560 | 18 x 18 x 2048 | 154 GB | 6 TFLOPs |
+| 672 x 672 | 21 x 21 x 2048 | 221 GB | 9 TFLOPs |
+
+A rough outline of where in the network memory is allocated to parameters and features and where the greatest computational cost lies is shown below.  The x-axis does not show labels (it becomes hard to read for networks containing hundreds of layers) - it should be interpreted as depicting increasing depth from left to right.  The goal is simply to give some idea of the overall profile of the model: 
+
+![SE-ResNeXt-101-32x4d profile](figs/SE-ResNeXt-101-32x4d.png)
diff --git a/reports/SE-ResNeXt-50-32x4d.md b/reports/SE-ResNeXt-50-32x4d.md
@@ -0,0 +1,22 @@
+### Report for SE-ResNeXt-50-32x4d
+Model params 105 MB 
+
+Estimates for a single full pass of the model at input size 224 x 224:
+
+* Memory required for features: 132 MB 
+* Flops: 4 GFLOPs 
+
+Estimates are given below of the burden of computing the `conv5_3` features in the network for different input sizes using a batch size of 128: 
+
+| input size | feature size | feature memory | flops | 
+|------------|--------------|----------------|-------| 
+| 112 x 112 | 4 x 4 x 2048 | 4 GB | 144 GFLOPs |
+| 224 x 224 | 7 x 7 x 2048 | 16 GB | 547 GFLOPs |
+| 336 x 336 | 11 x 11 x 2048 | 37 GB | 1 TFLOPs |
+| 448 x 448 | 14 x 14 x 2048 | 66 GB | 2 TFLOPs |
+| 560 x 560 | 18 x 18 x 2048 | 103 GB | 3 TFLOPs |
+| 672 x 672 | 21 x 21 x 2048 | 148 GB | 5 TFLOPs |
+
+A rough outline of where in the network memory is allocated to parameters and features and where the greatest computational cost lies is shown below.  The x-axis does not show labels (it becomes hard to read for networks containing hundreds of layers) - it should be interpreted as depicting increasing depth from left to right.  The goal is simply to give some idea of the overall profile of the model: 
+
+![SE-ResNeXt-50-32x4d profile](figs/SE-ResNeXt-50-32x4d.png)
diff --git a/reports/SE-ResNet-101.md b/reports/SE-ResNet-101.md
@@ -0,0 +1,22 @@
+### Report for SE-ResNet-101
+Model params 189 MB 
+
+Estimates for a single full pass of the model at input size 224 x 224:
+
+* Memory required for features: 155 MB 
+* Flops: 8 GFLOPs 
+
+Estimates are given below of the burden of computing the `conv5_3` features in the network for different input sizes using a batch size of 128: 
+
+| input size | feature size | feature memory | flops | 
+|------------|--------------|----------------|-------| 
+| 112 x 112 | 4 x 4 x 2048 | 5 GB | 252 GFLOPs |
+| 224 x 224 | 7 x 7 x 2048 | 19 GB | 977 GFLOPs |
+| 336 x 336 | 11 x 11 x 2048 | 44 GB | 2 TFLOPs |
+| 448 x 448 | 14 x 14 x 2048 | 77 GB | 4 TFLOPs |
+| 560 x 560 | 18 x 18 x 2048 | 121 GB | 6 TFLOPs |
+| 672 x 672 | 21 x 21 x 2048 | 174 GB | 9 TFLOPs |
+
+A rough outline of where in the network memory is allocated to parameters and features and where the greatest computational cost lies is shown below.  The x-axis does not show labels (it becomes hard to read for networks containing hundreds of layers) - it should be interpreted as depicting increasing depth from left to right.  The goal is simply to give some idea of the overall profile of the model: 
+
+![SE-ResNet-101 profile](figs/SE-ResNet-101.png)
diff --git a/reports/SE-ResNet-152.md b/reports/SE-ResNet-152.md
@@ -0,0 +1,22 @@
+### Report for SE-ResNet-152
+Model params 255 MB 
+
+Estimates for a single full pass of the model at input size 224 x 224:
+
+* Memory required for features: 220 MB 
+* Flops: 11 GFLOPs 
+
+Estimates are given below of the burden of computing the `conv5_3` features in the network for different input sizes using a batch size of 128: 
+
+| input size | feature size | feature memory | flops | 
+|------------|--------------|----------------|-------| 
+| 112 x 112 | 4 x 4 x 2048 | 7 GB | 372 GFLOPs |
+| 224 x 224 | 7 x 7 x 2048 | 27 GB | 1 TFLOPs |
+| 336 x 336 | 11 x 11 x 2048 | 62 GB | 3 TFLOPs |
+| 448 x 448 | 14 x 14 x 2048 | 110 GB | 6 TFLOPs |
+| 560 x 560 | 18 x 18 x 2048 | 171 GB | 9 TFLOPs |
+| 672 x 672 | 21 x 21 x 2048 | 246 GB | 13 TFLOPs |
+
+A rough outline of where in the network memory is allocated to parameters and features and where the greatest computational cost lies is shown below.  The x-axis does not show labels (it becomes hard to read for networks containing hundreds of layers) - it should be interpreted as depicting increasing depth from left to right.  The goal is simply to give some idea of the overall profile of the model: 
+
+![SE-ResNet-152 profile](figs/SE-ResNet-152.png)
diff --git a/reports/SE-ResNet-50.md b/reports/SE-ResNet-50.md
@@ -0,0 +1,22 @@
+### Report for SE-ResNet-50
+Model params 107 MB 
+
+Estimates for a single full pass of the model at input size 224 x 224:
+
+* Memory required for features: 103 MB 
+* Flops: 4 GFLOPs 
+
+Estimates are given below of the burden of computing the `conv5_3` features in the network for different input sizes using a batch size of 128: 
+
+| input size | feature size | feature memory | flops | 
+|------------|--------------|----------------|-------| 
+| 112 x 112 | 4 x 4 x 2048 | 3 GB | 132 GFLOPs |
+| 224 x 224 | 7 x 7 x 2048 | 13 GB | 499 GFLOPs |
+| 336 x 336 | 11 x 11 x 2048 | 29 GB | 1 TFLOPs |
+| 448 x 448 | 14 x 14 x 2048 | 51 GB | 2 TFLOPs |
+| 560 x 560 | 18 x 18 x 2048 | 80 GB | 3 TFLOPs |
+| 672 x 672 | 21 x 21 x 2048 | 115 GB | 4 TFLOPs |
+
+A rough outline of where in the network memory is allocated to parameters and features and where the greatest computational cost lies is shown below.  The x-axis does not show labels (it becomes hard to read for networks containing hundreds of layers) - it should be interpreted as depicting increasing depth from left to right.  The goal is simply to give some idea of the overall profile of the model: 
+
+![SE-ResNet-50 profile](figs/SE-ResNet-50.png)
diff --git a/reports/SENet.md b/reports/SENet.md
@@ -0,0 +1,22 @@
+### Report for SENet
+Model params 440 MB 
+
+Estimates for a single full pass of the model at input size 224 x 224:
+
+* Memory required for features: 347 MB 
+* Flops: 21 GFLOPs 
+
+Estimates are given below of the burden of computing the `conv5_3` features in the network for different input sizes using a batch size of 128: 
+
+| input size | feature size | feature memory | flops | 
+|------------|--------------|----------------|-------| 
+| 112 x 112 | 4 x 4 x 2048 | 11 GB | 684 GFLOPs |
+| 224 x 224 | 7 x 7 x 2048 | 43 GB | 3 TFLOPs |
+| 336 x 336 | 11 x 11 x 2048 | 98 GB | 6 TFLOPs |
+| 448 x 448 | 14 x 14 x 2048 | 173 GB | 11 TFLOPs |
+| 560 x 560 | 18 x 18 x 2048 | 271 GB | 17 TFLOPs |
+| 672 x 672 | 21 x 21 x 2048 | 390 GB | 24 TFLOPs |
+
+A rough outline of where in the network memory is allocated to parameters and features and where the greatest computational cost lies is shown below.  The x-axis does not show labels (it becomes hard to read for networks containing hundreds of layers) - it should be interpreted as depicting increasing depth from left to right.  The goal is simply to give some idea of the overall profile of the model: 
+
+![SENet profile](figs/SENet.png)
diff --git a/reports/figs/SE-BN-Inception.png b/reports/figs/SE-BN-Inception.png
diff --git a/reports/figs/SE-ResNeXt-101-32x4d.png b/reports/figs/SE-ResNeXt-101-32x4d.png
diff --git a/reports/figs/SE-ResNeXt-50-32x4d.png b/reports/figs/SE-ResNeXt-50-32x4d.png
diff --git a/reports/figs/SE-ResNet-101.png b/reports/figs/SE-ResNet-101.png
diff --git a/reports/figs/SE-ResNet-152.png b/reports/figs/SE-ResNet-152.png
diff --git a/reports/figs/SE-ResNet-50.png b/reports/figs/SE-ResNet-50.png
diff --git a/reports/figs/SENet.png b/reports/figs/SENet.png
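
The per-model reports added in this commit give single-image estimates alongside batch-size-128 estimates. As a rough sanity check, the batch figures at 224 x 224 appear consistent with scaling the single-image feature memory linearly by the batch size, and the listed feature sizes appear consistent with an overall stride of 32 rounded up. The Python sketch below illustrates that arithmetic. The linear scaling, the 1 GB = 1024 MB conversion, the stride of 32 and the helper names are all assumptions inferred from the tables, not a description of how convnet-burden actually computes its estimates.

```python
import math

# Single-image feature memory (MB) at 224 x 224, copied from the README table.
FEATURE_MB = {
    "SE-ResNet-50": 103,
    "SE-BN-Inception": 43,
    "SENet": 347,
}


def batch_feature_memory_gb(feature_mb, batch_size=128):
    """Scale single-image feature memory linearly with the batch size.

    Assumption: 1 GB = 1024 MB and memory grows linearly with batch size.
    """
    return feature_mb * batch_size / 1024


def final_feature_side(input_side, stride=32):
    """Spatial side of the last feature map.

    Assumption: the network has a total stride of 32 and rounds up.
    """
    return math.ceil(input_side / stride)


if __name__ == "__main__":
    for name, mb in FEATURE_MB.items():
        print(name, round(batch_feature_memory_gb(mb)), "GB at batch size 128")
    for side in (112, 224, 336, 448, 560, 672):
        print(side, "->", final_feature_side(side))
```

Because the published numbers are rounded, this only reproduces the tables approximately; it is meant to show how the single-image and batch estimates relate, not to replace the generated reports.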