Clf_arch.md

File metadata and controls

69 lines (40 loc) · 3.54 KB

This is a quick evaluation of how different classifier (fc6-fc8) designs perform on ImageNet-2012.

The architecture is similar to CaffeNet, with the following differences:

  1. Images are resized to small side = 128 for speed reasons.
  2. fc6 and fc7 layers have 2048 neurons instead of 4096.
  3. Networks are initialized with LSUV-init.
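
The first difference, resizing so that the shorter side is 128 pixels, can be sketched as follows. This is only an illustration, not the preprocessing actually used for these runs: the function name `resize_dims` and the rounding rule for the longer side are assumptions.

```python
def resize_dims(width, height, small_side=128):
    """Return (new_width, new_height) so that the shorter side equals small_side.

    The aspect ratio is preserved. Rounding the longer side to the nearest
    integer is an assumption of this sketch.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = small_side / min(width, height)
    if width <= height:
        # Width is the shorter side: pin it to small_side, scale the height.
        return small_side, max(small_side, round(height * scale))
    # Height is the shorter side: pin it to small_side, scale the width.
    return max(small_side, round(width * scale)), small_side


print(resize_dims(500, 375))  # (171, 128)
```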

CLF architecture

| Name | Accuracy | LogLoss | Comments |
|---|---|---|---|
| Default ReLU | 0.470 | 2.36 | fc6 = conv 3x3x2048 -> fc7 2048 -> 1000 fc8 |
| Conv5-fc6=2048C3_2048C1_clf_avg | 0.494 | 2.34 | no pool5 -> fc6 = conv 3x3x2048 -> fc7 = conv 1x1x2048 -> fc8 as 1x1 conv -> ave_pool |
| Pool5-fc6=2048C3_2048C1_avg_clf | 0.489 | 2.28 | no pool5 -> fc6 = conv 3x3x2048 -> fc7 = conv 1x1x2048 -> ave_pool -> fc8 |
| SPP2-FC-FC | 0.471 | 2.36 | pool5 = SPP with 2 levels (2x2 and 1x1) -> FC6 -> FC7 |
| SPP3-FC-FC | 0.483 | 2.30 | pool5 = SPP with 3 levels (3x3, 2x2 and 1x1) -> FC6 -> FC7 |
| fc6=512C3_1024C3_1536C1 | 0.482 | 2.52 | pool5 zero pad -> fc6 = conv 3x3x512 -> fc7 = conv 3x3x1024 -> 1x1x1536 -> fc8 as 1x1 conv -> ave_pool |
| fc6=512C3_1024C3_1536C1_drop | 0.491 | 2.29 | pool5 zero pad -> fc6 = conv 3x3x512 -> fc7 = conv 3x3x1024 -> drop 0.3 -> 1x1x1536 -> drop 0.5 -> fc8 as 1x1 conv -> ave_pool |
| Default ReLU, 4096 | 0.497 | 2.24 | fc6 = conv 3x3x4096 -> fc7 4096 -> 1000 fc8 == original CaffeNet |
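
The SPP rows replace pool5 with spatial pyramid pooling, which changes how many values fc6 receives. The sketch below is a rough illustration, not the Caffe layer used in these runs. It assumes max pooling inside each bin, bin borders computed with floor/ceil, and 256 conv5 channels as in CaffeNet. Both function names are hypothetical.

```python
def spp_pool_channel(fmap, levels):
    """Pool one 2-D feature map (list of rows) into n x n bins for each level n.

    Assumptions of this sketch: max pooling inside each bin, and bin borders
    at floor(i * size / n) and ceil((i + 1) * size / n).
    """
    height, width = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            row_start = (i * height) // n
            row_end = -(-((i + 1) * height) // n)  # ceiling division
            for j in range(n):
                col_start = (j * width) // n
                col_end = -(-((j + 1) * width) // n)
                pooled.append(max(
                    fmap[r][c]
                    for r in range(row_start, row_end)
                    for c in range(col_start, col_end)
                ))
    return pooled


def spp_output_length(levels, channels=256):
    """Number of values fed to fc6: one per bin per channel."""
    return sum(n * n for n in levels) * channels


fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(spp_pool_channel(fmap, [2, 1]))   # [5, 6, 8, 9, 9]
print(spp_output_length([2, 1]))        # 1280
print(spp_output_length([3, 2, 1]))     # 3584
```

Under these assumptions, the 3-level pyramid feeds fc6 almost three times as many values as the 2-level one.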

pool5pad: the following nets were mistakenly trained with the ELU non-linearity instead of the default ReLU.

| Name | Accuracy | LogLoss | Comments |
|---|---|---|---|
| Default ELU | 0.488 | 2.28 | fc6 = conv 3x3x2048 -> fc7 2048 -> 1000 fc8 |
| pool5pad_fc6ave | 0.481 | 2.32 | pool5 zero pad -> fc6 = conv 3x3x2048 -> AvePool -> as usual |
| pool5pad_fc6ave_fc7as1x1fc8ave | 0.511 | 2.21 | pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> fc8 as 1x1 conv -> ave_pool |
| pool5pad_fc6ave_fc7as1x1avefc8 | 0.508 | 2.22 | pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> ave_pool -> fc8 |
| pool5pad_fc6ave_fc7as1x1_avemax_fc8 | 0.509 | 2.19 | pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> fc8 as 1x1 conv -> ave_pool + max_pool |
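
The last variant pools the spatial map of fc8 scores with both average and max pooling. A minimal sketch of one way to read "ave_pool + max_pool" is shown below. It assumes the "+" means an element-wise sum of the two globally pooled score vectors, which the source does not spell out. The function name `global_avg_plus_max` is hypothetical.

```python
def global_avg_plus_max(score_map):
    """Combine global average and global max pooling over spatial positions.

    score_map is a list of spatial positions, each holding one score per class.
    Assumption: the two pooled vectors are summed element-wise.
    """
    num_positions = len(score_map)
    num_classes = len(score_map[0])
    pooled = []
    for c in range(num_classes):
        scores = [position[c] for position in score_map]
        pooled.append(sum(scores) / num_positions + max(scores))
    return pooled


# Two spatial positions, two classes.
print(global_avg_plus_max([[1.0, 0.0], [3.0, 2.0]]))  # [5.0, 3.0]
```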

Prototxt, logs

CaffeNet128 test accuracy

CaffeNet128 test loss

CaffeNet128 train loss

Squeezing representation

A smaller representation is useful, for example, when the activations are used for image retrieval.

| Name | Accuracy | LogLoss | Comments |
|---|---|---|---|
| pool5pad_fc6ave_fc7as1x1fc8ave | 0.508 | 2.22 | Baseline. pool5 zero pad -> fc6 = conv 3x3x2048 -> fc7 as 1x1 conv -> ave_pool -> fc8 as 1x1 conv |
| pool5pad_fc6ave_fc7as1x1=512_fc8ave | 0.489 | 2.30 | fc7 as 1x1 conv = 512 |
| pool5pad_fc6ave_fc7as1x1_bottleneck=512_fc8ave | 0.490 | 2.28 | fc7 as 1x1 conv = 2048, then fc7a = 512 |
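
To illustrate why a compact descriptor matters for retrieval, here is a minimal sketch of ranking database images by cosine similarity between activation vectors. It is not part of the evaluated pipeline: the helper names and the choice of cosine similarity are assumptions, and the toy vectors stand in for 512-dimensional activations.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rank_by_cosine(query, database):
    """Return database indices sorted from most to least similar to the query."""
    q = l2_normalize(query)
    similarities = []
    for index, descriptor in enumerate(database):
        d = l2_normalize(descriptor)
        similarities.append((sum(a * b for a, b in zip(q, d)), index))
    # Sort by similarity, highest first; ties keep the lower index first.
    similarities.sort(key=lambda item: (-item[0], item[1]))
    return [index for _, index in similarities]


database = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(rank_by_cosine([1.0, 0.0], database))  # [1, 2, 0]
```

The cost of each comparison grows linearly with the descriptor length, so a 512-dimensional fc7 needs a quarter of the storage and multiply-adds of a 2048-dimensional one.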

Prototxt, logs

CaffeNet128 test accuracy

CaffeNet128 test loss

CaffeNet128 train loss

P.S. The logs are merged from many "save-resume" runs, because the nets were trained overnight, so an "Accuracy vs. seconds" plot will give weird results.
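
One way to work around this is to index the merged logs by iteration and rebuild a cumulative training time from the per-run clocks. The sketch below assumes a hypothetical record format of `(iteration, seconds_since_run_start, accuracy)` per test point and that each run's clock restarts at zero on resume. It is not the parsing script used for the plots.

```python
def merge_segments(segments):
    """Merge per-run log segments into one curve with cumulative seconds.

    Each segment is a list of (iteration, seconds_since_run_start, accuracy).
    Idle time between runs is ignored. If two runs report the same iteration
    (a test at the resume point), the later run's record wins.
    """
    non_empty = [seg for seg in segments if seg]
    merged = {}
    offset = 0.0
    for segment in sorted(non_empty, key=lambda seg: seg[0][0]):
        for iteration, seconds, accuracy in segment:
            merged[iteration] = (offset + seconds, accuracy)
        # The next run starts where this one stopped, in training time.
        offset += segment[-1][1]
    return [(it, t, acc) for it, (t, acc) in sorted(merged.items())]


night_1 = [(0, 0.0, 0.001), (1000, 50.0, 0.10)]
night_2 = [(1000, 0.0, 0.10), (2000, 60.0, 0.20)]
print(merge_segments([night_2, night_1]))
# [(0, 0.0, 0.001), (1000, 50.0, 0.1), (2000, 110.0, 0.2)]
```

Plotting accuracy against the iteration column avoids the problem entirely; the rebuilt seconds column is only an approximation of pure training time.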