Model Zoo
Check out the model zoo documentation for details.
To acquire a model:
- Download the model gist with
  ./scripts/download_model_from_gist.sh <gist_id> <dirname>
  to load the model metadata, architecture, solver configuration, and so on. (<dirname> is optional and defaults to caffe/models.)
- Download the model weights with
  ./scripts/download_model_binary.py <model_dir>
  where <model_dir> is the gist directory from the first step.
Alternatively, visit the model zoo documentation for complete instructions.
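The two download steps can also be scripted. The following is a minimal sketch, not part of Caffe: the helper name `build_download_commands` is hypothetical, and the gist ID and directories in the usage lines are placeholders. It only assembles the two command lines named above and prints them, so nothing is downloaded.

```python
import shlex


def build_download_commands(gist_id, model_dir, dirname=None):
    """Return the two model-zoo commands as argument lists (nothing is executed).

    gist_id   -- ID of the model gist (placeholder values are used below)
    model_dir -- the gist directory created by the first step
    dirname   -- optional target directory for the first script
    """
    fetch_gist = ["./scripts/download_model_from_gist.sh", gist_id]
    if dirname is not None:
        fetch_gist.append(dirname)
    fetch_weights = ["./scripts/download_model_binary.py", model_dir]
    return [fetch_gist, fetch_weights]


if __name__ == "__main__":
    # "abc123" and "models/abc123" are placeholders, not a real gist.
    for command in build_download_commands("abc123", "models/abc123", "models"):
        print(shlex.join(command))
```

To actually run the commands, each list could be passed to `subprocess.run(command, check=True)` from the Caffe root directory.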
- Finetuning on Flickr Style: same as provided in models/, but listed here as a Gist for an example.
- BVLC GoogleNet: models/bvlc_googlenet
The Network in Network model is described in the following ICLR-2014 paper:
Network In Network
M. Lin, Q. Chen, S. Yan
International Conference on Learning Representations, 2014 (arXiv:1312.4400)
Please cite the paper if you use the models.
Models:
- NIN-Imagenet: a small (29MB) model for ImageNet that performs slightly better than AlexNet and is fast to train. (Note: a more Caffe-compatible version with the correct convolutional weight shapes is available at https://drive.google.com/folderview?id=0B0IedYUunOQINEFtUi1QNWVhVVU&usp=drive_web)
- NIN-CIFAR10: NIN model on CIFAR10, originally published in the paper Network In Network. The error rate of this model is 10.4% on CIFAR10.
Models from the BMVC-2014 paper "Return of the Devil in the Details: Delving Deep into Convolutional Nets"
The models are trained on the ILSVRC-2012 dataset. The details can be found on the project page or in the following BMVC-2014 paper:
Return of the Devil in the Details: Delving Deep into Convolutional Nets
K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman
British Machine Vision Conference, 2014 (arXiv:1405.3531)
Please cite the paper if you use the models.
Models:
- VGG_CNN_S: 13.1% top-5 error on ILSVRC-2012-val
- VGG_CNN_M: 13.7% top-5 error on ILSVRC-2012-val
- VGG_CNN_M_2048: 13.5% top-5 error on ILSVRC-2012-val
- VGG_CNN_M_1024: 13.7% top-5 error on ILSVRC-2012-val
- VGG_CNN_M_128: 15.6% top-5 error on ILSVRC-2012-val
- VGG_CNN_F: 16.7% top-5 error on ILSVRC-2012-val
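The top-5 error figures quoted on this page count an image as misclassified when its true label is not among the five highest-scoring classes. A minimal sketch of that metric follows; the function name `top_k_error` and the toy scores are illustrative only and are not taken from any of the listed models.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k top-scoring classes.

    scores -- list of per-class score lists, one per example
    labels -- list of true class indices, one per example
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    # Toy data: three examples, six classes.
    toy_scores = [
        [0.05, 0.30, 0.10, 0.20, 0.25, 0.10],
        [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],
        [0.10, 0.20, 0.30, 0.15, 0.15, 0.10],
    ]
    toy_labels = [0, 0, 1]
    print("top-5 error:", top_k_error(toy_scores, toy_labels, k=5))
    print("top-1 error:", top_k_error(toy_scores, toy_labels, k=1))
```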
The models are the improved versions of the models used by the VGG team in the ILSVRC-2014 competition. The details can be found on the project page or in the following arXiv paper:
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556
Please cite the paper if you use the models.
Models:
- 16-layer: 7.5% top-5 error on ILSVRC-2012-val, 7.4% top-5 error on ILSVRC-2012-test
- 19-layer: 7.5% top-5 error on ILSVRC-2012-val, 7.3% top-5 error on ILSVRC-2012-test
In the paper, the models are denoted as configurations D and E, trained with scale jittering.
The combination of the two models achieves 7.1% top-5 error on ILSVRC-2012-val, and 7.0% top-5 error on ILSVRC-2012-test.
Places CNN is described in the following NIPS 2014 paper:
B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva
Learning Deep Features for Scene Recognition using Places Database.
Advances in Neural Information Processing Systems 27 (NIPS) spotlight, 2014.
The project page is here
Models:
- Places205-AlexNet: CNN trained on 205 scene categories of Places Database (used in NIPS'14) with ~2.5 million images. The architecture is the same as Caffe reference network.
- Hybrid-CNN: CNN trained on 1183 categories (205 scene categories from Places Database and 978 object categories from the train data of ILSVRC2012 (ImageNet)) with ~3.6 million images. The architecture is the same as Caffe reference network.
- Places205-GoogLeNet: GoogLeNet CNN trained on 205 scene categories of Places Database. It is used by Google in the deep dream visualization.
We implemented GoogLeNet using a single GPU. Our main contribution is an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations.
- Please check http://vision.princeton.edu/pvt/GoogLeNet/ for more information. Pre-trained models on ImageNet and Places, and the training code are available for download.
- Make sure cls2_fc2 and cls3_fc have num_output = 1000 in the prototxt. Otherwise, the trained model will crash at test time.
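The gradient accumulation trick mentioned above can be illustrated on a toy problem. This is a sketch of the general idea only, not the authors' implementation: the one-parameter linear model, the data, and all function names are assumptions chosen for illustration. It shows that averaging the gradients of two half-size batches gives the same update as one full-size batch, which is what lets a large effective batch fit in limited GPU memory.

```python
def mean_gradient(weight, batch):
    """Gradient of the loss 0.5 * (weight * x - y)**2 w.r.t. weight, averaged over the batch."""
    return sum((weight * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(weight, batch, num_chunks=2):
    """Average the mean gradients of num_chunks equally sized chunks of the batch."""
    if len(batch) % num_chunks != 0:
        raise ValueError("batch size must be divisible by num_chunks")
    chunk_size = len(batch) // num_chunks
    total = 0.0
    for start in range(0, len(batch), chunk_size):
        # Each chunk stands in for one smaller training iteration.
        total += mean_gradient(weight, batch[start:start + chunk_size])
    return total / num_chunks


def sgd_step(weight, gradient, learning_rate=0.1):
    """Plain gradient descent update."""
    return weight - learning_rate * gradient


if __name__ == "__main__":
    # Toy (x, y) pairs, roughly following y = 2 * x.
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
    w = 0.0
    full = mean_gradient(w, data)
    accumulated = accumulated_gradient(w, data, num_chunks=2)
    print("full-batch gradient: ", full)
    print("accumulated gradient:", accumulated)
    print("weight after one step:", sgd_step(w, accumulated))
```

Caffe's solver configuration has an `iter_size` field that serves a similar purpose; whether the authors used it is not stated here.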
These models are described in the paper:
Fully Convolutional Networks for Semantic Segmentation
Jonathan Long*, Evan Shelhamer*, Trevor Darrell
CVPR 2015
arXiv:1411.4038
Details, model definitions, pre-trained weights, and code are public on github: https://github.com/shelhamer/fcn.berkeleyvision.org.
These models are compatible with Caffe master, unlike earlier FCNs that required a pre-release branch (note: this reference edition of the models is still in progress and not all of the models have yet been ported to master). The models are available under the same license as the Caffe-bundled models (i.e., for unrestricted use; see http://caffe.berkeleyvision.org/model_zoo.html#bvlc-model-license).
https://gist.github.com/jimgoo/0179e52305ca768a601f
This is the reference CaffeNet (modified AlexNet) fine-tuned for the Oxford 102 category flower dataset. The number of outputs in the final inner product layer has been set to 102 to reflect the number of flower categories. Hyperparameter choices reflect those in Fine-tuning CaffeNet for Style Recognition on “Flickr Style” Data. The global learning rate is reduced, while the learning rate for the final fully connected layer is increased relative to the other layers.
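In Caffe, each parameter's learning rate is the solver's base learning rate multiplied by that layer's learning-rate multiplier, which is how one layer can learn faster than the rest. The sketch below shows the arithmetic only. The base rate, the multipliers, and the layer name `fc8_flowers` are illustrative assumptions, not the values used for this model.

```python
def effective_learning_rates(base_lr, lr_mults):
    """Return per-layer learning rates as base_lr times each layer's multiplier."""
    return {layer: base_lr * mult for layer, mult in lr_mults.items()}


if __name__ == "__main__":
    # Hypothetical values: a reduced global rate and a boosted final layer.
    rates = effective_learning_rates(
        base_lr=0.001,
        lr_mults={"conv1": 1.0, "fc7": 1.0, "fc8_flowers": 10.0},
    )
    for layer, rate in rates.items():
        print(layer, rate)
```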
After 50,000 iterations, the top-1 error is 7% on the test set of 1,020 images.
I0215 15:28:06.417726 6585 solver.cpp:246] Iteration 50000, loss = 0.000120038
I0215 15:28:06.417789 6585 solver.cpp:264] Iteration 50000, Testing net (#0)
I0215 15:28:30.834987 6585 solver.cpp:315] Test net output #0: accuracy = 0.9326
I0215 15:28:30.835072 6585 solver.cpp:251] Optimization Done.
I0215 15:28:30.835083 6585 caffe.cpp:121] Optimization Done.
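The quoted top-1 error follows from the accuracy in the log above (1 - 0.9326 = 0.0674, about 7%). A small sketch of extracting it from solver output follows; the helper name and the regular expression are assumptions that match the log line format shown here, and other log formats may need a different pattern.

```python
import re

# Matches lines such as "Test net output #0: accuracy = 0.9326".
ACCURACY_PATTERN = re.compile(r"Test net output #\d+: accuracy = ([0-9.]+)")


def top1_error_from_log(log_text):
    """Return 1 - accuracy for the last accuracy line found in the log text."""
    matches = ACCURACY_PATTERN.findall(log_text)
    if not matches:
        raise ValueError("no accuracy line found in log")
    return 1.0 - float(matches[-1])


if __name__ == "__main__":
    log = (
        "I0215 15:28:06.417789 6585 solver.cpp:264] Iteration 50000, Testing net (#0)\n"
        "I0215 15:28:30.834987 6585 solver.cpp:315] Test net output #0: accuracy = 0.9326\n"
    )
    print("top-1 error: {:.2%}".format(top1_error_from_log(log)))
```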
CNN models described in the following CVPR'15 paper "Salient Object Subitizing":
Salient Object Subitizing
J. Zhang, S. Ma, M. Sameki, S. Sclaroff, M. Betke, Z. Lin, X. Shen, B. Price and R. Mech.
CVPR, 2015.
Models:
- AlexNet: CNN model finetuned on the Salient Object Subitizing dataset (~5500 images). The architecture is the same as the Caffe reference network.
- VGG16: CNN model finetuned on the Salient Object Subitizing dataset (~5500 images). The architecture is the same as the VGG16 network. This model gives better performance than the AlexNet model, but is slower for training and testing.
We present an effective deep learning framework to create hash-like binary codes for fast image retrieval. The details can be found in the following CVPRW'15 paper:
Deep Learning of Binary Hash Codes for Fast Image Retrieval
K. Lin, H.-F. Yang, J.-H. Hsiao, C.-S. Chen
CVPR 2015, DeepVision workshop
Please cite the paper if you use the model:
- caffe-cvprw15: See our code release on Github, which allows you to train your own deep hashing model and create binary hash codes.
- CIFAR10-48bit: Proposed 48-bit CNN model trained on CIFAR10.
- Places-CNDS-8 is an "8conv3fc layer" deep convolutional neural network model trained on the MIT Places Dataset with Deep Supervision.
The details of training this model are described in the following report. Please cite this work if the model is useful for you.
Training Deeper Convolutional Networks with Deep Supervision
L. Wang, C. Lee, Z. Tu, S. Lazebnik, arXiv:1505.02496, 2015
- Age/Gender.net are models for age and gender classification trained on the Adience-OUI dataset. See the Project page.
The models are described in the following paper:
Age and Gender Classification using Convolutional Neural Networks
Gil Levi and Tal Hassner
IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG),
at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015
If you find our models useful, please add suitable reference to our paper in your work.
GoogLeNet_cars is the GoogLeNet model pre-trained on the ImageNet classification task and fine-tuned on 431 car models in the CompCars dataset. It is described in the technical report. Please cite the following work if the model is useful for you.
A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
L. Yang, P. Luo, C. C. Loy, X. Tang, arXiv:1506.08959, 2015
These models are described in the paper:
ParseNet: Looking Wider to See Better
Wei Liu, Andrew Rabinovich, Alexander C. Berg
arXiv:1506.04579
To be able to train/eval ParseNet, you can refer to http://github.com/weiliu89/caffe/tree/fcn.
Modified VGGNet used to fine-tune ParseNet:
Models trained on PASCAL (using extra data from Hariharan et al. and finetuned from the fully convolutional reduced VGGNet):
SegNet is a real-time semantic segmentation architecture for scene understanding. Code and trained models for SegNet and Bayesian SegNet are available.
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla
arXiv preprint arXiv:1511.00561, 2015.
Code (with Matlab/Python API) and model are described in the ICCV 2015 paper:
Conditional Random Fields as Recurrent Neural Networks
S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, P. Torr
ICCV 2015.
Model is trained on Microsoft COCO and PASCAL (using extra data from Hariharan et al. and finetuned from the FCN-8s):
The model and code provided are described in the ICCV 2015 paper:
Holistically-Nested Edge Detection
Saining Xie and Zhuowen Tu
ICCV 2015
For details about training/evaluating HED, please take a look at http://github.com/s9xie/hed.
Model trained on BSDS-500 Dataset (finetuned from the VGGNet):
### Translating Videos to Natural Language
These models are described in this NAACL-HLT 2015 paper.
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, K. Saenko
NAACL-HLT 2015
More details can be found on this project page.
Model:
Video2Text_VGG_mean_pool:
This model is an improved version of the mean pooled model described in the NAACL-HLT 2015 paper. It uses video frame features from the VGG-16 layer model. It is trained only on the YouTube video dataset.
Compatibility:
These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. The models are currently supported by the recurrent branch of the Caffe fork provided at https://github.com/jeffdonahue/caffe/tree/recurrent and https://github.com/vsubhashini/caffe/tree/recurrent.
### VGG Face CNN descriptor
These models are described in this [BMVC 2015 paper](http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf).
Deep Face Recognition
Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman
BMVC 2015
More details can be found on this project page.
Model: VGG Face: This is the very deep architecture based model trained from scratch using 2.6 million images of celebrities collected from the web. The model has been imported to work with Caffe from the original model trained using the MatConvNet library.
If you find our models useful, please add suitable reference to our paper in your work.
### Yearbook Photo Dating Model from the ICCV 2015 Extreme Imaging Workshop paper:
A Century of Portraits: Exploring the Visual Historical Record of American High School Yearbooks
Shiry Ginosar, Kate Rakelly, Brian Yin, Sarah Sachs, Alyosha Efros
ICCV Workshop 2015
Model and prototxt files: Yearbook
These models are described in the ICCV 2015 paper.
Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
Deepak Pathak, Philipp Krähenbühl, Trevor Darrell
ICCV 2015
arXiv:1506.03648
These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. Full details, source code, models, prototxts are available here: CCNN.
We provide models for facial emotion classification for different image representations obtained using mapped binary patterns. See the Project page for more details.
The models are described in the following paper:
Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns
Gil Levi and Tal Hassner
Proc. ACM International Conference on Multimodal Interaction (ICMI), Seattle, Nov. 2015
If you find our models useful, please add suitable reference to our paper in your work.
We provide source code and a model for the article: Yue Wu and Tal Hassner, "Facial Landmark Detection with Tweaked Convolutional Neural Networks", arXiv preprint arXiv:1511.04031, 12 Nov. 2015. See the project page for more information about this project.
Written by Ishay Tubi
This software is provided as is, without any warranty, with no legal constraints. If you find our models useful, please add suitable reference to our paper in your work.
Download pre-computed Faster R-CNN detectors
cd $FRCN_ROOT
./data/scripts/fetch_faster_rcnn_models.sh
This will populate the $FRCN_ROOT/data folder with faster_rcnn_models. See data/README.md for details. These models were trained on VOC 2007 trainval.
Reference: https://github.com/rbgirshick/py-faster-rcnn/blob/master/data/scripts/fetch_faster_rcnn_models.sh
### Sequence to Sequence - Video to Text
These models are described in this ICCV 2015 paper.
Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko
The IEEE International Conference on Computer Vision (ICCV) 2015
More details can be found on this project page.
Model:
S2VT_VGG_RGB:
This is the S2VT (RGB) model described in the ICCV 2015 paper. It uses video frame features from the VGG-16 layer model. It is trained only on the YouTube video dataset.
Compatibility:
These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. The models are currently supported by the recurrent branch of the Caffe fork provided at https://github.com/jeffdonahue/caffe/tree/recurrent and https://github.com/vsubhashini/caffe/tree/recurrent.
This repository contains the original models (ResNet-50, ResNet-101, and ResNet-152) described in the paper "Deep Residual Learning for Image Recognition" (http://arxiv.org/abs/1512.03385). These models are those used in the [ILSVRC](http://image-net.org/challenges/LSVRC/2015/) and COCO 2015 competitions, where they won 1st place in ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
More instructions with prototxt and binary weight files are in: https://github.com/KaimingHe/deep-residual-networks
Reference:
@article{He2015,
author = {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun},
title = {Deep Residual Learning for Image Recognition},
journal = {arXiv preprint arXiv:1512.03385},
year = {2015}
}
This model has been used for the paper "Analyzing Classifiers: Fisher Vectors and Deep Neural Networks" (http://arxiv.org/abs/1512.00172), which is to appear in the proceedings of CVPR 2016.
Note that it has been trained in a multilabel setting with a multilabel-compatible loss function. It should not be used in conjunction with a softmax layer.
In particular
Downloading the Model: caffemodel prototxt
Please reference the above submission when using the model via
@inproceedings{bach-cvpr16,
author = {Sebastian Bach and Alexander Binder and Gr{\'e}goire Montavon and Klaus-Robert M{\"u}ller and Wojciech Samek},
title = {Analyzing Classifiers: Fisher Vectors and Deep Neural Networks},
booktitle = {CVPR},
year = 2016,
organization = {IEEE}
}
@article{SqueezeNet,
Author = {Forrest N. Iandola and Matthew W. Moskewicz and Khalid Ashraf and Song Han and William J. Dally and Kurt Keutzer},
Title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$1MB model size},
Journal = {arXiv:1602.07360},
Year = {2016}
}
Please cite the paper if you use the model.
Model trained on ImageNet (including weights, solver, train_val, and deploy prototxt files)
- Error rate on ImageNet ILSVRC-2012 is better than or equal to the bvlc_alexnet model.
Mixture DCNN is a novel multi-model architecture which achieves better performance than an ensemble of DCNNs as evaluated on three different fine-grained datasets. Please cite the following paper if you use these models in your research.
@inproceedings{GeWACV2016,
author = {ZongYuan Ge and Alex Bewley and Christopher McCool and Ben Upcroft and Peter Corke and Conrad Sanderson},
title = {Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks},
booktitle = {Winter Conference on the Applications of Computer Vision (WACV)},
publisher = {IEEE},
year = {2016}
}
CNN models for the following CVPR'16 paper:
Unconstrained Salient Object Detection via Proposal Subset Optimization
J. Zhang, S. Sclaroff, Z. Lin, X. Shen, B. Price and R. Mech.
CVPR, 2016.
The following models are finetuned on the Salient Object Subitizing dataset (~5000 images) with bounding box annotations:
- VGG16: This model is used in the paper.
- GoogleNet: This model is smaller, faster and slightly better than the VGG16 model.
It is recommended that you download the full system here, which will automatically download all the needed models and data.