DenseNet feature pyramid computation #308
Commits (messages truncated as shown on the page):
- …and its python iface
- …ors (as Sergey suggested)
- …ew data type is dict['feat'] = list of feature scales -- note: this commit almost certainly breaks compilation of matcaffe. mwm
- …caffe output in matlab wrapper. mwm fix matlab vs octave compile stuff for matlab pyramid API
- …-> caffe/src/stitch_pyramid
- …to mean; corner regions still TODO
- …rk; other minor changes. -- fill corners of padding with interpolation of edge padding (which is in turn already an interpolation from the image edge to the imagenet mean). -- add .PHONY stitch target to top-level makefile for building just libStitchPyramid -- minor fix to testing makefile: add a -L. to src/stitch_pyramid/build/Makefile -- update matcaffe.cpp comment with current mkoctfile-based build command (can be used to test building matcaffe under octave) -- condense str() in featpyramid_common.hpp (str() is for debugging printfs) -- add a copy of str() in JPEGPyramid.cpp (FIXME: use some common copy?) mwm
- …ject*) casts (to fix compile errors for some build envs). note also that there is a new SHARED_LDFLAGS makefile var that can be set to include -Wl,--no-undefined (for gcc) to avoid accidentally linking an .so missing some of its dependencies. but, since people use other compilers, it's not enabled by default, and only the stitch library build line uses the macro at this point. mwm
- …on/license note there too.
Added some DenseNet API documentation. This will probably percolate from this top-level README.md to the caffe.berkeleyvision.org gh-pages.
I've read the paper and most of the code. It seems there would be quite a lot of work to replace the external code. The winner of the Fine-Grained Challenge 2013, held alongside ILSVRC2013, was still using Fisher Vectors that aggregated dense SIFT local features. @moskewcz, is it possible to use DenseNet to extract similar dense local features? And if so, how?
in short, i think the answer is no, at least not in the current code or in the BSD replacement. but, i'm not quite sure what you're asking. in principle, if you have any dense feature, and a way to compute it on an image, then you can use an image pyramid to compute it across scales. modulo alignment and edge effects, you can also use a stitched image pyramid, if you want to or need to for some reason. i guess it's a pretty standard technique?

i'm still slowly chipping away at replacing all the GPL code and integrating the result into something with roughly equivalent functionality to the existing GPL DenseNet impl. i'd guess that for either the GPL/ffld code or the BSD version (when it is done) it wouldn't be hard to modify the DenseNet code to compute dense multiscale descriptor pyramids for some arbitrary feature(s): it's just a matter of having a function to compute the desired feature(s) on an image, plus various (admittedly potentially non-trivial) feature-dependent glue issues: padding, alignment, mapping. i'm not sure exactly what the value of that would be, though; i'd think there are other existing ways to go from image -> image pyramid -> feature pyramid for various dense features in C++ (or at least from matlab/python). but maybe not, i dunno.
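The image -> image pyramid -> feature pyramid flow described above can be sketched in a few lines. Everything here is illustrative, not part of the DenseNet or Caffe API: `compute_feature` is a placeholder for any dense feature (HOG, a CNN's conv maps, etc.), and the box-filter downscale is a stand-in for proper image resizing.

```python
# Hypothetical sketch: compute a dense feature pyramid by running an
# arbitrary feature function over each level of an image pyramid.
# Names here (downscale, compute_feature, feature_pyramid) are illustrative.

def downscale(img, factor):
    """Naive box-filter downscale of a 2D image (list of lists) by an integer factor."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def compute_feature(img):
    """Placeholder dense feature: per-pixel identity (stand-in for HOG, conv maps, ...)."""
    return img

def feature_pyramid(img, num_levels=3):
    """Return a list of feature maps, one per scale, with a factor of 2 between levels."""
    levels = []
    cur = img
    for _ in range(num_levels):
        levels.append(compute_feature(cur))
        cur = downscale(cur, 2)
    return levels
```

The feature-dependent glue mentioned above (padding, alignment, and mapping pyramid coordinates back to image coordinates) is exactly what this sketch omits.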
Because this work focuses on constructing multiscale image and feature pyramids rather than extracting dense local features at a single scale, maybe DenseNet would be better called PyramidNet.

The motivation for extracting such dense local features is fine-grained object classification. I fine-tuned the Caffe reference ImageNet model on 23 knife ImageNet synsets and the test accuracy was only about 50%. But the winning results of the Fine-Grained Challenge 2013 were much better, although the object categories were likely easier to classify. It's surprising that the winner, which did not use a deep neural network, beat those who did. There are three possible reasons. First, the features of the last convolutional layer have too-large receptive fields and are not dense enough to capture fine-grained local information. Second, the fully connected layers do not preserve as much of the statistics of the local features as Fisher Vectors do. Third, the winner ensembled two complementary Fisher Vector models with quite different design choices and parameters.

As you have pointed out, solving the first issue involves non-trivial processing. But the output features of the last convolutional layer at each position can serve as (not-so-dense) local features. The second problem can be mitigated by encoding the local features with the Fisher Vector or VLAD using the open-source implementations in VLFeat. Finally, the features of multiple different CNN architectures combined together are superior to the features of a single net. I have only sketched the outline of this scheme; experimental results will tell whether it works.
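For readers unfamiliar with the VLAD encoding mentioned above, here is a minimal sketch: each local descriptor is assigned to its nearest codebook center, the residuals (descriptor minus center) are summed per cluster, and the concatenated result is L2-normalized. VLFeat provides real, optimized implementations of both VLAD and Fisher Vectors; nothing below is VLFeat's API, and the codebook would normally come from k-means.

```python
# Hypothetical VLAD encoding sketch (not VLFeat's API).
import math

def nearest(centers, d):
    """Index of the codebook center closest to descriptor d (squared Euclidean)."""
    best, best_dist = 0, float('inf')
    for i, c in enumerate(centers):
        dist = sum((a - b) ** 2 for a, b in zip(c, d))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

def vlad(descriptors, centers):
    """Per-cluster sum of residuals (d - center), concatenated and L2-normalized."""
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for d in descriptors:
        k = nearest(centers, d)
        for j in range(dim):
            acc[k][j] += d[j] - centers[k][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]
```

The resulting vector has dimension (number of centers) × (descriptor dimension), so it grows quickly with codebook size; the Fisher Vector is similar in spirit but aggregates first- and second-order statistics against a GMM instead of hard assignments.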
A recent performance evaluation (#557) indicates that the Fisher Vector cannot beat CNNs.
@kloudkl There are also experiments with caffe and VLAD: |
It's strange, since the Fisher Vector used to be more effective than VLAD.
(force-pushed from 4278286 to c01f07a)
Hi, when will DenseNet be merged into the dev branch? Since it works, I assume little work is needed to merge it.
Thanks Forrest @forresti and Matt @moskewcz for posting the code for your tech report. Since the PR has been made, others can find this snapshot of the implementation. However, this cannot be merged. There are the steps helpfully listed by Matt, but more fundamentally there was, and still is, the question of goodness of fit. As implemented, this was essentially done independently of the framework and then attached, which shows in the 5000+ line diff.

Caffe is already capable of dense inference and learning: blobs can take the necessary shapes, convolution is convolution and matrix multiplications can be cast to convolution, and losses can take different input and truth shapes. (The editing model parameters example shows how to make a model fully convolutional for inference.)

The contribution of this PR is to assemble pyramids and show how to do extraction. This could be done most clearly and effectively as an example rather than through further code. For instance, (1) the stitching approach could be turned into a data layer, or (2) each level of the pyramid could be an input to a weight-shared pyramid net. At its simplest this could be a Python data layer, which need not even be slow if one does multiprocessing for prefetching.

For a pointer to dense models and matrix output, follow #1698. Closing.
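The "Python data layer with prefetching" suggestion above boils down to producing pyramid levels on a background worker so the consumer (e.g. a net's forward pass) never waits on data preparation. The sketch below shows only that producer/consumer pattern using the standard library; it is not Caffe's Python layer interface, and `make_levels` is a placeholder for actually resizing an image to each pyramid scale.

```python
# Hypothetical sketch of prefetching for a pyramid data source.
# A worker thread fills a bounded queue while the consumer drains it.
import queue
import threading

def make_levels(image, num_levels):
    """Placeholder: stand-in for resizing `image` to each pyramid scale."""
    return [("level", i, image) for i in range(num_levels)]

def prefetching_levels(images, num_levels=3, depth=2):
    """Generator yielding pyramid levels, prepared ahead of time by a worker thread."""
    q = queue.Queue(maxsize=depth)  # bounded, so the worker can't run far ahead
    sentinel = object()             # marks end of the stream

    def worker():
        for img in images:
            for lvl in make_levels(img, num_levels):
                q.put(lvl)          # blocks when the queue is full
        q.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item
```

In a real data layer the worker would do the expensive decode/resize/stitch work (a `multiprocessing` pool sidesteps the GIL for CPU-bound preparation), while `reshape`/`forward` simply copy the next prepared level into the top blobs.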
DenseNet PR current state / TODOs (for discussion)
explanatory tech report:
DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer
additional integration notes (see main notes below, inlined from the DENSENET_MERGE_TODO file)
inlined from: DENSENET_MERGE_TODO
list of the issues blocking the merge of the DenseNet feature branch: