
DenseNet feature pyramid computation #308

Closed
wants to merge 92 commits

Conversation

@moskewcz commented Apr 8, 2014

DenseNet PR current state / TODOs (for discussion)

explanatory tech report:

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer

additional integration notes (see the main notes below, inlined from the DENSENET_MERGE_TODO file)

inlined from: DENSENET_MERGE_TODO

list of the issues blocking the merge of the DenseNet feature branch:

  • critical
    • replacement of GPL'd code (including removal from history)
    • update build process / Makefile to match current practice
    • tests
    • trivial: remove DenseNet README.md header
    • trivial: remove this todo file
  • unclear necessity, semantics, and/or priority
    • general cleanup of commit sequence (probably mostly squashing)
    • input interface changes (i.e., JPEG filename as input -> ?)
    • output interface changes (?, but probably something: image support size, multiple layer output, alignment, etc.)
    • if still reading image files after any iface changes and removal of GPL code, use XX instead of YY

forresti and others added 30 commits March 18, 2014 14:01
…ew data type is dict['feat'] = list of feature scales
-- note: this commit almost certainly breaks compilation of matcaffe.

mwm
…caffe output in matlab wrapper.

mwm

fix matlab vs octave compile stuff for matlab pyramid API
…rk; other minor changes.

-- fill corners of padding with interpolation of edge padding (which is in turn already an interpolation from the image edge to the imagenet mean).
-- add .PHONY stitch target to top level makefile for building just libStitchPyramid
-- minor fix to testing makefile: add a -L. to src/stitch_pyramid/build/Makefile
-- update matcaffe.cpp comment with current mkoctfile-based build command (can be used to test building matcaffe under octave)
-- condense str() in featpyramid_common.hpp (str() is for debugging printfs)
-- add a copy of str() in JPEGPyramid.cpp (FIXME: use some common copy?)

mwm
…ject*) casts (to fix compile errors for some build envs).

Note also that there is a new SHARED_LDFLAGS makefile var that can be set to include -Wl,--no-undefined (for gcc) to avoid accidentally linking an .so that is missing some of its dependencies. But since people use other compilers, it is not enabled by default, and only the stitch library build line uses the macro at this point.

mwm
moskewcz and others added 2 commits April 8, 2014 12:10
Added some DenseNet API documentation. This will probably percolate from this top-level README.md to the caffe.berkeleyvision.org gh-pages.
@forresti (Contributor)

Following up with people in #189 who are interested in sliding-window, dense, and multiscale CNN descriptors. Do you have any suggestions/feedback for this DenseNet PR?
@kloudkl @mavenlin @sguada @shelhamer @rodrigob

@kloudkl (Contributor) commented May 3, 2014

I've read the paper and most of the code. It seems there would be quite a lot of work to replace the external code.

The winner of the Fine-Grained Challenge 2013, held along with ILSVRC2013, was still using the Fisher Vector, which aggregates dense SIFT local features. @moskewcz, is it possible to use DenseNet to extract similar dense local features? And if so, how?

@moskewcz (Author) commented May 3, 2014

In short, I think the answer is no, at least not in the current code or in the BSD replacement. But I'm not quite sure what you're asking.

In principle, if you have any dense feature and a way to compute it on an image, then you can use an image pyramid to compute it across scales. Modulo alignment and edge effects, you can also use a stitched image pyramid if you want or need to for some reason. I guess it's a pretty standard technique?
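
A minimal sketch of that standard technique, assuming any per-image dense feature extractor (`compute_dense_feature` is a hypothetical stand-in here, not code from this PR; the scales and interpolation are placeholders):

```python
# Build a geometric image pyramid and run an arbitrary dense feature
# extractor on every level.
import numpy as np
from PIL import Image

def image_pyramid(img, num_levels=7, scale_step=2 ** (-1.0 / 2)):
    """Yield (scale, resized_image) pairs for a geometric pyramid."""
    w, h = img.size
    for level in range(num_levels):
        s = scale_step ** level
        yield s, img.resize((max(1, int(w * s)), max(1, int(h * s))),
                            Image.BILINEAR)

def feature_pyramid(img, compute_dense_feature):
    """Compute the dense feature on every pyramid level."""
    return [(s, compute_dense_feature(np.asarray(level, dtype=np.float32)))
            for s, level in image_pyramid(img)]
```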

I'm still slowly chipping away at replacing all the GPL code and integrating the result into something with roughly equivalent functionality to the existing GPL DenseNet impl:
https://github.com/moskewcz/boda/commits/master

I'd guess that for either the GPL/ffld code or the BSD version (when it is done) of the DenseNet code, it wouldn't be hard to modify it to compute dense multiscale descriptor pyramids for some arbitrary feature(s). It's just a matter of having a function to compute the desired feature(s) on an image, plus various (admittedly potentially non-trivial) feature-dependent glue issues: padding, alignment, mapping.
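
A toy illustration of that glue, assuming a feature with a fixed cumulative stride (the stride and offset values below are placeholders, not DenseNet's):

```python
# Pad an image so the feature grid tiles it evenly, and map a feature-cell
# coordinate back to (approximate) image pixels. Purely illustrative.
import numpy as np

def pad_to_stride(img, stride=16, fill=0):
    """Pad an HxWxC image on the bottom/right to a multiple of `stride`."""
    h, w = img.shape[:2]
    return np.pad(img, ((0, (-h) % stride), (0, (-w) % stride), (0, 0)),
                  constant_values=fill)

def feat_to_image(fx, fy, stride=16, offset=0):
    """Approximate image-pixel location of feature cell (fx, fy)."""
    return fx * stride + offset, fy * stride + offset
```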

I'm not sure exactly what the value of that would be, though: I'd think there are other existing ways to go from image -> image pyramid -> feature pyramid for various dense features in C++ (or at least from MATLAB/Python). But maybe not, I dunno.

@kloudkl (Contributor) commented May 4, 2014

Because this work focuses on constructing multiscale image and feature pyramids rather than extracting dense local features at a single scale, maybe DenseNet would be better called PyramidNet.

The motivation for extracting such dense local features is fine-grained object classification. I fine-tuned the Caffe reference ImageNet model on 23 knife ImageNet synsets, and the test accuracy was only about 50%. The winning results of the Fine-Grained Challenge 2013 were much better, although those object categories were likely easier to classify. It's surprising that the winner, which did not use a deep neural network, beat those that did.

There are three possible reasons. First, the features of the last convolutional layer have receptive fields that are too large, and they are not dense enough to capture fine-grained local information. Second, the fully connected layers do not preserve as much of the local features' statistics as Fisher Vectors do. Third, the winner ensembled two complementary Fisher Vector models with quite different design choices and parameters.

As you have pointed out, solving the first issue involves non-trivial processing. But the output features of the last convolutional layer at each position can be used as not-so-dense local features. The second problem can be mitigated by encoding the local features with the Fisher Vector or VLAD using the open-source VLFeat implementation. Finally, the features of multiple different CNN architectures combined together are superior to the features of a single net.

I have just worked out the outline of this scheme, and only experimental results can tell us whether it works.
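
A rough numpy sketch of the VLAD step in that outline, treating the last-conv-layer activations at each position as local descriptors. This is illustrative only, not VLFeat's implementation; the codebook (`centers`) would come from k-means over training descriptors:

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """VLAD-encode (N, D) local descriptors against a (K, D) codebook."""
    # Hard-assign each descriptor to its nearest codebook center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    K, D = centers.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            # Aggregate residuals to the assigned center.
            vlad[k] = (members - centers[k]).sum(axis=0)
    vlad = vlad.ravel()
    # Common post-processing: signed square root, then L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```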

@kloudkl (Contributor) commented Jul 2, 2014

A recent performance evaluation (#557) indicates that the Fisher Vector cannot beat CNN features.
[1] K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman. Return of the Devil in the Details: Delving Deep into Convolutional Nets. http://arxiv.org/abs/1405.3531

@bhack (Contributor) commented Jul 2, 2014

@kloudkl There are also experiments with caffe and VLAD:
http://arxiv.org/abs/1403.1840v1

@kloudkl (Contributor) commented Jul 3, 2014

That's strange, since the Fisher Vector used to be more effective than VLAD.

@melgor commented Oct 27, 2014

Hi, when will DenseNet be merged into the dev branch? Since it works, I assume little work is needed to merge it.

@shelhamer (Member)

Thanks Forrest @forresti and Matt @moskewcz for posting the code for your tech report. Since the PR has been made, others can find this snapshot of the implementation.

However, this cannot be merged. There are the steps helpfully listed by Matt, but more fundamentally there was, and still is, the question of goodness of fit. As implemented, this was done essentially independently of the framework and then attached, which shows in the 5000+ line diff.

Caffe is capable of dense inference and learning -- blobs can take the necessary shapes, convolution is convolution, matrix multiplications can be cast to convolution, and losses can take different input and truth shapes. (The editing model parameters example shows how to make a model fully convolutional for inference.)
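
A hedged sketch of that editing-model-parameters idea. The fc6/fc7/fc8 layer names follow Caffe's net surgery example; the prototxt and caffemodel filenames here are placeholders, not files from this PR:

```python
# Copy trained fully connected weights into an equivalent convolutional
# model so the net accepts arbitrary input sizes.
import caffe

net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
net_conv = caffe.Net('deploy_full_conv.prototxt', 'model.caffemodel',
                     caffe.TEST)

for fc, conv in [('fc6', 'fc6-conv'), ('fc7', 'fc7-conv'),
                 ('fc8', 'fc8-conv')]:
    # The inner-product weight matrix reshapes into the conv filter bank;
    # the biases carry over unchanged.
    net_conv.params[conv][0].data.flat = net.params[fc][0].data.flat
    net_conv.params[conv][1].data[...] = net.params[fc][1].data

net_conv.save('model_full_conv.caffemodel')
```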

The contribution of this PR is to assemble pyramids and show how to do extraction. This could be done most clearly and effectively as an example rather than as further code. For instance, (1) the stitching approach could be turned into a data layer, or (2) each level of the pyramid could be an input to a weight-shared pyramid net. At its simplest this could be a Python data layer, which need not even be slow if one does multiprocessing for prefetching.
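
A minimal sketch of that simplest option, assuming pycaffe's Python layer interface. The image source, BGR/mean preprocessing, and prefetching are all omitted, and the scales are placeholders:

```python
import caffe
import numpy as np
from PIL import Image

class PyramidDataLayer(caffe.Layer):
    """Emit one pyramid level per top blob (hypothetical example layer)."""
    SCALES = [1.0, 0.5, 0.25]  # one top blob per scale

    def setup(self, bottom, top):
        self.img = Image.open('example.jpg')  # placeholder image source

    def reshape(self, bottom, top):
        w, h = self.img.size
        for i, s in enumerate(self.SCALES):
            top[i].reshape(1, 3, max(1, int(h * s)), max(1, int(w * s)))

    def forward(self, bottom, top):
        for i in range(len(self.SCALES)):
            _, _, th, tw = top[i].data.shape
            level = self.img.resize((tw, th), Image.BILINEAR)
            # HWC uint8 -> CHW float32, as Caffe blobs expect.
            top[i].data[...] = np.asarray(
                level, dtype=np.float32).transpose(2, 0, 1)

    def backward(self, top, propagate_down, bottom):
        pass  # a data layer has no gradient
```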

For a pointer to dense models and matrix output, follow #1698. Closing.
