
Commit

Finish off conclusion
Yani Ioannou committed Oct 28, 2016
1 parent e12cec8 commit 8132b98
Showing 1 changed file with 8 additions and 13 deletions.
conclusion.tex: 21 changes (8 additions & 13 deletions)
@@ -10,21 +10,16 @@ \chapter{Conclusion}
%********************************** %First Section **************************************
In this work, we propose that carefully designing networks in consideration of our prior knowledge of the task can improve the memory and compute efficiency of state-of-the-art networks, and even increase accuracy through structurally induced regularization. While this philosophy defines our approach, deep neural networks have a large number of degrees of freedom, and many of their facets warrant such analysis. We have attempted to address each of these in isolation:

Chapter \ref{conditionalnetworks} presented work towards conditional computation in deep neural networks. We proposed a new discriminative learning model, \emph{conditional networks}, that jointly exploits the accurate \emph{representation learning} capabilities of deep neural networks and the efficient \emph{conditional computation} of decision trees and directed acyclic graphs (DAGs). In addition to allowing for faster inference, conditional networks yield smaller models, are highly interpretable, and offer test-time flexibility in the trade-off of compute \vs accuracy.

Conditional networks can be thought of as learning an optimal block-diagonal sparsification of a DNN, and we showed how they can be trained to cover the continuous spectrum between deep networks and decision forests/jungles. We validated the approach on standard image classification tasks: compared to the state of the art, our results demonstrated superior efficiency for at-par accuracy on both the ImageNet and CIFAR datasets.
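
As a rough illustration of the savings offered by block-diagonal sparsification (with notation used purely for illustration), consider a fully-connected layer with an $n \times n$ weight matrix, which costs $n^{2}$ multiply-accumulate operations per example. Restricting the weights to a block-diagonal structure of $r$ blocks, each of size $(n/r) \times (n/r)$, and routing each example down a single branch gives
\[
n^{2} \quad\longrightarrow\quad r\,\Big(\frac{n}{r}\Big)^{2} = \frac{n^{2}}{r} \quad\longrightarrow\quad \Big(\frac{n}{r}\Big)^{2},
\]
where the middle term is the cost of evaluating all $r$ blocks and the final term is the cost when only the routed branch is evaluated.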


Chapters~\ref{lowrankfilters} and \ref{deeproots} proposed similar methods for reducing the computation and number of parameters in the spatial and channel (filter-wise) extents of convolutional filters, respectively. Rather than approximating the filters of previously-trained networks with more efficient versions, we learn a set of small basis filters from scratch; during training, the network learns to combine these basis filters into more complex filters that are discriminative for image classification. This means that our models are more efficient at both training and test time.
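
As a minimal illustration of why learning a basis suffices (with notation introduced here for illustration only), note that convolution is linear: if a complex filter is a learned combination $f = \sum_{i=1}^{m} \alpha_i\, b_i$ of $m$ shared basis filters $b_i$, then
\[
x * f \;=\; x * \Big( \sum_{i=1}^{m} \alpha_i\, b_i \Big) \;=\; \sum_{i=1}^{m} \alpha_i\, (x * b_i),
\]
so the network need only convolve the input $x$ with the $m$ basis filters and learn the combination weights $\alpha_i$, which amounts to a $1\times1$ convolution over the basis responses.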

Chapter~\ref{lowrankfilters} proposed to exploit the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small $1\times h$ and $w\times 1$ basis filters, while learning only a few full $w\times h$ filters. We validated this approach by applying it to several existing CNN architectures, trained from scratch on the CIFAR, ILSVRC and MIT Places datasets. Our results showed similar or higher accuracy than conventional CNNs requiring much less compute. Applying our method to an improved version of the VGG-11 network using global max-pooling, we achieved comparable validation accuracy using 41\% less compute and only 24\% of the original VGG-11 model parameters; another variant of our method gave a 1 percentage point {\em increase} in accuracy over our improved VGG-11 model, giving a top-5 \emph{center-crop} validation accuracy of 89.7\% while reducing computation by 16\% relative to the original VGG-11 model. Applying our method to the GoogLeNet architecture for ILSVRC, we achieved comparable accuracy with 26\% less compute and 41\% fewer model parameters. Applying our method to a near state-of-the-art network for CIFAR, we achieved comparable accuracy with 46\% less compute and 55\% fewer parameters.
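
To give a rough sense of the reduction (with notation used purely for illustration), a convolutional layer with $c$ input channels and $d$ filters of size $w \times h$ requires approximately $c\,d\,w\,h$ multiply-accumulate operations per output pixel, whereas replacing the full filters with $m_{v}$ vertical $w \times 1$ and $m_{h}$ horizontal $1 \times h$ basis filters requires approximately
\[
c\,d\,w\,h \quad\longrightarrow\quad c\,(m_{v}\,w + m_{h}\,h).
\]
For example, with square $k \times k$ filters and $m_{v} = m_{h} = d/2$, the per-pixel cost falls from $c\,d\,k^{2}$ to $c\,d\,k$, a factor of $k$, before accounting for the comparatively cheap layers that combine the basis responses.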

Chapter \ref{deeproots} addressed the filter/channel extents of convolutional filters by learning filters with limited channel extents. When followed by a $1\times1$ convolution, these can also be interpreted as a set of basis filters, but in the channel extents. Unlike in Chapter \ref{lowrankfilters}, the channel extent of these basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. This allows a significant reduction in the computational cost and number of parameters of state-of-the-art deep CNNs without compromising accuracy. We validated the approach by training more efficient variants of state-of-the-art CNN architectures on the CIFAR10 and ILSVRC datasets. Our results showed similar or higher accuracy than the baseline architectures with much less compute, as measured by CPU and GPU timings. For example, for ResNet 50, our model has 40\% fewer parameters, 45\% fewer floating point operations, and is 31\% (12\%) faster on a CPU (GPU). For the deeper ResNet 200, our model has 25\% fewer floating point operations and 44\% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7\% fewer parameters and is 21\% (16\%) faster on a CPU (GPU).
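
As a rough per-pixel operation count (with notation used purely for illustration), a layer with $c$ input channels and $d$ filters of spatial size $k \times k$ costs $c\,d\,k^{2}$ multiply-accumulate operations. Restricting each filter to $c/g$ channels, i.e.\ using $g$ filter groups, and following with a $1 \times 1$ convolution that mixes the $d$ intermediate channels into $d$ outputs gives approximately
\[
c\,d\,k^{2} \quad\longrightarrow\quad \frac{c\,d\,k^{2}}{g} + d^{2},
\]
a substantial saving for typical layer sizes; in the chapter, the number of groups is varied with depth rather than held fixed.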

Overall, learning a set of basis filters was not only effective in reducing both computation and model complexity (parameters); in many of the results of Chapters \ref{lowrankfilters} and \ref{deeproots}, the models trained with this approach also generalized better than the original state-of-the-art models on which they were based.

\end{document}
