
Commit

Update 14_resnet.ipynb (fastai#305)
Fix a typo: replace "yjay" with "that"
ricardocalleja authored Nov 29, 2020
1 parent fda223b commit fbf27ec
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion 14_resnet.ipynb
@@ -411,7 +411,7 @@
"\n",
"Again, this is rather inaccessible prose—so let's try to restate it in plain English! If the outcome of a given layer is `x`, when using a ResNet block that returns `y = x+block(x)` we're not asking the block to predict `y`, we are asking it to predict the difference between `y` and `x`. So the job of those blocks isn't to predict certain features, but to minimize the error between `x` and the desired `y`. A ResNet is, therefore, good at learning about slight differences between doing nothing and passing though a block of two convolutional layers (with trainable weights). This is how these models got their name: they're predicting residuals (reminder: \"residual\" is prediction minus target).\n",
"\n",
"One key concept that both of these two ways of thinking about ResNets share is the idea of ease of learning. This is an important theme. Recall the universal approximation theorem, which states that a sufficiently large network can learn anything. This is still true, but there turns out to be a very important difference between what a network *can learn* in principle, and what it is *easy for it to learn* with realistic data and training regimes. Many of the advances in neural networks over the last decade have been like the ResNet block: the result of realizing how to make something yjay was always possible actually feasible.\n",
"One key concept that both of these two ways of thinking about ResNets share is the idea of ease of learning. This is an important theme. Recall the universal approximation theorem, which states that a sufficiently large network can learn anything. This is still true, but there turns out to be a very important difference between what a network *can learn* in principle, and what it is *easy for it to learn* with realistic data and training regimes. Many of the advances in neural networks over the last decade have been like the ResNet block: the result of realizing how to make something that was always possible actually feasible.\n",
"\n",
"> note: True Identity Path: The original paper didn't actually do the trick of using zero for the initial value of `gamma` in the last batchnorm layer of each block; that came a couple of years later. So, the original version of ResNet didn't quite begin training with a truly identity path through the ResNet blocks, but nonetheless having the ability to \"navigate through\" the skip connections did indeed make it train better. Adding the batchnorm `gamma` init trick made the models train at even higher learning rates.\n",
"\n",
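The changed cell above describes two ideas that are easy to see in code: a block that returns `y = x + block(x)`, and the batchnorm `gamma` zero-init trick mentioned in the note. Below is a minimal PyTorch sketch of such a block; it is an illustration, not the fastai/fastbook implementation, and the `ResBlock` name, layer layout, and channel sizes are made up for the example.

```python
# Minimal sketch (assumption: not the fastai/fastbook implementation) of a
# residual block y = x + block(x) with the last batchnorm's gamma zeroed.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """y = x + block(x): the convolutions only have to learn the residual."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1   = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2   = nn.BatchNorm2d(channels)
        # The "gamma init" trick from the note: zeroing the scale of the last
        # batchnorm makes block(x) == 0 at the start of training, so the whole
        # block begins as a true identity mapping.
        nn.init.zeros_(self.bn2.weight)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return x + out  # predict the residual, not the full output

# Hypothetical usage: shapes are preserved, so blocks can be stacked freely.
x = torch.randn(8, 64, 32, 32)
y = ResBlock(64)(x)
assert y.shape == x.shape
```

Because the skip path is untouched and `bn2`'s scale starts at zero, the block's output equals its input at initialization, which is the "true identity path" the note refers to; training then only has to learn the slight difference between doing nothing and passing through the two convolutional layers.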
