
Commit

adding figure
karpathy committed Apr 11, 2015
1 parent 335d6ce commit 7dd3890
Showing 2 changed files with 7 additions and 0 deletions.
Binary file added assets/nn3/gridsearchbad.jpeg
7 changes: 7 additions & 0 deletions neural-networks-3.md
@@ -335,6 +335,13 @@ But as we saw, there are many more relatively less sensitive hyperparameters, for e

**Prefer random search to grid search**. As argued by Bergstra and Bengio in [Random Search for Hyper-Parameter Optimization](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf), "randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid". As it turns out, this is also usually easier to implement.

<div class="fig figcenter fighighlight">
<img src="/assets/nn3/gridsearchbad.jpeg" width="50%">
<div class="figcaption">
Core illustration from <a href="http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf">Random Search for Hyper-Parameter Optimization</a> by Bergstra and Bengio. It is very often the case that some of the hyperparameters matter much more than others (e.g. top hyperparam vs. left one in this figure). Performing random search rather than grid search allows you to much more precisely discover good values for the important ones.
</div>
</div>
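
To make the random-search recommendation concrete, here is a minimal sketch that samples two hyperparameters log-uniformly. The `train_and_evaluate` helper is a hypothetical stand-in for "train the model with these settings and return validation accuracy"; its body below is a synthetic placeholder only so the sketch runs.

```python
import numpy as np

def train_and_evaluate(lr, reg, num_epochs=1):
  # Hypothetical placeholder: in practice this would train the model for
  # num_epochs and return validation accuracy. Here, a synthetic surface
  # peaked near lr = 1e-3, reg = 1e-2, purely for illustration.
  return 1.0 - 0.1 * (np.log10(lr) + 3) ** 2 - 0.01 * (np.log10(reg) + 2) ** 2

# Random search: each trial samples every hyperparameter independently,
# on a log scale for quantities like learning rate and regularization.
results = []
for _ in range(100):
  lr = 10 ** np.random.uniform(-6, 1)    # learning rate
  reg = 10 ** np.random.uniform(-5, 5)   # regularization strength
  results.append((train_and_evaluate(lr, reg), lr, reg))

best_acc, best_lr, best_reg = max(results)  # best trial by validation accuracy
```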

**Careful with best values on border**. Sometimes it can happen that you're searching for a hyperparameter (e.g. learning rate) in a bad range. For example, suppose we use `learning_rate = 10 ** uniform(-6, 1)`. Once we receive the results, it is important to double check that the final learning rate is not at the edge of this interval, or you may be missing a more optimal hyperparameter setting beyond the interval.
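
Continuing the hypothetical sketch above, this border check can be automated rather than eyeballed, for example:

```python
# Warn if the best learning rate lands within half a decade of either end
# of the 10 ** uniform(-6, 1) search interval (reuses best_lr from above).
lo, hi = -6, 1
log_lr = np.log10(best_lr)
if log_lr < lo + 0.5 or log_lr > hi - 0.5:
  print('best learning rate %e is near the edge of the range; widen the search' % best_lr)
```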

**Stage your search from coarse to fine**. In practice, it can be helpful to first search in coarse ranges (e.g. 10 ** [-6, 1]), and then depending on where the best results are turning up, narrow the range. Also, it can be helpful to perform the initial coarse search while only training for 1 epoch or even less, because many hyperparameter settings can lead the model to not learn at all, or immediately explode with infinite cost. The second stage could then perform a narrower search with 5 epochs, and the last stage could perform a detailed search in the final range for many more epochs (for example).
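
One possible way to organize such a staged search, reusing the hypothetical `train_and_evaluate` helper from the earlier sketch (the ranges, trial counts and epoch budgets are only illustrative):

```python
def random_search(lr_exp_range, reg_exp_range, num_trials, num_epochs):
  # Sample exponents uniformly, i.e. hyperparameters log-uniformly.
  trials = []
  for _ in range(num_trials):
    lr = 10 ** np.random.uniform(*lr_exp_range)
    reg = 10 ** np.random.uniform(*reg_exp_range)
    trials.append((train_and_evaluate(lr, reg, num_epochs), lr, reg))
  return max(trials)  # (best accuracy, lr, reg)

# Stage 1: coarse ranges, ~1 epoch, mainly to discard settings that diverge.
_, lr1, reg1 = random_search((-6, 1), (-5, 5), num_trials=50, num_epochs=1)
# Stage 2: one decade around the stage-1 winner, a few epochs.
_, lr2, reg2 = random_search((np.log10(lr1) - 1, np.log10(lr1) + 1),
                             (np.log10(reg1) - 1, np.log10(reg1) + 1),
                             num_trials=50, num_epochs=5)
# A final stage would repeat the idea with a tighter range and many more epochs.
```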
