MSRA weight filler #1946
Conversation
Note that #1970 should fix the fan_in / fan_out shape issue for inner product layers.
* scale] where scale = sqrt(3 / n) where n is the fan_in, fan_out, or their
* average, depending on the variance_norm option. You should make sure the
* input blob has shape (num, a, b, c) where a * b * c = fan_in and num * b * c
* = fan_out. Note that this is currently not the case for inner product layers.
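For reference, here is a minimal sketch (not the PR's actual code) of the fill behavior the quoted documentation describes, assuming a weight blob shaped (num, a, b, c); the VarianceNorm enum and helper names below are illustrative stand-ins, not Caffe's exact API:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the PR's variance_norm option.
enum VarianceNorm { FAN_IN, FAN_OUT, AVERAGE };

// Uniform sample in [lo, hi); a real filler would use Caffe's RNG.
double uniform(double lo, double hi) {
  return lo + (hi - lo) * (std::rand() / (RAND_MAX + 1.0));
}

// Fills `data` uniformly in [-scale, scale] with scale = sqrt(3 / n), where n
// is fan_in, fan_out, or their average, as the quoted doc describes.
void xavier_like_fill(std::vector<double>& data,
                      int num, int a, int b, int c,
                      VarianceNorm norm) {
  const double fan_in  = static_cast<double>(a) * b * c;    // a * b * c
  const double fan_out = static_cast<double>(num) * b * c;  // num * b * c
  double n = fan_in;                        // default: normalize by fan_in
  if (norm == FAN_OUT) {
    n = fan_out;
  } else if (norm == AVERAGE) {
    n = (fan_in + fan_out) / 2.0;
  }
  const double scale = std::sqrt(3.0 / n);
  for (size_t i = 0; i < data.size(); ++i) {
    data[i] = uniform(-scale, scale);
  }
}
```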
#1970 is in so this filler is now right for InnerProduct layers too.
@nickcarlevaris thanks -- this looks good. The only potential issue is naming and attribution. I am not certain, but if I understand correctly you suggested "ReLU" as the name since this filler is intended for use with the so-named nonlinearity. It could be that this is the right choice. @longjon ?
#1940 has been merged for a month. Can these two work together to reproduce the paper's results?
This issue has been open for a long time. I hope it gets merged quickly.
Why hasn't this been merged into master? Is anything wrong?
Add MSRAFiller, an Xavier-like filler designed for use with ReLUs
Merged to master in c255709. Thanks @nickcarlevaris! I did a manual merge to re-format the commit message and add my own commit to note potentially related work. Closing since my edit threw off the GitHub merge.
Why is there no parameter to specify the \alpha defined in Equation 15?
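(For context, as a hedged reading rather than the authors' answer: the question presumably refers to the generalization in He et al. where a rectifier with negative slope $a$ changes the variance condition to $\tfrac{1}{2}(1 + a^2)\, n_l \,\mathrm{Var}[w_l] = 1$, i.e. $\mathrm{std}[w_l] = \sqrt{2 / ((1 + a^2)\, n_l)}$. The filler in this PR appears to cover only the plain-ReLU case $a = 0$, which reduces to $\sqrt{2 / n_l}$.)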
This PR adds MSRAFiller, which implements an Xavier-like filler designed for use with ReLUs instead of tanh, based on the paper: He et al, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015.
It also adds a VarianceNorm option to FillerParameter, which allows one to normalize by fan_in, fan_out, or their average. VarianceNorm applies to both the MSRAFiller and the XavierFiller (default behavior unchanged). Tests for MSRAFiller and XavierFiller are included as well.
Replaces #1883 (updates based on that discussion and rebased against master).
As with the XavierFiller, the fan_in and fan_out dimensions are not correct for inner product layers (as pointed out by @seanbell in #1883); however, I did update the documentation to note this.
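For anyone wanting to try this out, a hedged usage sketch follows. It assumes the names this PR introduces (a "msra" filler type and a VarianceNorm enum on FillerParameter) together with Caffe's existing GetFiller factory; the exact generated enum constants may differ from what is shown here.

```cpp
#include <boost/shared_ptr.hpp>

#include "caffe/blob.hpp"
#include "caffe/filler.hpp"

int main() {
  // Request the MSRA filler, normalizing the variance by fan_in.
  // (Field and enum names assume the proto changes in this PR.)
  caffe::FillerParameter filler_param;
  filler_param.set_type("msra");
  filler_param.set_variance_norm(caffe::FillerParameter_VarianceNorm_FAN_IN);

  // A convolution-style weight blob shaped (num, channels, height, width),
  // so fan_in = channels * height * width and fan_out = num * height * width.
  caffe::Blob<float> weights(64, 3, 3, 3);

  boost::shared_ptr<caffe::Filler<float> > filler(
      caffe::GetFiller<float>(filler_param));
  filler->Fill(&weights);
  return 0;
}
```

In a net prototxt this should correspond to something like weight_filler { type: "msra" variance_norm: FAN_IN } on a layer's parameters, but check the merged caffe.proto for the exact field names.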