Fix Glorot initialization, add He initialization #937
Conversation
Should fix the issue reported at FluxML#442. Adds He weight initialization as a bonus :-)
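For context, Glorot (Xavier) uniform initialization samples from U(−s, s) with s = √(6 / (fan_in + fan_out)), and getting fan_in/fan_out right for convolutional kernels requires multiplying the spatial kernel dimensions into the channel counts. Here is a minimal Julia sketch of that fan computation with an He initializer alongside; the helper name `nfan` and the exact signatures are illustrative assumptions, not necessarily this PR's code:

```julia
# (fan_in, fan_out) from a weight array's size.
# Dense weight of size (out, in): fan_in = in, fan_out = out.
nfan(n_out, n_in) = n_in, n_out
# Conv kernel (k..., ch_in, ch_out): kernel dims multiply both fans.
nfan(dims...) = prod(dims[1:end-2]) .* (dims[end-1], dims[end])

# Glorot (Xavier) uniform: U(-s, s) with s = sqrt(6 / (fan_in + fan_out)).
# rand - 0.5 is U(-0.5, 0.5), and sqrt(24)/2 == sqrt(6), hence the 24.
glorot_uniform(dims...) =
    (rand(Float32, dims...) .- 0.5f0) .* sqrt(24f0 / sum(nfan(dims...)))

# He (Kaiming) normal: N(0, 2 / fan_in), the variance suited to relu.
he_normal(dims...) =
    randn(Float32, dims...) .* sqrt(2f0 / first(nfan(dims...)))
```

For a 3×3 kernel with 16 input and 32 output channels, `nfan(3, 3, 16, 32)` gives `(144, 288)`, i.e. fan_in = 3·3·16 and fan_out = 3·3·32.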
Best to split this into two PRs to make it easier to review and merge.
I couldn't figure out how to split the PR into two, so I deleted the He initialization stuff (for now). (And sorry for the mess. I'm not yet completely familiar/comfortable with how Git(hub) works...)
No worries. Splitting the commit is possible, but it's usually easier just to delete code and/or copy-paste it somewhere else for the second PR.
Anything else I should do to get this PR reviewed / merged?
I'm a bit uncomfortable with the heuristic this uses, but at least it has good precedent in other libraries. We can always redesign in future if needed. Thanks @Sleort! bors r+
Build succeeded |
I agree (see the discussion at #442). A more "proper" way of treating this would, however, require breaking changes, since we would have to dispatch on layer type in addition to dimensionality. Actually, ideally we should maybe make the initialization depend on the activation function as well, such that the "right" initialization could be automatically selected (Glorot for tanh-like activations, He for relu, and so on).
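To make that last idea concrete, a hypothetical sketch of activation-dependent dispatch might look like the following; `init_for` is invented for illustration and is not Flux API:

```julia
relu(x) = max(zero(x), x)  # local stand-in for Flux's relu

# Simplified 2-D (Dense-only) initializers; weight size is (out, in).
glorot_uniform(out, in) =
    (rand(Float32, out, in) .- 0.5f0) .* sqrt(24f0 / (in + out))
he_normal(out, in) = randn(Float32, out, in) .* sqrt(2f0 / in)

# Hypothetical: pick the default initializer from the activation type.
init_for(::typeof(relu)) = he_normal       # He init pairs with rectifiers
init_for(activation)     = glorot_uniform  # Glorot for everything else

# A constructor could then default its weights accordingly, e.g.
# Dense(in, out, σ) = Dense(init_for(σ)(out, in), zeros(Float32, out), σ)
```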
We don't generally consider changing a layer's default initialisation to be breaking, so we could make the defaults smarter at any time, even if it isn't exposed as an API. A more general user-accessible initialisation API would be nice to have though.
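For what it's worth, a custom initializer can already be passed explicitly through the existing constructor keyword (spelled `initW` in Flux releases of this era, renamed `init` later), so users can opt into He initialization today, assuming an `he_normal` like the one sketched above:

```julia
using Flux

# fan_in is the last dimension, since initW is called as initW(out, in).
he_normal(dims...) = randn(Float32, dims...) .* sqrt(2f0 / dims[end])

layer = Dense(128, 64, relu; initW = he_normal)
```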