Train/test mode #643
I think I agree with this! The only case I've ever encountered where the layer mode diverges from the autodiff context is that batch norm can sometimes operate in a third mode where the running mean and variance are used for normalization and also updated based on changing statistics of the data; this is sometimes used for online inference. I think that's sufficiently niche to ignore 🙂
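For reference, that third mode looks roughly like this (a minimal sketch; the function name and the `μ`/`σ²`/`momentum` arguments are illustrative, not Flux's actual API):

```julia
using Statistics: mean, var

# Hypothetical sketch of the "third" BatchNorm mode described above:
# normalise with the running statistics (as in test mode), while also
# updating them from each incoming batch, as for online inference.
# x is a features × batch matrix; μ and σ² are per-feature vectors.
function online_batchnorm!(x, μ, σ², momentum; ϵ = 1f-5)
    y = (x .- μ) ./ sqrt.(σ² .+ ϵ)   # normalise with running stats, not batch stats
    μ  .= (1 - momentum) .* μ  .+ momentum .* vec(mean(x; dims = 2))
    σ² .= (1 - momentum) .* σ² .+ momentum .* vec(var(x; dims = 2))
    return y
end
```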
669: using Zygote r=MikeInnes a=MikeInnes

Otherwise known as "break all the things". This will be a huge change so I'm beginning to prepare now, even though Zygote is still a couple of months off from being really ready. **Do not try this at home** (yet) – this branch is eventually aimed at beta testers, but isn't even ready for that yet.

The idea is to break as little code as possible, which means supporting the current `Params` API; but I also want to start prototyping the nicer things discussed in #628 and other issues.

Blocking issues:

* [x] Get the tests passing.
* [x] Check tests on GPU.
* [x] Rewrite all the docs.
* [x] Cache invalidation (JuliaLabs/Cassette.jl#6).
* [x] Moving over adjoints (FluxML/Zygote.jl#81).
* [x] General Zygote robustness.

Nice to have:

* [ ] Robust nested AD (may not be a blocker if one can still use Tracker with Flux).
* [x] Zygote support for modules / globals as discussed in #628, along with #637.
* [x] Better train/test mode as in #643.

If you're the kind of person who ignores triangular road signs, you can try this with

```julia
]add Flux#zygote Zygote#master
```

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
Co-authored-by: thebhatman <manjunathbhat9920@gmail.com>
This has already landed.
After reading this thread, it is not clear to me whether or not I need to use …
Right now layers like `BatchNorm` and `Dropout` have a flag to put them in `train` or `test` mode. However, once Zygote lands (#628) we can do something much cleverer: we enable the regularisation only in a gradient context. Then it will automatically be on during the training loop and off at test time.

We can of course still have a manual override here (just set the `enabled` flag to `:auto` by default), but it's interesting to consider whether we even need this; I suspect we will, but I don't know of any explicit use cases for it.

Currently I think this aligns well with how I've seen people use these layers in practice, avoids some predictable mode-boilerplate, and gets rid of one more usage of `mapleaves`. However, it does make a fairly strong assumption about how these layers get used, so I'm on the lookout for cases where this might lead to counter-intuitive or unexpected behaviour, compared to the explicit approach.