
Train/test mode #643

Closed

MikeInnes opened this issue Feb 26, 2019 · 3 comments

@MikeInnes
Member

Right now layers like `BatchNorm` and `Dropout` have a flag to put them in train or test mode. However, once Zygote lands (#628) we can do something much cleverer: enable the regularisation only in a gradient context. Then it will automatically be on during the training loop and off at test time.
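
Roughly, the trick is a function whose primal value is `false` but whose adjoint makes it `true` inside `gradient` — a minimal sketch (the names `istraining` and `dropout` here are illustrative, not a committed API):

```julia
using Zygote: @adjoint

# Returns `false` when called normally, but the custom adjoint below makes
# it return `true` whenever it is evaluated inside a Zygote gradient call.
istraining() = false
@adjoint istraining() = true, _ -> nothing

# A dropout that is active only under `gradient`, with no mode flag:
function dropout(x, p)
    istraining() || return x
    mask = rand(size(x)...) .> p   # keep each entry with probability 1 - p
    return x .* mask ./ (1 - p)    # inverted-dropout rescaling
end
```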

We can of course still have a manual override here (just set the enabled flag to `:auto` by default), but it's interesting to consider whether we even need it; I suspect we will, but I don't know of any explicit use cases for it.
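
A sketch of what that override could look like (struct and field names hypothetical, building on the `istraining` sketch above):

```julia
mutable struct Dropout
    p::Float64
    active::Union{Bool,Symbol}   # true/false force a mode; :auto defers to the gradient context
end
Dropout(p) = Dropout(p, :auto)

# Falls back to `istraining()` from the sketch above when the flag is :auto.
isactive(d::Dropout) = d.active === :auto ? istraining() : d.active

testmode!(d::Dropout)  = (d.active = false; d)
trainmode!(d::Dropout) = (d.active = true; d)
```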

I think this aligns well with how I've seen people use these layers in practice; it avoids some predictable mode-switching boilerplate and gets rid of one more use of `mapleaves`. However, it does make a fairly strong assumption about how these layers get used, so I'm on the lookout for cases where this might lead to counter-intuitive or unexpected behaviour compared to the explicit approach.

@jekbradbury
Contributor

I think I agree with this! The only case I've ever encountered where the layer mode diverges from the autodiff context is that batch norm can operate in a third mode, where the running mean and variance are used for normalization and are also updated to track the changing statistics of the data; this is sometimes used for online inference. I think that's sufficiently niche to ignore 🙂
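
For concreteness, a rough sketch of the three behaviours (the type, field names, and the `:online` mode label are mine, not Flux's actual implementation):

```julia
using Statistics: mean, var

mutable struct RunningStats
    μ::Vector{Float64}
    σ²::Vector{Float64}
    momentum::Float64
end

function update!(s::RunningStats, μ, σ²)
    s.μ  .= (1 - s.momentum) .* s.μ  .+ s.momentum .* μ
    s.σ² .= (1 - s.momentum) .* s.σ² .+ s.momentum .* σ²
end

# Which statistics batch norm normalises with, per mode (features × batch input):
function norm_stats(s::RunningStats, x::AbstractMatrix, mode::Symbol)
    μb, σ²b = vec(mean(x; dims=2)), vec(var(x; dims=2))
    if mode === :train        # normalise with batch stats; update running stats
        update!(s, μb, σ²b)
        return μb, σ²b
    elseif mode === :test     # normalise with frozen running stats
        return s.μ, s.σ²
    else                      # :online — normalise with running stats *and* keep updating them
        update!(s, μb, σ²b)
        return s.μ, s.σ²
    end
end
```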

MikeInnes mentioned this issue Mar 8, 2019
MikeInnes added a commit that referenced this issue Mar 8, 2019
staticfloat pushed a commit that referenced this issue May 3, 2019
bors bot added a commit that referenced this issue Sep 11, 2019
669: using Zygote r=MikeInnes a=MikeInnes

Otherwise known as "break all the things". This will be a huge change so I'm beginning to prepare now, even though Zygote is still a couple of months off from being really ready. **Do not try this at home** (yet) – this branch is eventually aimed at beta testers, but isn't even ready for that yet.

The idea is to break as little code as possible, which means supporting the current `Params` API; but I also want to start prototyping the nicer things discussed in #628 and other issues.
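
For reference, what keeping the `Params` API means in practice — a minimal usage sketch, assuming this branch is installed (field access and constructors as in Flux at the time):

```julia
using Flux, Zygote

m  = Dense(10, 2)
ps = Flux.params(m)                                # implicit parameters, same API as before
gs = Zygote.gradient(() -> sum(m(rand(10))), ps)   # Zygote replaces Tracker under the hood
gs[first(ps)]                                      # gradient for the first parameter array
```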

Blocking issues:

* [x] Get the tests passing.
* [x] Check tests on GPU.
* [x] Rewrite all the docs.
* [x] Cache invalidation (JuliaLabs/Cassette.jl#6).
* [x] Moving over adjoints (FluxML/Zygote.jl#81).
* [x] General Zygote robustness.

Nice to have:

* [ ] Robust nested AD (may not be a blocker if one can still use Tracker with Flux).
* [x] Zygote support for modules / globals as discussed in #628, along with #637.
* [x] Better train/test mode as in #643.

If you're the kind of person who ignores triangular road signs, you can try this with

```julia
]add Flux#zygote Zygote#master
```

Co-authored-by: Mike J Innes <mike.j.innes@gmail.com>
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
Co-authored-by: thebhatman <manjunathbhat9920@gmail.com>
BerenMillidge pushed a commit to BerenMillidge/Flux.jl that referenced this issue Dec 20, 2019
@CarloLucibello
Member

This has already landed.

@wsshin commented Dec 13, 2022

After reading this thread, it's not clear to me whether I still need to call `trainmode!()` and `testmode!()`. Can someone clarify?
