Description
Right now layers like `BatchNorm` and `Dropout` have a flag to put them in `train` or `test` mode. However, once Zygote lands (#628) we can do something much cleverer: enable the regularisation only in a gradient context. Then it will automatically be on during the training loop and off at test time.
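To make the idea concrete, here is a minimal, language-neutral sketch in Python rather than Julia. It is not Flux or Zygote code: `gradient_context` and the `_in_gradient` flag are hypothetical stand-ins for "this code is currently being differentiated", and `dropout` is a toy list-based version of the layer.

```python
import random
from contextlib import contextmanager

# Hypothetical flag standing in for "are we inside a gradient call?".
# A real AD system would provide this signal itself.
_in_gradient = False


@contextmanager
def gradient_context():
    """Mark the enclosed code as running under differentiation (stand-in only)."""
    global _in_gradient
    previous = _in_gradient
    _in_gradient = True
    try:
        yield
    finally:
        # Restore the previous state so nested contexts behave sensibly.
        _in_gradient = previous


def dropout(xs, p=0.5, rng=random):
    """Inverted dropout that is only active inside gradient_context()."""
    if not _in_gradient:
        return list(xs)  # outside a gradient context: identity
    keep = 1.0 - p
    # Keep each element with probability `keep`, rescaling the survivors.
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```

Calling `dropout(xs)` directly returns the input unchanged, while the same call inside `with gradient_context():` applies the random mask, so no explicit mode switch is needed in the training loop.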
We can of course still have a manual override here (just set the `enabled` flag to `:auto` by default), but it's interesting to consider whether we even need this; I suspect we will but don't know of any explicit use cases for it.
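One way the override could resolve is sketched below, again in Python as a neutral illustration rather than Flux code. The function name `regularisation_active` is hypothetical, the string `"auto"` stands in for the `:auto` symbol, and `in_gradient` is assumed to be supplied by whatever detects the gradient context.

```python
def regularisation_active(enabled, in_gradient):
    """Resolve a tri-state flag: True, False, or "auto".

    "auto" defers to the gradient context; True and False force the layer
    on or off regardless of context.
    """
    if isinstance(enabled, bool):
        return enabled  # explicit manual override
    if enabled == "auto":
        return in_gradient  # default: follow the gradient context
    raise ValueError(f"enabled must be True, False or 'auto', got {enabled!r}")
```

With the default of `"auto"` the layer follows the gradient context, and setting the flag to `True` or `False` recovers today's explicit behaviour.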
Currently I think this aligns well with how I've seen people use these layers in practice, avoids some predictable mode boilerplate, and gets rid of one more usage of `mapleaves`. However, it does make a fairly strong assumption about how these layers get used, so I'm on the lookout for cases where this might lead to counter-intuitive or unexpected behaviour compared to the explicit approach.