Common use cases:

* running statistics, for example for batch norm (#89)
* weight decay (executed before grad update)
* constraints, renormalization of vars (executed after grad update)
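To make the ordering concrete, here is a minimal sketch in plain Python. All names (`sgd_step`, `pre_hooks`, `post_hooks`, `weight_decay`, `max_norm`, `RunningMean`) are hypothetical and chosen for illustration only; this is not the library's API, just one way the three cases could be arranged around a gradient step.

```python
import math


def weight_decay(rate):
    """Pre-update hook (hypothetical): shrink each weight before the grad step."""
    def hook(params, grads):
        return [p * (1.0 - rate) for p in params]
    return hook


def max_norm(limit):
    """Post-update hook (hypothetical): rescale if the L2 norm exceeds `limit`."""
    def hook(params, grads):
        norm = math.sqrt(sum(p * p for p in params))
        if norm <= limit:
            return list(params)
        scale = limit / norm
        return [p * scale for p in params]
    return hook


class RunningMean:
    """Exponential moving average of batch means, as batch norm would track."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.value = None  # no statistics seen yet

    def update(self, batch):
        batch_mean = sum(batch) / len(batch)
        if self.value is None:
            # First batch initializes the running value directly.
            self.value = batch_mean
        else:
            self.value = (self.momentum * self.value
                          + (1.0 - self.momentum) * batch_mean)
        return self.value


def sgd_step(params, grads, lr, pre_hooks=(), post_hooks=()):
    """Plain SGD step with hooks run before and after the gradient update."""
    for hook in pre_hooks:          # e.g. weight decay
        params = hook(params, grads)
    params = [p - lr * g for p, g in zip(params, grads)]
    for hook in post_hooks:         # e.g. constraints, renormalization
        params = hook(params, grads)
    return params


# Usage: decay first, then the gradient step, then the norm constraint.
new_params = sgd_step(
    [3.0, 4.0], [0.0, 0.0], lr=0.1,
    pre_hooks=[weight_decay(0.1)],
    post_hooks=[max_norm(1.0)],
)
stats = RunningMean(momentum=0.9)
stats.update([1.0, 3.0])
stats.update([4.0, 6.0])
```

Keeping the two hook lists separate makes the ordering explicit: anything that should affect the step itself goes in before the update, and anything that must hold for the final values goes in after it. Running statistics do not depend on the gradient, so in this sketch they are updated independently of `sgd_step`.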