Description
Suppose we have a function f: ℝᵐ → ℝⁿ that for some reason we want to differentiate in forward mode, which will require calling all frules m times. This seems wasteful, as the pushforwards often depend on intermediates of the primal function that don't change. In the current implementation of frules, where the output of the pushforward is computed at the same time as the output of the primal, these intermediates would need to be recomputed m times. An example is symmetric eigendecomposition, where the eigendecomposition really only needs to be computed once but will instead be computed m times.
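To make the eigendecomposition example concrete, here is a rough sketch of the alternative I have in mind: compute the primal once and return a closure that reuses the intermediates for each tangent. The name `eigen_with_pushforward` is hypothetical and not part of ChainRules; the sketch uses only `LinearAlgebra` and assumes a real symmetric input with distinct eigenvalues.

```julia
using LinearAlgebra

# Hypothetical helper (not a ChainRules API): do the expensive primal work
# once and return a pushforward closure that reuses the intermediates.
function eigen_with_pushforward(A::Symmetric{<:Real})
    λ, U = eigen(A)  # the expensive step, performed a single time
    n = length(λ)
    # F[i, j] = 1 / (λ[j] - λ[i]) off the diagonal and 0 on it.
    # Assumption: the eigenvalues are distinct, so no division by zero.
    F = [i == j ? zero(eltype(λ)) : inv(λ[j] - λ[i]) for i in 1:n, j in 1:n]
    function pushforward(ΔA::Symmetric{<:Real})
        P = U' * ΔA * U      # tangent expressed in the eigenbasis
        Δλ = diag(P)         # tangent of the eigenvalues
        ΔU = U * (F .* P)    # tangent of the eigenvectors
        return Δλ, ΔU
    end
    return (λ, U), pushforward
end

A = Symmetric([2.0 1.0; 1.0 3.0])
(λ, U), pushforward = eigen_with_pushforward(A)

# One cheap pushforward call per input direction; eigen is not called again.
E11 = Symmetric([1.0 0.0; 0.0 0.0])
Δλ, ΔU = pushforward(E11)
```

With this shape, differentiating with respect to all m inputs costs one eigendecomposition plus m calls to `pushforward`, rather than m eigendecompositions.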
I'm sure there are good reasons for implementing it this way. One I can think of is that it's easier to support mutating rules. Are there others?