RFC: A General Recipe for Generic Rules and Natural Tangents (hopefully...) #449
Conversation
Codecov Report

|          | master | #449   | +/-     |
|----------|--------|--------|---------|
| Coverage | 92.85% | 12.38% | -80.48% |
| Files    | 14     | 15     | +1      |
| Lines    | 784    | 945    | +161    |
| Hits     | 728    | 117    | -611    |
| Misses   | 56     | 828    | +772    |
Tidied up

Added …

Added a …
Turns out that the above problem was related to me not specifying that the rule in question was restricted to a … I've also added some examples with …
Adding extra tests for …

For under-parametrised matrices, I'm fairly confident that we can think of the Jacobian of restructure as a left-inverse of the Jacobian of destructure (a left-inverse should always exist, I believe). This doesn't work for over-parametrised types, because the Jacobian of destructure maps from a larger number of dims to a smaller number of dims, so it doesn't admit a left-inverse (to see this, think about the rank of a matrix product of the form …). Not sure if there's a way to fix this. Doesn't seem to be an issue when either …
Not sure what the geometric explanation of these last two points is. Presumably there's a nice one.

edit: possibly we can understand what's going on here by considering the pseudo-inverse of the Jacobian of destructure. Assume that the Jacobian of restructure … Conversely, if `f(x::ScaledMatrix) = my_sum(my_scale(x))`, where both …

edit2: There are a couple of ways that we could utilise this information for guiding rule-implementers. My inclination is just to suggest that type-authors avoid implementing …

edit3: a consequence of the left-inverse Jacobian stuff (for under-parametrised matrices) is that the pullback of restructure is simply a right-inverse for the pullback of destructure, and the pushforward of restructure is a left-inverse for the pushforward of destructure, i.e. `pushforward_restructure(pushforward_destructure(x)) == x` and `pullback_destructure(pullback_restructure(x)) == x`. This gives a really nice way to derive what you need for restructure without thinking about how to actually implement …

edit4: I've managed to show that, if you want a sensible definition of the natural pullback for the identity function, the pullback of restructure must be a right-inverse of the pullback of destructure. However, I've not managed to determine whether any right-inverse will do. Would like to know if that's the case.
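To make these relations concrete, here's a small sketch for an under-parametrised case (a diagonal matrix with hand-written linear maps; my own toy example, not code from this PR):

```julia
using LinearAlgebra

# destructure: parameters (the diagonal) -> dense Array; restructure: back again.
destructure(θ::AbstractVector) = Matrix(Diagonal(θ))
restructure(X::AbstractMatrix) = diag(X)

# Both maps are linear, so each is its own pushforward, and the pullbacks are
# the corresponding adjoint maps.
pushforward_destructure(δθ) = Matrix(Diagonal(δθ))
pushforward_restructure(δX) = diag(δX)
pullback_destructure(X̄) = diag(X̄)
pullback_restructure(θ̄) = Matrix(Diagonal(θ̄))

θ = randn(4)

# restructure ∘ destructure is the identity on parameters, so the pushforward of
# restructure is a left-inverse of the pushforward of destructure, and the
# pullback of restructure is a right-inverse of the pullback of destructure.
@assert restructure(destructure(θ)) == θ
@assert pushforward_restructure(pushforward_destructure(θ)) == θ
@assert pullback_destructure(pullback_restructure(θ)) == θ
```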
I continue to want to submit a workshop paper about this, to get feedback.

In languages featuring polymorphism, the representation of the derivative of types that represent matrices poses a challenge. We can very simply define the idea of an `AbstractMatrix` subtype as something implementing the following interface:
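As a hedged sketch of that interface (the `FillMatrix` type below is my own hypothetical illustration, not an example from this discussion), an `AbstractMatrix` subtype only needs `size` and `getindex`:

```julia
# Hypothetical structured type: every element has the same value.
struct FillMatrix{T} <: AbstractMatrix{T}
    value::T
    nrows::Int
    ncols::Int
end

# The two methods that plug the type into the generic AbstractMatrix machinery.
Base.size(A::FillMatrix) = (A.nrows, A.ncols)
Base.getindex(A::FillMatrix, i::Int, j::Int) = A.value
```

With just these two methods, printing, `sum`, matrix–vector products, and so on all come from generic fallbacks.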
From this simple interface, all other expected functionality follows.

### Background: Polymorphic AbstractMatrix subtypes

These struct-based AbstractMatrix types are very useful.
They can be used to implement matrices with structural sparsity, allowing for algorithms with improved time complexity.
For example, for …
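As one concrete illustration of this kind of saving (using the stdlib `Diagonal`, and assuming nothing about the specific example originally meant here):

```julia
using LinearAlgebra

d = randn(1_000)
v = randn(1_000)

D = Diagonal(d)       # structured: stores and touches only the n diagonal entries
D_dense = Matrix(D)   # dense: stores n^2 entries, so the product is O(n^2)

# Same result, but the structured method is O(n) rather than O(n^2).
@assert D * v ≈ D_dense * v
```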
Many other examples exist.

### Background: When do we want each tangent type

Structural tangents have two key uses, in addition to generally being a useful representation. The pullback of the … The constructor's pullback is naturally implemented in terms of the structural tangent. However, a key limitation of the structural tangent is that it does not implement …
All these examples require a cotangent type that is natural and supports the operations an array does.
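A hedged sketch of the contrast, using `Diagonal` and ChainRulesCore's `Tangent` type (my own illustration, not code from this PR):

```julia
using LinearAlgebra
using ChainRulesCore

D = Diagonal([1.0, 2.0])

# Structural tangent: mirrors the fields of the struct.
structural = Tangent{typeof(D)}(diag = [0.1, 0.2])

# Natural tangent: itself an AbstractMatrix that "looks like" a perturbation of D.
natural = Diagonal([0.1, 0.2])

# The natural tangent supports ordinary array operations...
natural * ones(2, 2)

# ...whereas the structural tangent does not implement the AbstractMatrix
# interface, so the same operation on `structural` would be a MethodError.
```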
`destructure` is quite straightforward to define -- essentially equivalent to `collect`. I'm confident that this is always going to be simple to define, because `collect` is always easy to define.
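A minimal sketch of that, assuming `destructure` simply materialises the structured matrix as a plain `Array`:

```julia
using LinearAlgebra

destructure(C::AbstractMatrix) = collect(C)

destructure(Diagonal([1.0, 2.0]))   # 2×2 Matrix{Float64}, with explicit zeros
```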
`Restructure(C)(C_dense)` is a bit trickier. It's the function which takes an `Array` `C_dense` and transforms it into `C`. This feels like a slightly odd thing to do, since we already have `C`, but it's necessary to already know what `C` is in order to construct this function in general -- for example, over-parametrised matrices require this (see the `ScaledMatrix` example in the tests / examples). I'm _reasonably_ confident that this is always going to be possible to define, but I might have missed something.
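A hedged sketch of what this could look like for an over-parametrised type, loosely in the spirit of the `ScaledMatrix` example (the type definition and field names here are my own guesses, not the ones from `examples.jl`):

```julia
# Hypothetical over-parametrised type: the same dense matrix can be represented
# by many different (scale, data) pairs.
struct ScaledMatrix{T<:Real, M<:AbstractMatrix{T}} <: AbstractMatrix{T}
    scale::T
    data::M
end
Base.size(C::ScaledMatrix) = size(C.data)
Base.getindex(C::ScaledMatrix, i::Int, j::Int) = C.scale * C.data[i, j]

# Restructure carries the original C, which is what resolves the ambiguity:
# keep C's scale and recover `data` from the dense values (assuming scale != 0).
struct Restructure{T}
    C::T
end
(r::Restructure{<:ScaledMatrix})(C_dense::AbstractMatrix) =
    ScaledMatrix(r.C.scale, C_dense ./ r.C.scale)
```

With this, `Restructure(C)(collect(C))` recovers a `ScaledMatrix` equal to `C`, even though many other `(scale, data)` pairs would produce the same dense values.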
I don't know how to do this for e.g. Woodbury, but I think you mentioned you thought about this and know how to do it?
Similarly, two different structured matrices might map to the same dense matrix. In that case I suppose it doesn't matter which structured matrix we get for the correctness?
> I don't know how to do this for e.g. Woodbury, but I think you mentioned you thought about this and know how to do it?
The more I think about it, the less convinced I am that I know how to do this. I think this section on `Restructure` needs re-working / updating to reflect the increased level of uncertainty I have about how easy this is to do. For example, the `ScaledMatrix` example I mention is less straightforward than I originally thought.
> Similarly, two different structured matrices might map to the same dense matrix. In that case I suppose it doesn't matter which structured matrix we get for the correctness?
Yeah, I think that's totally fine. But is this a `destructure` thing rather than a `Restructure` thing?
This business of how to map between "natural" and "structural" representations was bugging me, but I now think it's actually very simple: think of the forward pass as maps …

The pullback of only the second map also always exists; it's a linear map from …

The apparent ambiguity described here in how to pick the natural for … In general …

In this PR's description, I remain a bit confused by what the map …
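To make that two-map picture concrete, here's a toy version (my own example; the maps originally referred to in this comment are not shown, so this is only an assumed reading):

```julia
using LinearAlgebra

θ = [1.0, 2.0, 3.0]

# First map: parameters -> structured matrix.
make(θ) = Diagonal(θ)
# Second map: a generic AbstractMatrix program -> scalar.
use(C) = sum(abs2, C * ones(size(C, 2)))

# Pullback of the second map alone: sensitivities w.r.t. every entry of C,
# i.e. a dense matrix -- the "natural" cotangent. For this `use`,
# d(use)/dC[i, j] = 2 * sum(C[i, :]).
C = make(θ)
Cd = Matrix(C)
C̄ = [2 * sum(Cd[i, :]) for i in 1:3, j in 1:3]

# Pullback of the first map (the constructor): dense cotangent -> parameter
# cotangent; here it just extracts the diagonal.
θ̄ = diag(C̄)
@assert θ̄ ≈ 2 .* θ   # matches d/dθᵢ of use(make(θ)) = Σᵢ θᵢ²
```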
Exactly. Roughly speaking, I had been thinking about this as …

I think it's basically fine to think about … I'm struggling a little to see exactly what you mean by the intermediate step, though. In particular, how are you thinking about …
It acts on the manifold. Honestly though, I'm really not sure that it always exists, and I'm still not 100% sure what its semantics are. It's probably easiest to ignore this for now, because it's a bit of a pain and I'm not really even sure how to derive this function in general (it could be something ugly and nonlinear, and I'm not entirely confident that it's uniquely defined). I'm confident that what's left is unambiguous, even if we're potentially missing a bit of functionality, so let's stick with that. We can revisit restructuring later if we like.
(Wrote this before I saw your doc on Slack -- will try to read today.)
I guess we never actually want to produce this, for converting natural to structural. We want just …

For (a), I wonder a bit if we want a call-site opt-out mechanism. Like …

For (b), I guess we need in-place accumulation, either by something like FluxML/Zygote.jl#981 or by …

And, obstacle (c): it's difficult to see this working well at all for CuArrays. I guess that's like most generic fallbacks. A hand-written …
This PR is a proposal I've been working on to address the issues discussed (at length) in #441.

To understand the PR, please consult `notes.md` first, and `examples.jl` once you've done that. In particular, this proposal aims to achieve a clean separation between the world of structural tangents, which work well with AD, and natural tangents, which humans often prefer -- the advantages of such a set-up are laid out in `notes.md`.

This is not as much of a beast as it initially seems, as most of the LOC are contained in `notes.md` and `examples.jl`, neither of which would ever be merged. Also, much of the code presently in `src/destructure.jl` will be moved to the tests before merging.