Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jvp incorrect result rather than erroring #90

Open
MasonProtter opened this issue Jun 24, 2020 · 10 comments
Open

jvp incorrect result rather than erroring #90

MasonProtter opened this issue Jun 24, 2020 · 10 comments
Labels
bug Something isn't working

Comments

@MasonProtter
Copy link

There appears to be a problem with jvp when the v supplied is real and the primal is complex:

using FiniteDifferences
jvp(central_fdm(5,1), abs2, (3.0 + im, 0.25))

#+RESULTS:
: 1.9999999999995874
jvp(central_fdm(5,1), abs2, (3.0 + im, 0.25 + 0im))

#+RESULTS:
: 1.4999999999999445

The correct, answer here is 1.5. This should produce an error.

@ettersi
Copy link

ettersi commented Jun 26, 2020

Continuing the discussion from JuliaDiff/ChainRules.jl#196 (comment):

Whether or not it plays nicely with Unitfuls out of the box is an implementation detail.

Well it's part of the interface, so not really an implementation detail.

The current design requires that any type can be mapped to a plain Vector{<:Real}, because we can always add them and scale them.

Isn't converting Complex to Vector{Real} a real drag on performance?

The alternative approach would be to assume that any tangent vector can be scaled and added to the corresponding vector.

I don't understand. Why do we need to add tangent vectors to vectors / primals? The definition

lim_{h -> 0} ( f(x + Δx*h) - f(x) ) / h

requires only scaling and addition in the domain and range. Furthermore, I don't see how we could reasonably define derivatives without these properties.

Long story short, I would prefer seeing this issue fixed by making

jvp(central_fdm(5,1), abs2, (3.0 + im, 0.25))

return 1.5 rather than throwing an error. That would make the interface consistent with JuliaDiff/ChainRules.jl#196.

Of course, I'm just a random guy on the internet who has not contributed any code to this package yet, so please apologise if my opinion is unsolicited.

@willtebbutt
Copy link
Member

Of course, I'm just a random guy on the internet who has not contributed any code to this package yet, so please apologise if my opinion is unsolicited.

Your opinion is very much valued. Please keep pointing out where you think that things could be improved :)

requires only scaling and addition in the domain and range. Furthermore, I don't see how we could reasonably define derivatives without these properties.

I agree. The way that to_vec implements this is by constructing an isomorphism between vectors / tangents and Vectors. FiniteDifferences then operates directly on Vectors, and applies the inverse mapping at the end to get back to the original types.

I would love to see FiniteDifferences do the correct thing here -- that we have the to_vec function is something of a legacy feature that could probably do with being updated in light of the types in ChainRules. As you point out, all that we need to be able to do to compute jvps / pushforwards is

  1. add tangents to primals
  2. scale tangents by scalars
  3. subtract one primal from another, divide through by a scalar, and spit out a tangent.

If we use ChainRules types to represent said tangents, the first two items are free. The third would require some work though.

Pullbacks / gradients / Jacobians are a bit slightly trickier, and we don't currently have the code to implement this properly. To do this we would need some mechanism for constructing the standard basis for any type (inc. structs etc), which requires some thought but is definitely doable. We would also need a way to apply cotangents to tangents -- this is arguably a missing feature of ChainRules that we could either resolve by introducing a collection of cotangent types / wrapper types, or by defining the inner product between tangents and failing to distinguish between tangents and cotangents. There are probably some other details that I'm brushing over.

Either way I would like to see this resolved because more and more I'm finding that to_vec is becoming burdensome and limiting the kinds of things that we can easily test with FiniteDifferences e.g. frules and rrules of higher-order functions.

@ettersi
Copy link

ettersi commented Jun 26, 2020

If we use ChainRules types to represent said tangents, the first two items are free. The third would require some work though.

Can you provide an example where the third item is indeed tricky to implement? As far as I can tell, finite differences only makes sense if both the domain and range are affine spaces, and in this case there is no problem with Item 3.

@willtebbutt
Copy link
Member

Can you provide an example where the third item is indeed tricky to implement? As far as I can tell, finite differences only makes sense if both the domain and range are affine spaces, and in this case there is no problem with Item 3.

No conceptual difficulty, just tricky in the sense that you would need to implement stuff to actually do this. Not all of the types that we would be working with placing nicely with scaling / subtraction (e.g. arbitrary structs) so we would need to roll our own functionality to do that. Were we to do this I imagine we would implement a function estimate_tangent(f, h, x, dx) that rolls all of these things into a single operation, so we can explicitly provide methods for this for individual types that don't naturally support the operations that we need and we're unable to define them for due to type-piracy restrictions.

@ettersi
Copy link

ettersi commented Jun 26, 2020

Not all of the types that we would be working with placing nicely with scaling / subtraction (e.g. arbitrary structs) so we would need to roll our own functionality to do that.

Or FiniteDifferences could just refuse to work with such types on the grounds that if they do not implement the interface of an affine vector space, then there is no sensible definition of finite differences.

@ettersi
Copy link

ettersi commented Jun 26, 2020

Pullbacks / gradients / Jacobians are a bit slightly trickier, and we don't currently have the code to implement this properly. To do this we would need some mechanism for constructing the standard basis for any type (inc. structs etc).

One way to resolve this would be to 1) fix a generic interface for dual vectors / cotangents and 2) require domain types to implement their own methods for vjp.

Examples for 1), where v denotes a vector and f a dual vector:

  • f(v) (this emphasises the linear functional aspect of dual vectors)
  • f*v (this emulates the f'*v notation from R^n)
  • dot(f,v) (this emulates the notation for inner product spaces)

Out of these three, I would choose dot(f,v) since it is already defined and does the right thing for most types.

Examples for 2), assuming the dot(f,v) notation for applying duals to vectors:

vjp(fdm,f,ȳ,x::Real) = dot(ȳ,jvp(fdm,f,x,1))
vjp(fdm,f,ȳ,x::Complex) = dot(ȳ,jvp(fdm,f,x,1)) + im*dot(ȳ,jvp(fdm,f,x,im))
vjp(fdm,f,ȳ,x::Vector{<:Real}) = [dot(ȳ,jvp(fdm,f,unit(x,i))) for i = 1:length(x)]
unit(x,i) = (e = zeros(length(x)); e[i] = 1; e)

@willtebbutt
Copy link
Member

willtebbutt commented Jun 26, 2020

Or FiniteDifferences could just refuse to work with such types on the grounds that if they do not implement the interface of an affine vector space, then there is no sensible definition of finite differences.

I don't think that this would be a good solution. For example, it would preclude the use of Tuples.

Out of these three, I would choose dot(f,v) since it is already defined and does the right thing for most types.

I agree.

Examples for 2), assuming the dot(f,v) notation for applying duals to vectors:

I like this proposal a lot.

This would just leave the question of how to represent Jacobians. Perhaps a collection of tangents, one for each input, would make most sense? And perhaps this could be specialised to a matrix in the Vector case? To be honest I'm really not sure whether people are actually using this package for that kind of functionality.

I do wonder whether we really need to be able to compute vjps / gradients at all though. If one accepts that the sole purpose of FiniteDifferences is testing AD, then you really just need the ability to compute jvps (to test forwards-mode) and to verify that

_, pullback = rrule(f, x)
dot(ȳ, jvp(fdm, f, x, ẋ))  dot(pullback(ȳ), ẋ)

to test reverse-mode. Perhaps this is a step too far though. It's entirely possible that people are using this package to compute gradients, so losing that functionality would make for a bad user experience.

@sethaxen
Copy link
Member

sethaxen commented Jun 26, 2020

It's entirely possible that people are using this package to compute gradients, so losing that functionality would make for a bad user experience.

Manifolds.jl's default "AD" backend is FiniteDifferences, which it uses in internal functions to compute the gradient. It's placeholder functionality though and currently not used for any higher order functions. 🤷‍♂️

@ettersi
Copy link

ettersi commented Jun 28, 2020

Or FiniteDifferences could just refuse to work with such types on the grounds that if they do not implement the interface of an affine vector space, then there is no sensible definition of finite differences.

I don't think that this would be a good solution. For example, it would preclude the use of Tuples.

Tuples could be fixed either by wrapping them in an SVector-like struct or by implementing the vector arithmetic using .+ and .* instead of + and *. The latter would be more efficient even for AbstractArrays.

@willtebbutt
Copy link
Member

Certainly wrapper types would work as alternative to implementing a special function.

.+ and .* etc would still be type piracy though I believe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants