Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symmetric double counts #164

Open
oxinabox opened this issue May 27, 2021 · 6 comments
Open

Symmetric double counts #164

oxinabox opened this issue May 27, 2021 · 6 comments

Comments

@oxinabox
Copy link
Member

Consider: we know that the derviative of any sum of a collection of ones similar to the input.
because it is sum(xs) = x[1] + x[2] + ....

However for Symmetric it is giving 2 for the anti-diagonal on a 2x2 matrix.

using FiniteDifferences
using LinearAlgebra

grad(central_fdm(5, 1), sum, Symmetric(ones(2,2)))[1]
#== out:
2×2 Symmetric{Float64, Matrix{Float64}}:
 1.0  2.0
 2.0  1.0
 ==#


grad(central_fdm(5, 1), sum, ones(2,2))[1]
#==
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
==#

Similar issues occur for prod which came up in https://github.com/JuliaDiff/ChainRules.jl/pull/335/files

Something must be wrong with out we are defining to_vec.

function to_vec(x::T) where {T<:LinearAlgebra.HermOrSym}
x_vec, back = to_vec(Matrix(x))
function HermOrSym_from_vec(x_vec)
return T(back(x_vec), x.uplo)
end
return x_vec, HermOrSym_from_vec
end

@willtebbutt
Copy link
Member

willtebbutt commented May 28, 2021

This feels to me like one of the peculiarities of using a Symmetric matrix to represent the (co)tangent of a Symmetric matrix. It's also where the semantics of to_vec and what we do with AD disagree, and is one of the things that we need to fix more generally.

It relates to #132 I believe.

If we pretend for a minute that the Symmetric matrix is a Tangent with a parent field, it would be correct. sum does indeed access an off-diagonal element of the input matrix on the forwards-pass twice, so a correct gradient for this operation would be

Tangent{Symmetric}(parent=[1 0; 2 1])

(or something like that).

I _ think_ the problem comes in when we convert between representations. In particular, we don't use a Tangent, we use another Symmetric, which interprets the parent field as a matrix, and therefore gives us the Symmetric matrix you're seeing. This is the essence of what to_vec is doing here, since to_vec doesn't know anything about the distinction between primals and tangents because it was devised before we really knew what we were doing there.

To convert between the strutural and natural cotangent for Symmetric, I think one needs to do some work. Specifically, halving the strict lower triangle of the parent field, and using it to construct a Symmetric. Similarly, to convert back you would need to double the strict lower triangle.

Does that make sense as an explanation for what is going on (even if it doesn't provide an obvious solution)?

edit: I suspect giving the parent field rather than collecting the matrix would resolve this particular problem, but other stuff from #132 then becomes relevant.

@devmotion
Copy link
Member

I wonder if it would just work if one uses the same approach as in #146 - similar to Diagonal, Symmetric is just a representation of a specific lower-dimensional manifold of the space of all matrices, so it seems a bit weird to go via Matrix (in both cases). Even if it would not fix the issue here, it would feel "more correct" to me.

@oxinabox
Copy link
Member Author

I wonder if it would just work if one uses the same approach as in #146 - similar to Diagonal, Symmetric is just a representation of a specific lower-dimensional manifold of the space of all matrices, so it seems a bit weird to go via Matrix (in both cases). Even if it would not fix the issue here, it would feel "more correct" to me.

Right, and that would be realtively simple, in that we would fix our Triangular matrixes to do that (right now they also go via Matrix) and then would use the appropriate one of them to get the vector.

@willtebbutt
Copy link
Member

willtebbutt commented May 28, 2021

I agree with both of you, but I would direct you towards #132 -- we'll be changing semantics by doing this.
I'm fine with making this change, but it is a pretty fundamental change / makes something that we already kind of do more explicit.

@devmotion
Copy link
Member

Although it is a fundamental change and probably breaks some things, I think one should change to_vec as suggested in #132.

I would expect to_vec(x) to return a canonical representation of the object x in R^n, and it seems reasonable to choose n as the intrinsic dimension of x (e.g. m in case of a m x m Diagonal matrix). I also think it's reasonable that different objects, such as (1, 3) and [1, 3], have the same canonical representation. The function back returnd from to_vec contains the recipe for reconstructing the object, and hence I think equality-wise one should only require x == y <=> |>(to_vec(x)...) == |>(to_vec(y)...).

This makes me wonder if there is a fundamental difference between ParameterHandling.flatten and to_vec, or if they could be unified in some way.

@willtebbutt
Copy link
Member

Glad we all agree -- doing this should simplify a number of implementations as well, which is nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants