Description
This issue is just putting down some of my notes on why I think the current status is ok.
The question is whether to_vec(::Diagonal)
should return only the vector along the diagonal, or the whole densified matrix. I.e. should it do
julia> v, b = to_vec(Diagonal([1, 2])); v # struct-like
2-element Vector{Int64}:
1
2
or
julia> v, b = to_vec(Diagonal([1, 2])); v # array-like
4-element Vector{Int64}:
1
0
0
2
in other words, do we treat it as a struct or as an array? At present, we densify it, which feels wrong in the sense that the off-diagonal zeros are structural, i.e. they can't (conceptually) be perturbed. Practically, when we perturb the off-diagonal elements, we just get no effect and we waste computation.
On the other hand, it is practical. Consider test_rrule(Symmetric, randn(3, 3), :U)
, and we want to test whether the rule accepts Diagonal
tangents as well, which we do by test_rrule(Symmetric, randn(3, 3), :U; check_inferred=false, output_tangent=Diagonal(rand(3, 3)))
. If we do not densify the Diagonal, the test fails. The reason is that the finite differences jacobian is computed for the output type (Symmetric) vector (length 9), while the non-densified vector for a 3by3 Diagonal has length 3, which causes an awkward DimensionMismatch("second dimension of A, 9, does not match length of x, 3")
error, which is highly likely confusing to a newcomer.
So the tradeoff seems to be sacrificing convenience or wasting computational power. In the 21st century convenience is sacred and computation is cheap, so keeping Diagonal densified is ok with me.
Densifying used to cause different (and quite common) problems when testing rules in ChainRulesTestUtils, e.g. for d=Diagonal(rand(3)); test_rrule(*, d, d)
because the rand_tangent
would make a Tangent{Diagonal}(diag=[1., 2., 3.])
(vector of length 3), whereas the primal was a Diagonal
(vector of length 9). However these have been mostly fixed by adding the ProjectTo
into rand_tangent
in ChainRulesCore 1.0.