Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove adjoint for fill and fix tests #203

Merged
merged 23 commits into from
Nov 8, 2021
Merged

Remove adjoint for fill and fix tests #203

merged 23 commits into from
Nov 8, 2021

Conversation

devmotion
Copy link
Member

@devmotion devmotion commented Oct 21, 2021

This PR removes the Zygote adjoint for fill (see #202 for a discussion for why it is bad and causes problems in other packages).

Additionally, it


Original comment:

An alternative to #202 with some additional unrelated fixes that make Zygote tests pass locally. Hopefully this fixes also the Tracker and ReverseDiff issues (didn't run them locally to save time).

Edit: Unfortunately, it seems the Tracker and ReverseDiff issues are not fixed by this PR.

Edit 2: Maybe the problem is that RD, Tracker, and FD don't know the derivative of xlogy and xlog1py. I'll check locally if adding some DiffRules helps.

Edit 3: Indeed, the issues with Tracker, ReverseDiff, and ForwardDiff are fixed by defining diff rules for xlogy and xlog1py. I made a PR to DiffRules (JuliaDiff/DiffRules.jl#69) but unfortunately additionally it requires fixes for Tracker, ReverseDiff and ForwardDiff itself. In the mean time I guess the easiest is to choose different parameter values such that we don't end up with the special case for xlogy where x is zero.

test/ad/distributions.jl Outdated Show resolved Hide resolved
test/ad/distributions.jl Outdated Show resolved Hide resolved
test/ad/distributions.jl Outdated Show resolved Hide resolved
* Fix deprecations

* Improve CI (AD): cancel builds and no coverage

* Improve CI (Others): cancel builds and no coverage
@devmotion
Copy link
Member Author

It seems there are two problems remaining with Zygote that are both caused by the ChainRules integration:

In both cases it seems somehow ChainRules types (NoTangent and Tangent) make it into the Zygote internals which causes the problems in the end. E.g., changing ZeroTangent() in

(c, -c, ZeroTangent()),
to 0 seems to fix the first issue but this seems rather like a workaround than a proper fix. And the second problem occurs more generally.

@mcabbott @DhairyaLGandhi I lost track of the recent changes, open PRs and possible fixes in Zygote for the ChainRules integration - is there anything on the Zygote side that could fix the problems? I assume you might be familiar with these issues. I'll also check if removing the workarounds in #198 helps (if they are not needed anymore, that's the first thing I have to check of course).

@mcabbott
Copy link

As far as I know this one is a new class of bug. Having an array of nothing would be no better, but perhaps Zygote used to catch that, and fails to catch Array{NoTangent}? It's in the gradient for getindex I think:

MethodError: Cannot `convert` an object of type Bool to an object of type NoTangent
   [1] fill!(::Array{NoTangent,1}, ::Bool) at ./array.jl:309
   [2] _zero(::Array{Float64,1}, ::Type) at /home/runner/.julia/packages/Zygote/rv6db/src/lib/array.jl:61
   [3] (::Zygote.var"#426#428"{1,Float64,Array{Float64,1},Tuple{UnitRange{Int64}}})(::Array{NoTangent,1}) at /home/runner/.julia/packages/Zygote/rv6db/src/lib/array.jl:37

Whereas this one looks like a Tangent is leaking to where it expects a NamedTuple:

Need an adjoint for constructor TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}}. Gradient is of type Tangent{Any,NamedTuple{(:m, :σ),Tuple{Array{Float64,1},Array{Float64,1}}}}
   [2] (::Zygote.Jnew{TuringDiagMvNormal{Array{Float64,1},Array{Float64,1}},Nothing,false})(::Tangent{Any,NamedTuple{(:m, :σ),Tuple{Array{Float64,1},Array{Float64,1}}}}) at /home/runner/.julia/packages/Zygote/rv6db/src/lib/lib.jl:323

Some such leaks are plugged in FluxML/Zygote.jl#1104 . That's the only open PR which fiddles with these things, I believe.

Linked case is Zygote v0.6.29, julia version 1.3.1. Am I reading correctly that these errors do not occur on Julia 1.6?

@devmotion
Copy link
Member Author

Am I reading correctly that these errors do not occur on Julia 1.6?

No, the tests were just cancelled before these tests were run with Julia 1.6 on Github. I can reproduce the test errors locally with Julia 1.6.

@mcabbott
Copy link

mcabbott commented Oct 22, 2021

OK. Maybe FluxML/Zygote.jl@35ee9bd will run these tests on top of that PR. Not sure I understand how group/AD selects what's used in tests; ideally it would run just the Zygote ones. In https://github.com/FluxML/Zygote.jl/runs/3976090808?check_suite_focus=true so far I see Tracker, ForwardDiff, ReverseDiff so perhaps it's doing them all.

@devmotion
Copy link
Member Author

ideally it would run just the Zygote ones.

I already changed this in this PR, you can run the Zygote tests only with group: Zygote (but it does not work with the master branch).

@mcabbott
Copy link

The complete test was killed after 6 hours. I see many failures, but is is clear to you whether or not it got past the failures above?

@devmotion
Copy link
Member Author

No, it didn't.

@devmotion
Copy link
Member Author

devmotion commented Oct 24, 2021

* evaluation of the loglikelihood of multiple samples from a product distribution of a matrix of uniform distributions errors:  https://github.com/TuringLang/DistributionsAD.jl/runs/3970857262?check_suite_focus=true#step:6:820 and similar lines below

@mcabbott I was able to fix these issues locally by simplifying the Zygote tests a bit (in contrast to the tracked/dual number type ADs we only have to check the full set of arguments once) and using test_rrule which takes care of vectorization for finite differencing etc. for us instead of the custom vectorization in the test suite. My plan is to change everything to CRTestutils and test_rrule and test_frule, and I already have a draft for ForwardDiff and Tracker as well, but originally my plan was to do it in a separate PR since this PR is quite messy already. However, it seems it might be the easiest way to fix one of the two remaining Zygote test issues.

However, I had to add the type piracy in

# Workaround for nested `nothing`
Zygote.z2d(::NTuple{<:Any,Nothing}, ::Tuple) = NoTangent()
function Zygote.z2d(t::NamedTuple, primal::T) where T
fnames = fieldnames(T)
complete_t = map(n -> get(t, n, nothing), fnames)
primals = map(n -> getfield(primal, n), fnames)
tp = map(Zygote.z2d, complete_t, primals)
return if tp isa NTuple{<:Any,NoTangent}
NoTangent()
else
canonicalize(Tangent{T, typeof(tp)}(tp))
end
end
(should be moved to Zygote) since otherwise some of the filldist tests failed (hopefully the last commit doesn't break other tests 😄). More concretely, the problem was that https://github.com/JuliaDiff/ChainRulesTestUtils.jl/blob/dd0e246d1516451ec963db55a7a51910819082b3/src/testers.jl#L228 failed otherwise since cotangents that were specified as NoTangent and for which FiniteDifferences returned a NoTangent were computed as nested Tangent of NoTangents and tuples of NoTangents by Zygote. Unfortunately, I don't have a MWE but only a quite complicated test here.

@mcabbott
Copy link

Well spotted. Yes I think that ought to be safe. I've added FluxML/Zygote.jl@45f5c27 which I hope captures this effect everywhere. We'll see if CI likes it...

test/ad/utils.jl Outdated Show resolved Hide resolved
@devmotion
Copy link
Member Author

I confirmed locally that the remaining Zygote test errors (namely, as mentioned above,

evaluation of the loglikelihood of single and multiple samples from a product distribution of a vector of multivariate distributions errors

) are indeed fixed by FluxML/Zygote.jl#1104. FluxML/Zygote.jl#1104 also allows us to remove the workarounds for Zygote.z2d that are currently added to the tests.

@mcabbott Thus this PR is blocked by FluxML/Zygote.jl#1104 but tests should pass once a new Zygote version with the changes in this PR is released.

test/ad/utils.jl Outdated Show resolved Hide resolved
Copy link
Member

@yebai yebai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - many thanks @devmotion !

@devmotion devmotion merged commit 47214f8 into master Nov 8, 2021
@devmotion devmotion deleted the dw/fixes branch November 8, 2021 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants