
Faster generic broadcasting #1001

Merged: 18 commits, Jun 25, 2021
Conversation

@mcabbott (Member) commented Jun 18, 2021

During JuliaDiff/ChainRules.jl#441 we realised there's probably a factor of 10 left on the table, for no good reason. Not always a factor of 10, but still:

julia> xs = randn(10_000);

julia> @btime Zygote.gradient(x -> sum(abs.(x)), $xs);
  478.458 μs (30048 allocations: 1.07 MiB)  # Zygote v0.6.12
  28.625 μs (28 allocations: 469.50 KiB)    # this PR, reverse mode
  19.375 μs (52 allocations: 314.17 KiB)    # this PR, forward mode

julia> @btime Zygote.gradient(x -> sum(cbrt.(x)), $xs);
  545.125 μs (30048 allocations: 938.72 KiB)
  81.500 μs (28 allocations: 391.38 KiB)    # this PR, reverse
  87.417 μs (52 allocations: 314.17 KiB)    # this PR, forward

julia> @btime copy($xs);  # to compare time & memory
  2.644 μs (2 allocations: 78.20 KiB)

julia> @btime Zygote.gradient(x -> sum(tanh.(x)), $xs);  # has its own gradient
  114.250 μs (5 allocations: 156.42 KiB)

julia> @btime Zygote.gradient(x -> sum((identity∘tanh).(x)), $xs); # force generic version
  31.097 ms (339593 allocations: 6.56 MiB)
  29.680 ms (309574 allocations: 5.87 MiB)  # this PR, reverse
  131.667 μs (60 allocations: 314.58 KiB)   # this PR, forward
  128.833 μs (52 allocations: 314.17 KiB)   # ... with (y -> tanh(y)) instead

julia> @btime Zygote.gradient(x -> sum(x .> 0.1), $xs)
  10.042 μs (27 allocations: 20.55 KiB)
  8.347 μs (18 allocations: 20.25 KiB)   # this PR, reverse
  12.708 μs (11 allocations: 5.72 KiB)   # this PR, forward
(nothing,)

Edit -- the latest version does, roughly, what's suggested in #336, marked "forward" above. That is, it uses dual numbers (via the existing internal function broadcast_forward) instead of generic reverse mode whenever possible. The examples linked there (on Discourse & in #592) both get about 5x faster.

The lines marked "reverse" above are the result of inference improvements to the generic reverse mode, which were the first step of this PR. That path will still be used when dual numbers cannot be, e.g. for complex numbers or when broadcasting closures.
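
For readers new to the trick, here is a minimal sketch of the dual-number idea behind broadcast_forward (an illustration under simplified assumptions, not the PR's actual code): seed each element with a unit perturbation, broadcast once, and read off both the values and the per-element derivatives.

# Minimal sketch of the dual-number idea (not Zygote's actual implementation):
using ForwardDiff: Dual, value, partials

function sum_broadcast_and_grad(f, xs::AbstractVector{<:Real})
    duals = Dual.(xs, one(eltype(xs)))            # seed each element with a unit perturbation
    ys = f.(duals)                                # one forward broadcast gives f(x) and f'(x)
    sum(value, ys), map(y -> partials(y, 1), ys)  # value of the sum, and its gradient
end

sum_broadcast_and_grad(abs, randn(5))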

Edit 2 -- perhaps worth timing map on the same problems. Not certain we should do this here, but the same purity check used for "forward" should also make it safe to skip the reverse-mode machinery in map, which gives modest speedups & some memory savings:

julia> @btime Zygote.gradient(x -> sum(map(abs, x)), $xs);
  53.666 μs (42 allocations: 938.83 KiB)
  25.791 μs (29 allocations: 391.55 KiB)  # this PR

julia> @btime Zygote.gradient(x -> sum(map(cbrt, x)), $xs);
  108.542 μs (42 allocations: 704.44 KiB)
  76.750 μs (29 allocations: 313.42 KiB)  # this PR

julia> @btime Zygote.gradient(x -> sum(map(tanh, x)), $xs);
  149.917 μs (42 allocations: 704.44 KiB)
  124.875 μs (29 allocations: 313.42 KiB)  # this PR

julia> @btime Zygote.gradient(x -> sum(map(identity∘tanh, x)), $xs);
  32.160 ms (300048 allocations: 6.33 MiB)
  31.064 ms (300034 allocations: 5.65 MiB)  # this PR

julia> @btime Zygote.gradient(x -> sum(map(>(0.1), x)), $xs)
  26.565 ms (330034 allocations: 11.07 MiB)
  25.745 ms (330029 allocations: 10.69 MiB)
(nothing,)

Comment on lines +178 to 181
dxs = ntuple(len) do i
    collapse_nothings(map(StaticGetter{i}(), dxs_zip))
end
(nothing, accum_sum(dxs[1]), map(unbroadcast, args, Base.tail(dxs))...)
@mcabbott (Member Author):

Clearly map(StaticGetter{i}(), dxs_zip) should really be fused with unbroadcast, possibly into mapreduce.
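
(For context, a rough sketch of the StaticGetter idea -- Zygote has its own internal definition; this hypothetical re-implementation is only to illustrate why it helps inference: the index is part of the type, so map can specialise on it.)

# Rough sketch of the StaticGetter idea (hypothetical name; Zygote's version may differ):
struct MyStaticGetter{i} end
(::MyStaticGetter{i})(xs) where {i} = xs[i]      # index is known at compile time

# Unzipping a vector of tuples type-stably, one component at a time:
zipped = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
firsts  = map(MyStaticGetter{1}(), zipped)       # [1.0, 2.0, 3.0]
seconds = map(MyStaticGetter{2}(), zipped)       # [10.0, 20.0, 30.0]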

@oxinabox (Member) left a comment:

Some comments but basically seems all sensible.
Once tests pass, merge when happy.

Comment on lines 228 to 231
  eltype(out) <: Dual || return (out, _ -> nothing)
  y = map(x -> x.value, out)
- _back(ȳ, i) = unbroadcast(args[i], ((a, b) -> a*b.partials[i]).(ȳ, out))
- back(ȳ) = ntuple(i -> _back(ȳ, i), N)
+ _back(ȳ, geti) = unbroadcast(geti(args), ((a, b) -> a * geti(b.partials)).(ȳ, out))
+ back(ȳ) = ntuple(i -> _back(ȳ, StaticGetter{i}()), N)
@mcabbott (Member Author):

Change for CuArrays, not sure this matters. CI doesn't seem to mind. Pushed to try out on another machine, a bit later.

@mcabbott (Member Author):

Some GPU times: no harm, but no help either.

julia> xs = cu(randn(100_000));

julia> @btime CUDA.@sync Zygote.gradient(x -> sum(cbrt.(x)), $xs);
  119.805 μs (375 allocations: 10.56 KiB)
  119.872 μs (377 allocations: 10.61 KiB) # this PR 

julia> @btime CUDA.@sync Zygote.gradient(x -> sum(tanh.(x)), $xs);  # has its own gradient
  99.938 μs (342 allocations: 10.81 KiB)

julia> @btime CUDA.@sync Zygote.gradient(x -> sum((identity∘tanh).(x)), $xs);  # force generic version
  116.577 μs (372 allocations: 10.52 KiB)
  117.023 μs (371 allocations: 10.52 KiB) # this PR

Also, the generic is as fast as the specific. Remind me why we don't do this on the CPU, and have this 300x slower generic thing?

Member:

I know nothing about how Zygote works with the GPU, but taking a guess: because moving memory between GPU/VRAM and CPU/RAM is even more expensive?

@mcabbott (Member Author):

I don't quite know why the generic broadcast isn't used for the GPU, perhaps something would break. It uses dual numbers instead, and perhaps this makes purity assumptions about the function involved?

If we very simply call that on the CPU, we get times like this:

julia> @btime Zygote.gradient(x -> sum(abs.(x)), $xs);  # earlier 478.458 μs -> 28.625 μs
  15.209 μs (20 allocations: 313.16 KiB)

julia> @btime Zygote.gradient(x -> sum((identity∘tanh).(x)), $xs);  # earlier 29.680 ms
  125.166 μs (20 allocations: 313.16 KiB)

This is with 9227168, which might be too crude... will think a bit & see what CI says.

@mcabbott (Member Author) commented Jun 19, 2021:

Here's what it gets wrong:

julia> gradient((x,y) -> sum((z->z^2+y[1]).(x)), [1,2,3], [4,5])
([2, 4, 6], nothing)  # this commit
([2, 4, 6], [3, 0])  # tagged Zygote

(now caught & sent the slow path)

Member:

> I don't quite know why the generic broadcast isn't used for the GPU, perhaps something would break. It uses dual numbers instead, and perhaps this makes purity assumptions about the function involved

I think the reason is that you can't send IdDicts to the GPU. Which means you can't send Zygote.Context to the GPU. Which means you can't run Zygote._pullback, which needs the Context to track some things -- I'm not sure exactly what, but I think things like tasks and globals. But the things the Context is meant to track don't need to be tracked in forward mode.
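
(For reference, the generic reverse path goes through roughly the API sketched below. This is a simplified illustration of Zygote internals as of this PR; details may differ or change.)

# The generic reverse path, roughly (shown for illustration only):
using Zygote

ctx = Zygote.Context()                      # carries an IdDict-based cache -- not GPU-friendly
y, back = Zygote._pullback(ctx, sin, 0.5)   # per-call pullback; bookkeeping goes through ctx
back(1.0)                                   # (nothing, cos(0.5)): gradients w.r.t. f and x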

@ToucheSir (Member):
With the caveat that I don't really understand what's going on, has this subsumed #807?

@mcabbott (Member Author) commented Jun 20, 2021:

I was unaware of #807. This does not try to make broadcasting better able to handle state; it does try to make it ignore state whenever possible, in the interest of not being slow.

The last few commits add a similar check to map. Trying the example from that issue at the REPL, it fails to spot the state -- which seems not to happen in the test suite, since the tests pass.

julia> s = 0;

julia> function f(x)
            global s += x
          end;

julia> gradient(x -> sum(map(f, x)), 1:10)
([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],)  # this PR
([10, 9, 8, 7, 6, 5, 4, 3, 2, 1],)  # tagged version, test is == (10:-1:1,)

julia> fieldnames(typeof(f))
()

julia> Base.issingletontype(typeof(f))
true

Not sure what to make of that; global scope is weird? Maybe the example is too artificial -- this is not actually a sensible gradient calculation.

The examples I invented (and added as tests here) do detect when a variable is closed over:

julia> gradient((x,y) -> sum((z->z^2+y[1]).(x)), [1,2,3], [4,5])
([2, 4, 6], [3, 0])

julia> fieldnames(typeof((z->z^2+y[1])))  # and Base.issingletontype is true
()

julia> let y = [1,2]
         fieldnames(typeof((z->z^2+y[1])))  # and Base.issingletontype is false
       end
(:y,)

Δf_and_args = unzip(_tryreverse($mapfunc, Δf_and_args_zipped))
Δf = reduce(accum, Δf_and_args[1])
(Δf, Δf_and_args[2:end]...)
if Base.issingletontype(F) && length(args) == 1
@mcabbott (Member Author) commented Jun 20, 2021:

Variants of the weird map example from existing tests. Tests pass, but at the REPL it would fail with this PR; some other variants are fine:

julia> s = 0;

julia> function f(x)
            global s += x
          end;

julia> gradient(x -> sum(map(f, x)), 1:10)
([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],)  # this PR
([10, 9, 8, 7, 6, 5, 4, 3, 2, 1],)  # tagged version, test is == (10:-1:1,)

julia> fieldnames(typeof(f))
()

julia> Base.issingletontype(typeof(f))
true

julia> S = Ref(0)  # mutable, global scope
Base.RefValue{Int64}(0)

julia> function F(x)
            S[] += x
          end;

julia> fieldnames(typeof(F))
()

julia> gradient(x -> sum(map(F, x)), 1:10)
([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],)

julia> let
       S2 = Ref(0)  # mutable, local scope
       function F2(x)
           S2[] += x
       end
       gradient(x -> sum(map(F2, x)), 1:10)
       end
([10, 9, 8, 7, 6, 5, 4, 3, 2, 1],)

julia> let
       s2 = 0  # immutable, local scope
       function f2(x)
           s2 += x
       end
       gradient(x -> sum(map(f2, x)), 1:10)
       end
([10, 9, 8, 7, 6, 5, 4, 3, 2, 1],)

My inclination is to say that none of this matters -- the motivation for reverse mode in stateful map is RNNs, where f is a whole network. That will definitely contain parameters, and hence take the slow path.

Member:

Yeah, probably we could afford a regression in those corner cases. I don't know why things defined in the Main module act so differently. Maybe we could exclude methods defined in Main from the fast path? (I don't like this idea too much, though.)

@mcabbott (Member Author):

Yeah, there might be a way to test for this, but parentmodule(f2) is also Main, so I don't think that's quite right.

But the example seems pretty contrived. We're concerned about getting gradients right, which sometimes depends on evaluation order. This is a naked test of evaluation order, as a proxy for the real concern. Perhaps it's not a close enough one, and we should test slightly more realistic things.

More generally I'm not sure Julia makes guarantees about the order of evaluation of map or broadcast. Especially for broadcast. I'm not too sure what that means for AD yet.

Member:
OK, then I think I'm fine with merging this PR as it is.

@mcabbott mcabbott deleted the broadget branch June 25, 2021 21:09
@DhairyaLGandhi (Member):
@mcabbott please address the concerns brought up and please use bors to merge.

@mcabbott (Member Author):
What concerns?

@DhairyaLGandhi (Member):
My comment above would be a start.

@mcabbott (Member Author):
Yes that's why I asked, I have no idea what that comment was meant to say. Apparently the people tagged also don't know. If you have actual technical concerns and can explain them, or at least provide a MWE, then I'm happy to have a look.

@DhairyaLGandhi (Member):
How would this support immutable structures like static arrays without redoing work of writing kernels for every kind of array type that comes along?

@mcabbott (Member Author) commented Jun 26, 2021:

Julia just specialises like usual. It's not mutating anything. What makes you think this is less likely to work on immutable arrays than the previous code?

There are right now zero tests for StaticArrays. But when I try things, it seems pretty easy to find examples where broadcasting fails on the tagged version.

@oschulz commented Oct 1, 2021:

> 28.625 μs (28 allocations: 469.50 KiB) # this PR, reverse mode
> 19.375 μs (52 allocations: 314.17 KiB) # this PR, forward mode

@mcabbott, how did you (meaning, how can I :-) ) select forward mode in Zygote? Was this done using Zygote.forwarddiff?

@mcabbott (Member Author) commented Oct 1, 2021:

The merged version will automatically use forward mode whenever possible.

The times quoted were from different variants tried while writing this PR. The first step was the map(StaticGetter{i}(), dxs_zip) change, which improved the generic reverse mode; the second step was to call broadcast_forward when possible (this internal function was previously used only for the CUDA path).

Applying Zygote.forwarddiff to the whole broadcast will probably be much slower, as it works out the full Jacobian. In broadcasting, each output depends only on the corresponding inputs (the "sparsity pattern", as some people like to say), which broadcast_forward exploits.
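
To see the difference, one could compare the two along these lines (a sketch only; the forwarddiff closure and the vector size are my own choices, and no timings here come from this thread):

# Per-element duals (the broadcast fast path) vs. Zygote.forwarddiff over the whole
# broadcast, which effectively builds an N×N Jacobian for an elementwise operation.
using Zygote, BenchmarkTools
xs = randn(1_000);

@btime Zygote.gradient(x -> sum(cbrt.(x)), $xs);                              # fast path
@btime Zygote.gradient(x -> sum(Zygote.forwarddiff(v -> cbrt.(v), x)), $xs);  # full Jacobian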

@oschulz commented Oct 1, 2021:

Thanks, @mcabbott !

> Applying Zygote.forwarddiff to the whole broadcast will probably be much slower

Oh, sure - I meant for the function broadcasted over.

@mcabbott (Member Author) commented Oct 1, 2021:

Ah OK. I never thought to try that. In principle that ought to compile down to roughly the same thing, right? But in practice:

julia> struct Cbrt ignore end;

julia> (::Cbrt)(x) = Zygote.forwarddiff(cbrt, x)

julia> Base.issingletontype(typeof(Cbrt(pi)))  # hence Zygote will not use dual numbers
false

julia> g1 = @btime gradient(xs -> sum(broadcast(Cbrt(pi), xs)), $xs);
  2.143 ms (40035 allocations: 2.90 MiB)

julia> g2 = @btime gradient(xs -> sum(broadcast(cbrt, xs)), $xs);  # default, as above
  90.667 μs (13 allocations: 312.78 KiB)

julia> g1[1] ≈ g2[1]
true

@oschulz commented Oct 1, 2021:

Oh, interesting! The Zygote compiler now outperforms ForwardDiffPullbacks.fwddiff (which I created mainly because Zygote often used to be so slow with broadcasts) in this case:

julia> using Zygote, ForwardDiffPullbacks, BenchmarkTools
julia> xs = randn(10_000)

julia> @benchmark gradient(xs -> sum(broadcast(cbrt, xs)), $xs)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  100.410 μs …  1.835 ms  ┊ GC (min … max): 0.00% … 91.63%
 Time  (median):     105.558 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   118.242 μs ± 91.057 μs  ┊ GC (mean ± σ):  6.04% ±  7.26%

 Memory estimate: 312.78 KiB, allocs estimate: 13.

julia> @benchmark gradient(xs -> sum(broadcast(fwddiff(cbrt), xs)), $xs)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  150.934 μs …  1.446 ms  ┊ GC (min … max): 0.00% … 87.23%
 Time  (median):     155.647 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   166.397 μs ± 61.921 μs  ┊ GC (mean ± σ):  2.00% ±  4.95%

 Memory estimate: 156.48 KiB, allocs estimate: 10.

I guess ForwardDiffPullbacks.fwddiff may still speed things up if the broadcasted function contains calculations with many steps, in cases where Zygote fails to use Duals automatically (maybe not so often anymore?). And fwddiff seems to reduce the GC overhead a bit.

@mcabbott (Member Author) commented Oct 1, 2021:

I hadn't seen this package, thanks. Will have to understand what it does, to save memory.

Xref also JuliaDiff/ChainRules.jl#531, where there are some ideas on how to use derivatives_given_output to do better.

For now Zygote uses dual numbers a lot... well, the list of conditions is fairly long: the arguments and output must be real numbers, you shouldn't be inside a second derivative, and the function must not be a closure, for which Base.issingletontype(typeof(f)) is the (imperfect) test.
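
Very roughly, that eligibility test has the shape below (a hypothetical helper summarising the conditions just listed, not Zygote's actual check; the second-derivative condition is handled separately inside Zygote):

# Sketch of the conditions above (hypothetical helper, not Zygote's code):
may_use_dual_broadcast(f, args...) =
    Base.issingletontype(typeof(f)) &&                        # not a closure carrying data
    all(a -> a isa Real || eltype(a) <: Real, args)           # real-valued inputs only

may_use_dual_broadcast(cbrt, randn(3))                         # true
may_use_dual_broadcast(abs, randn(3) .+ im)                    # false: complex numbers
may_use_dual_broadcast(let y = 2; x -> x + y; end, randn(3))   # false: closure over y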

> broadcasted function contains calculations that have many steps

Do you mean fusion, or just a complicated f in f.(x)?

Zygote's broadcasting always remains un-fused, i.e. f.(g.(x)) is handled exactly like y = g.(x); f.(y), one broadcast at a time. But (f∘g).(x) should still use dual numbers.
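
As an illustration of that un-fused behaviour (a sketch with functions of my own choosing; timings will vary by machine and Zygote version):

using Zygote, BenchmarkTools
xs = randn(10_000);

@btime Zygote.gradient(x -> sum(exp.(sin.(x))), $xs);   # two broadcasts, two pullbacks
@btime Zygote.gradient(x -> sum((exp∘sin).(x)), $xs);   # one broadcast through exp∘sin, via duals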

@oschulz commented Oct 1, 2021:

Looks like for the ForwardDiffPullbacks.fwddiff example, fwddiff still yields a massive speedup:

julia> using Zygote, ForwardDiffPullbacks, StaticArrays, BenchmarkTools

julia> f = (xs...) -> (sum(map(x -> sum(map(x -> x^2, x)), xs)))

julia> xs = (2, (3, 4), SVector(5, 6, 7));
julia> Xs = map(x -> fill(x, 100), xs);

julia> @benchmark Zygote.gradient(($Xs...) -> sum(f.($Xs...)), $Xs...)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  403.472 μs …   5.890 ms  ┊ GC (min … max): 0.00% … 91.96%
 Time  (median):     452.634 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   474.041 μs ± 294.882 μs  ┊ GC (mean ± σ):  3.51% ±  5.18%

 Memory estimate: 143.06 KiB, allocs estimate: 3229.

julia> @benchmark Zygote.gradient(($Xs...) -> sum(fwddiff($f).($Xs...)), $Xs...)
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.543 μs … 393.863 μs  ┊ GC (min … max): 0.00% … 95.79%
 Time  (median):     5.010 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.637 μs ±  12.513 μs  ┊ GC (mean ± σ):  8.13% ±  3.62%

 Memory estimate: 9.11 KiB, allocs estimate: 64.

Whew, not all my work was in vain - for now .... ;-)

@oschulz commented Oct 1, 2021:

> broadcasted function contains calculations that have many steps
> Do you mean fusion, or just a complicated f in f.(x)?

Just an f that has many steps, loops, etc., stuff that will result in a long tape if Zygote reverse-mode diffs it.

@oschulz commented Oct 1, 2021:

> I hadn't seen this package, thanks. Will have to understand what it does

ForwardDiffPullbacks.jl basically "just" provides an rrule for a function wrapped with fwddiff. It supports functions with multiple arguments, in contrast to ForwardDiff.jl itself, but it's currently limited to arguments that are real numbers, tuples or static vectors. I should add support for NamedTuple and maybe Complex and so on. I'd like to add support for arrays as well at some point (won't be efficient for large arrays, of course).

The main idea behind ForwardDiffPullbacks.jl is to allow users to force forward-mode AD, compatible with anything that respects rrule, and with fairly lightweight dependencies (no dependency on Zygote, etc.). The main use case would be forcing forward mode for functions with few degrees of freedom but involved/many-step calculations, and for functions that are not Zygote/Diffractor compatible (because they use mutation, say).
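
A minimal sketch of that idea -- wrap the function and give the wrapper an rrule whose pullback is computed with ForwardDiff (hypothetical names, single real argument only, not ForwardDiffPullbacks' actual code):

# Sketch of a forward-mode rrule wrapper (hypothetical, single real argument):
using ChainRulesCore, ForwardDiff

struct FwdWrapped{F}      # stands in for what fwddiff(f) returns
    f::F
end
(w::FwdWrapped)(x::Real) = w.f(x)

function ChainRulesCore.rrule(w::FwdWrapped, x::Real)
    y = w.f(x)
    fwd_pullback(ȳ) = (NoTangent(), ȳ * ForwardDiff.derivative(w.f, x))
    return y, fwd_pullback
end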

@mcabbott (Member Author) commented Oct 1, 2021:

I see, so I should think of this as being a bit like Zygote.forwarddiff, but accepting different types. I think.

> for now .... ;-)

Well I certainly have no plans to make Zygote's broadcasting subsume this, doing stuff beyond arrays of numbers I mean. But I would like to figure out how broadcast(fwddiff(cbrt), xs) gets away with less memory.

Xref also JuliaDiff/ChainRulesDeclarationHelpers.jl#2 , not the same but perhaps of interest.

@mcabbott (Member Author) commented Oct 2, 2021:

> Will have to understand what it does, to save memory.

OK now I see -- it looks like it evaluates twice, which avoids having to save the partials between the forward pass and the backward.

julia> using Zygote, ForwardDiffPullbacks

julia> pr(x) = @show x;

julia> gradient(fwddiff(pr), 2pi)
x = 6.283185307179586
x = Dual{ForwardDiff.Tag{Tuple{typeof(pr), Val{1}}, Float64}}(6.283185307179586,1.0)
(1.0,)

julia> gradient(x -> Zygote.forwarddiff(pr,x), 2pi)
x = Dual{Nothing}(6.283185307179586,1.0)
(1.0,)

@oschulz commented Oct 2, 2021:

> it looks like it evaluates twice

Yes -- it's a bit wasteful, but the forward pass is without dual numbers and the reverse pass uses separate duals for each thunk. I should optimize that, at least for the case where there's only one argument.

@oschulz commented Oct 2, 2021:

I'm very open to changes/suggestions/PRs regarding ForwardDiffPullbacks - I'd be glad to move it to the JuliaDiff GitHub org once it's considered solid enough.

@mcabbott (Member Author) commented Oct 2, 2021:

For broadcasting it seems tricky to guess how to make this trade-off, in general. Some functions are more expensive than others. But for an explicit wrapper like this, you know what you're getting... I suppose it could in principle offer an option to specify whether to re-compute or not.

For numbers, as you say, a chunked calculation seems like it should not have downsides.

julia> pr(x, y) = (@show x y; x/y);

julia> gradient(fwddiff(pr), 2pi, 4)
x = 6.283185307179586
y = 4
x = Dual{ForwardDiff.Tag{Tuple{typeof(pr), Val{1}}, Float64}}(6.283185307179586,1.0)
y = 4
x = 6.283185307179586
y = Dual{ForwardDiff.Tag{Tuple{typeof(pr), Val{2}}, Int64}}(4,1)
(0.25, -0.39269908169872414)

julia> gradient((x,y) -> Zygote.forwarddiff(Base.splat(pr), [x,y]), 2pi, 4)
x = Dual{Nothing}(6.283185307179586,1.0,0.0)
y = Dual{Nothing}(4.0,0.0,1.0)
(0.25, -0.39269908169872414)

@cossio mentioned this pull request on Dec 10, 2021.
cossio pushed a commit to cossio/RestrictedBoltzmannMachines.jl that referenced this pull request (Dec 10, 2021), with the commit message:
Zygote uses ForwardDiff for differentiating broadcasting operations.
See FluxML/Zygote.jl#1001.
However ForwardDiff cannot differentiate non-generic code that accepts only Float64
(it needs to pass its Dual type).
In addition ForwardDiff defines rules via DiffRules. It doesn't understand ChainRules.
This means we need to define some rules here to make things work.

See discussion here to understand how to add new rules for ForwardDiff:
https://discourse.julialang.org/t/issue-with-forwarddiff-custom-ad-rule/72886/6

If JuliaDiff/DiffRules.jl#74 gets merged I should not need
this anymore.
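
For illustration, the Float64-only pitfall mentioned in the commit message looks like this (hypothetical functions):

# ForwardDiff needs to push its Dual type through the function, so a Float64-only
# method signature breaks it:
using ForwardDiff

f_strict(x::Float64) = x^2      # too narrow: a Dual is not a Float64
f_generic(x::Real)   = x^2      # generic: works with Dual numbers

ForwardDiff.derivative(f_generic, 3.0)     # 6.0
# ForwardDiff.derivative(f_strict, 3.0)    # MethodError: no method matching f_strict(::Dual...)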