Backport new CUDNN interface to 1.5 #635
Conversation
Codecov Report
@@ Coverage Diff @@
## compat #635 +/- ##
==========================================
+ Coverage 77.30% 78.94% +1.64%
==========================================
Files 116 120 +4
Lines 6438 6650 +212
==========================================
+ Hits 4977 5250 +273
+ Misses 1461 1400 -61
The last tagged Flux release's tests fail on this branch:
ERROR: LoadError: LoadError: LoadError: UndefVarError: RNNDesc not defined
Stacktrace:
[1] top-level scope at /home/carlo/.julia/packages/Flux/YO4zY/src/cuda/curnn.jl:8
[2] include at ./Base.jl:368 [inlined]
[3] include(::String) at /home/carlo/.julia/packages/Flux/YO4zY/src/cuda/cuda.jl:1
[4] top-level scope at /home/carlo/.julia/packages/Flux/YO4zY/src/cuda/cuda.jl:6
[5] include at ./Base.jl:368 [inlined]
[6] include(::String) at /home/carlo/.julia/packages/Flux/YO4zY/src/Flux.jl:1
[7] top-level scope at /home/carlo/.julia/packages/Flux/YO4zY/src/Flux.jl:51
[8] top-level scope at none:2
[9] eval at ./boot.jl:331 [inlined]
in expression starting at /home/carlo/.julia/packages/Flux/YO4zY/src/cuda/curnn.jl:8
in expression starting at /home/carlo/.julia/packages/Flux/YO4zY/src/cuda/cuda.jl:6
in expression starting at /home/carlo/.julia/packages/Flux/YO4zY/src/Flux.jl:51
ERROR: LoadError: Failed to precompile Flux [587475ba-b771-5e3f-ad9e-33799f191a9c] to /home/carlo/.julia/compiled/v1.5/Flux/QdkVy_aa1Cd.ji.
Stacktrace:
[1] include(::String) at ./client.jl:457
[2] top-level scope at none:6
in expression starting at /home/carlo/.julia/packages/Flux/YO4zY/test/runtests.jl:1
ERROR: Package Flux errored during testing

The failure is because we used to rely on RNNDesc in Flux:

function CUDNN.RNNDesc(m::CuRNNs{T}) where T
  h, i = length(m.h), size(m.Wi, 2)
  mode = m isa CuRNN ?
    (m.σ == tanh ? CUDNN.CUDNN_RNN_TANH : CUDNN.CUDNN_RNN_RELU) :
    m isa CuGRU ? CUDNN.CUDNN_GRU : CUDNN.CUDNN_LSTM
  r = CUDNN.RNNDesc{T}(mode, i, h)
  return r
end

# one cached descriptor per model instance
const descs = WeakKeyDict()

function desc(rnn)
  d = haskey(descs, rnn) ? descs[rnn] : (descs[rnn] = CUDNN.RNNDesc(rnn))
  CUDNN.setweights!(d, rnn.Wi, rnn.Wh, rnn.b)
  return d
end

function (m::CuRNN{T})(h::CuArray{T}, x::CuArray{T}) where T <: Union{Float32,Float64}
  y, h′ = CUDNN.forward(desc(m), x, h)
  return h′, y
end

This code was likely removed by @jeremiedb, and tests are passing on Flux's master.
One of the things that prompted me to write the new cudnn interface was the incomplete RNN interface: #343. RNNDesc was a badly designed interface that did not support many of the functionalities of the cudnn RNN API (no multiple layers, no bidirectional networks, etc., if I understand correctly). The new call …
Can a compatibility layer between the old RNNDesc-based interface and the new one be kept in CUDA.jl, at least here in the backport? Otherwise, the next Julia 1.5-compatible release will have to be marked as breaking.
Certainly. I can try to write something next week if nobody else gets to it before me.
Regarding the RNN design, I'm unclear whether plugging CUDNN back in would currently be the right course, given the differently designed approach taken in Flux, which only defines recurrent cells at the individual-step level. This results in a model being broadcast over a sequence.

So under the current Flux approach to RNNs, the CUDNN functionality was not really leveraged, as it was called for each individual step of the sequence, which explains, from my understanding, why resorting to the base CUDA.jl implementation resulted in similar performance (along with solving some pending issues).

All this to say that while I see it as desirable to have an approach to RNNs that matches that of CUDNN with 3D input, I think it would involve some departure from the "single step" design. Otherwise, I'm not sure there's any benefit to integrating the CUDNN RNN if the Flux RNN design isn't moved to that 3D structure. @denizyuret, was your intent to revamp the Flux RNN in that direction?
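For context, a minimal sketch of that "single step" pattern, written against Flux's recurrent layers as they were around this discussion (v0.11); the layer sizes and sequence below are illustrative, not taken from this PR:

using Flux

m = Flux.RNN(10, 5)                        # stateful Recur wrapping an RNNCell

xs = [rand(Float32, 10, 3) for _ in 1:7]   # 7 time steps, each 10 features × 3 samples

# The model is applied one step at a time; broadcasting (or a comprehension)
# walks the sequence, so a CUDNN kernel would only ever see a single step,
# never the full features × batch × time array.
ys = m.(xs)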
That's interesting. Though a more appropriate Julia metaphor for RNNs would be not broadcast but foldl.
This is not surprising, because most of the gain for a GPU RNN comes from being able to process the whole sequence in parallel.
I think for research use it is essential to support cudnn features like 3D input, multiple layers, easy bidirectionality, etc. A "single step" design is useful for demonstration and teaching, but not for most large-scale research. NNlib should probably be extended with such an interface, along with supporting data structures like PaddedSequence, etc. Flux can choose to support both the old and the new interfaces.

As to how to proceed in Flux, I am not sure how the decision process works. Should @DhairyaLGandhi or @CarloLucibello have the final say?

As an example, I have had an RNN interface in Knet that supports 3D input, multiple layers, bidirectionality, etc. since cudnn introduced them: https://github.com/denizyuret/Knet.jl/blob/master/src/ops20/rnn.jl Though nobody is completely happy with my interface; @ekinakyurek wrote another: https://github.com/ekinakyurek/KnetLayers.jl/blob/master/src/rnn.jl These days I am looking at PyTorch/TensorFlow/MXNet/ONNX etc. to see what different API ideas are out there, and I will probably design a new interface under Knet.Layers21, keeping the old one for backward compatibility.
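As one illustration of the "easy bidirectionality" point, here is a minimal sketch of my own (not an existing NNlib or CUDA.jl API) showing how a bidirectional pass can be composed from two single-direction cells that follow the (state, input) -> (newstate, output) convention:

function birnn(cellf, cellb, h0f, h0b, xs::AbstractVector)
    hf, hb = h0f, h0b
    fwd = Vector{Any}(undef, length(xs))
    bwd = Vector{Any}(undef, length(xs))
    for t in eachindex(xs)
        hf, fwd[t] = cellf(hf, xs[t])                      # left-to-right pass
        hb, bwd[end - t + 1] = cellb(hb, xs[end - t + 1])  # right-to-left pass
    end
    return [vcat(f, b) for (f, b) in zip(fwd, bwd)]        # concatenate both directions per step
end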
I'm not sure I understood that part correctly, but the above scenario effectively supports the need for an RNN operator on 3D input that returns the full sequence array as output. That being said, the current single-step approach's performance isn't terrible, and I can see benefits of it for certain use cases given its flexibility. It already supports multiple layers well.

To follow the general development pattern, should we have the current RNN CPU implementation moved from Flux to NNlib? As for the compatibility between CPU and GPU, a simple fix could be to keep the current Flux RNN implementation and dispatch for 3D input by mapping the recurrent model over each slice of the seq_length dimension.

I noticed that the linked implementation even includes an embedding layer. I think the level of high-levelness for Flux is subject to some debate, though I think it falls more on the lean side. As such, if I'm not mistaken, Flax/JAX doesn't even have those fused CUDNN RNN operators, only the cell operations. I'd be curious to have the take of @DhairyaLGandhi/@CarloLucibello on where to cut the line on the fundamental building blocks to include.
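A rough sketch of the "dispatch for 3D input by mapping over the seq_length dimension" idea mentioned above; the helper name and the features × batch × seq_length layout are my assumptions, not code from this PR:

# Apply an existing per-step recurrent model `m` (e.g. a Flux Recur)
# to a 3D array by mapping it over the time dimension.
function apply_sequence(m, x::AbstractArray{<:Real,3})
    outs = [m(x[:, :, t]) for t in 1:size(x, 3)]   # one 2D slice per time step
    return cat(outs...; dims = 3)                  # restack into features′ × batch × seq_length
end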
I think @denizyuret's point is that
AIUI XLA will fuse the cell operations, so there's no need to define a manually fused kernel. I assume this level of fusion doesn't happen in Flux's RNN cells, but would have to compare HLO vs Julia IR to be sure.
I agree that the current broadcasting/map gives a slightly off feeling, like "let's hope it will iterate in order" :)

foldl((a, b) -> cell(a, b)[1], x, init=cell.state0)

The semantics feel better; however, I don't see how it could provide the output of each sequence step rather than just the final hidden state.

Anyhow, if you have a clear view, @denizyuret, on how to provide CPU support compatible with the CUDNN RNN interface, that would be great; otherwise I can take a shot at leveraging the current Flux implementation to support 3D input & bidirectionality.
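As a small aside on the quoted foldl form: foldl only yields the final state, while accumulate keeps the per-step states. A self-contained toy sketch (the cell, weights, and sizes below are made up for illustration, and a plain state0 stands in for cell.state0):

# toy cell following the (state, input) -> (newstate, output) convention
W, U = rand(Float32, 5, 5), rand(Float32, 5, 3)
cell(h, x) = (h′ = tanh.(W * h .+ U * x); (h′, h′))
state0 = zeros(Float32, 5)

xs = [rand(Float32, 3) for _ in 1:7]       # a 7-step sequence of 3-vectors

hfinal = foldl((h, xt) -> cell(h, xt)[1], xs; init = state0)       # final state only
hs = accumulate((h, xt) -> cell(h, xt)[1], xs; init = state0)      # one state per step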
A better API design that can utilize the complete CUDNN RNN interface with a CPU implementation would take significantly more time than just providing backward compatibility in Flux, which is the only thing blocking the CUDA.jl release right now. My suggestion would be:
To give an idea of the complexity, the link I shared is an (almost) complete CPU implementation of the CUDNN RNN interface: https://github.com/denizyuret/Knet.jl/blob/master/src/ops20/rnn.jl It supports 1-D (single-step), 2-D (single-step with batch), and 3-D (sequence batch) inputs, multiple layers, bidirectionality, etc. So it would allow you to use single-step with broadcasting with, e.g. …
Thanks, that sounds like a good plan. Regarding:
I'm not sure what's keeping this backport from being compatible with current Flux. Given the …
Flux master is fine, but the next release v0.12 will take some time since we are pipelining a few breaking changes.
This seems to be taking longer than expected, so I don't think we'll be backporting this to a 1.5 release. Let's keep the discussion in #523 then.
It's going to be hard for Flux to keep supporting Julia 1.5 without this backport. And in general, it seems hard to keep up with CUDA.jl's aggressive dropping of older Julia versions. @maleadt, was Julia 1.6 kind of special in this regard, and are things expected to be more retro-fittable in the future? Otherwise, we could think about excising the cudnn and nnlib stuff, although it would be a little annoying and contrary to the consolidation direction that GPU packages have taken recently.
I'm happy to include this in a 1.5 backport release, but it seemed risky wrt. breaking or slowing down existing Flux.jl releases.

About the Julia compatibility: CUDA.jl (and other GPU back-ends) is no "ordinary" package, but extends the Julia compiler and as such is much more sensitive to internal changes. In 1.6, for example, the entire AOT compilation API changed completely. It's very hard to support multiple releases that way, and I don't expect that to improve much. We've been trying to "containerize" that functionality in GPUCompiler.jl, but because that package's API hasn't matured yet, changes in its Julia support quickly end up limiting the support of downstream packages like CUDA.jl.
@CarloLucibello A backport release of CUDA.jl for Julia 1.5 including the new CUDNN API is up on the …
Thanks! I think the current Flux releases (v0.11) are not compatible with that, because of the change in the RNN descriptors.
If there's an actual incompatibility, the release needs to be breaking and we can't backport this to a 1.5-compatible CUDA.jl release. The next release will be a breaking version, 3.0, but only for Julia 1.6.
Yep, #523 breaks the RNN interface without a deprecation path. That's the only breaking part of that PR, I think. It leaked on both …
All of this seems to suggest that at some point we should factor out an NNlibCUDA package, either here as a subpackage (so that you can keep an eye on it and manage it easily) or somewhere else.
This is a single file that can be found in the last CUDA.jl release; can't we just copy and paste it to Flux, or revive it in CUDA.jl for now, until the interface is fixed?
Yes, I have been advocating this too. CUDA/lib/cudnn can be a light wrapper of cudnn library calls, and what we do with them can move to their own packages (Knet.CUDNN, NNlibCUDA, etc.).
The interface is basically there already; we have to bring back the cudnn bit from earlier, and we will be able to dispatch back into cudnn.
I just copied it to #738. As far as I can tell, it doesn't break or conflict with anything in the new cudnn interface. See if it fixes Flux.
This should be easier to test with current packages. Fixes should be pushed to #523. If this PR is fine, we can go ahead and merge #523 too. cc @denizyuret @DhairyaLGandhi @CarloLucibello