
Commit d031cc9

denizyuret authored and maleadt committed
New high level interface for cuDNN
1 parent 2d5700c commit d031cc9

37 files changed: +3376 -1179 lines

lib/cudnn/CUDNN.jl

Lines changed: 8 additions & 4 deletions
@@ -19,20 +19,24 @@ include("libcudnn_deprecated.jl")
 # low-level wrappers
 include("util.jl")
 include("base.jl")
+include("descriptors.jl")
 include("tensor.jl")
-include("conv.jl")
+include("inplace.jl")
+include("optensor.jl")
+include("reduce.jl")
+include("convolution.jl")
 include("pooling.jl")
 include("activation.jl")
-include("filter.jl")
 include("softmax.jl")
-include("batchnorm.jl")
 include("dropout.jl")
 include("rnn.jl")
+include("multiheadattn.jl")
+include("normalization.jl")
 
 # high-level integrations
 include("nnlib.jl")
+include("batchnorm.jl")
 
-include("compat.jl")
 
 function math_mode(mode=CUDA.math_mode())
     if mode == CUDA.PEDANTIC_MATH

lib/cudnn/README.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
## High level interface to cuDNN functions

Deniz Yuret, Nov 6, 2020

The goal of the high-level interface is to map the low level cuDNN calls to more natural Julia functions. Here are some design choices I followed:

**Naming:** We try to keep the same function, argument, and type names from the cuDNN library in the high level interface. The wrappers for descriptors drop the `_t` suffix, e.g. `cudnnPoolingDescriptor_t => cudnnPoolingDescriptor`.

**Descriptors:** The cuDNN functions take data and operator descriptors. Most of these descriptors are relatively fast to create (~500 ns for a cudnnTensorDescriptor), so they may not be worth preallocating for the user, but we provide keyword options anyway. We cache descriptors (~100 ns) so we can use them as hash keys for memoization, which also saves a bit of memory and time. All descriptor fields are `isbits` types, with the exception of `cudnnDropoutDescriptor`, which points to a random number generator state and is used as a field of some other descriptors.
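To make the caching idea concrete, here is a hypothetical sketch (not the actual `descriptors.jl` code; the cache variable and helper name are invented, and the `cudnnActivationDescriptor` constructor from this commit is assumed to be in scope):

```julia
# Hypothetical sketch of descriptor caching, not the code in descriptors.jl:
# memoize on the isbits option fields so repeated calls reuse one descriptor.
const ACTIVATION_DESC_CACHE = Dict{Tuple,cudnnActivationDescriptor}()

function cachedActivationDescriptor(mode, nanOpt, coef::Cdouble)
    get!(ACTIVATION_DESC_CACHE, (mode, nanOpt, coef)) do
        cudnnActivationDescriptor(mode, nanOpt, coef)  # constructor used later in this commit
    end
end
```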
**Operator descriptors:** Descriptors such as `cudnnPoolingDescriptor` specify the options for an operator such as stride and padding. For operators with descriptors we have one method that takes keyword arguments with reasonable defaults to construct the descriptor and another method that takes a pre-initialized descriptor as its last argument. This way a casual user can call the first method without worrying about the descriptor format, only specifying non-default options, whereas a layer architect can keep a preset descriptor in the layer that gets passed to the function using the second method. We try to use generic Julia types for keyword arguments that specify default descriptor fields and convert these to the appropriate cudnn types during descriptor construction.

**Output arrays:** The low level cuDNN functions take pre-allocated output arrays. The high level interface has one Julia function that allocates its own output array (e.g. `cudnnPoolingForward`) and another with an exclamation mark that takes a pre-allocated output array as its first argument (e.g. `cudnnPoolingForward!`).
**Methods:** Each cuDNN forward function may have up to four methods depending on whether the descriptor and the output array are specified:

    cudnnPoolingForward(x; kwargs...)
    cudnnPoolingForward(x, d::cudnnPoolingDescriptor; kwargs...)
    cudnnPoolingForward!(y, x; kwargs...)
    cudnnPoolingForward!(y, x, d::cudnnPoolingDescriptor; kwargs...)

The conventional order of arguments for these public methods is:

    ([output], weights, inputs, [descriptor]; kwargs...)
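As a concrete illustration of these conventions, here is a hedged usage sketch based on the activation methods added in this commit (it assumes the `CUDA.CUDNN` module; array sizes are illustrative only):

```julia
using CUDA, CUDA.CUDNN

x = CUDA.randn(Float32, 8, 8, 4, 2)    # a 4-D input tensor

# Casual use: options via keywords, output allocated by the call.
y1 = CUDNN.cudnnActivationForward(x; mode=CUDNN.CUDNN_ACTIVATION_TANH)

# Layer-style use: a preset descriptor and a pre-allocated output array.
d  = CUDNN.cudnnActivationDescriptor(CUDNN.CUDNN_ACTIVATION_TANH,
                                     CUDNN.CUDNN_NOT_PROPAGATE_NAN, Cdouble(1))
y2 = similar(x)
CUDNN.cudnnActivationForward!(y2, x, d)

@assert y1 ≈ y2
```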
**AD method:** Sometimes neither the high level nor the low level interface is appropriate for gradient definitions: the low level API may not return a value, and the high level API may have some gradient target parameters as keyword arguments. To solve this issue the API exposes an intermediate function with an AD suffix, e.g. `cudnnPoolingForwardAD`, that is called by the high level method and that makes the low level library call. These methods may not seem to do anything useful, but they should not be removed so that automatic gradient packages can make use of them.
**Backward functions:** The point of a high level interface is to give the user appropriate defaults for the many options of typical cudnn functions. Backward functions do not have meaningful defaults because they need to copy their options from the corresponding forward function. Therefore we do not need high level APIs for backward functions unless they are useful in some other way. See Knet/src/cudnn for example uses.
**Types:** Do not specify types for array arguments; leave the high level functions generic so they can be called with CuArray, KnetArray, AutoGrad.Param, etc. Types can and should be specified for non-array arguments. In the API we use `nothing` to indicate unspecified array argument values and convert these to `C_NULL` or `CU_NULL` as appropriate only at the low-level call. Similarly, for numbers the API should accept generic types like `Integer` or `Real` and convert these to the appropriate specific type, e.g. `Cint` or `Cdouble`, only at the low-level call.
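A hypothetical sketch of this convention (the names `someForward` and `lowLevelCall` are invented for illustration and are not part of the library):

```julia
using CUDA  # for CU_NULL

# Stand-in for a generated ccall wrapper that requires C types.
lowLevelCall(biasPtr, alpha::Cdouble) = (biasPtr, alpha)

# High level: generic argument types, `nothing` allowed for optional arrays.
function someForward(x; bias = nothing, alpha::Real = 1)
    biasPtr = bias === nothing ? CU_NULL : pointer(bias)  # nothing -> CU_NULL only here
    lowLevelCall(biasPtr, Cdouble(alpha))                 # Real -> Cdouble only here
end
```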
**Workspace:** Some functions need a temporary allocated workspace whose required size is determined by another cudnn call. Unfortunately, the required size may depend on factors other than the current inputs (see [this issue](https://github.com/FluxML/Flux.jl/issues/923#issuecomment-558671966)), so the `@workspace` macro is used at a point as close to the library call as possible. One exception to this is when the same workspace will be passed to the backward call, in which case we allocate a regular CuArray.
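A hedged sketch of the pattern for the convolution case, assuming CUDA.jl's `@workspace` macro and the `cudnnGetConvolutionForwardWorkspaceSize` wrapper are in scope; `doForward` is a placeholder closure, not a library function:

```julia
function withConvWorkspace(doForward, xDesc, wDesc, convDesc, yDesc, algo)
    # Query the required workspace size from cuDNN first.
    sz = Ref{Csize_t}(0)
    cudnnGetConvolutionForwardWorkspaceSize(handle(), xDesc, wDesc, convDesc, yDesc, algo, sz)
    # Allocate only around the library call; the buffer is released afterwards.
    @workspace size=sz[] workspace->begin
        doForward(workspace, sizeof(workspace))
    end
end
```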
**Training vs Inference:** cuDNN does not distinguish training from inference calls in a consistent way:
* BatchNormalization and Normalization have two separate functions: `cudnnNormalizationForwardTraining / Inference`
* RNN has an indicator argument: `fwdMode` in `cudnnRNNForward`
* MultiHeadAttn looks at the `reserveSpace` argument to decide: if `NULL`, inference mode; otherwise, training mode
* Dropout always runs in training mode with a non-NULL `reserveSpace` (it doesn't make sense in inference mode)
* Activation, convolution, pooling, softmax, optensor, addtensor, reducetensor do not make a distinction between the two modes

In the high level API we assume inference by default and let the gradient packages override this when necessary. See the gradient implementations in Knet/src/cudnn for examples.
**TODO:**
* Keyword arg descriptor constructors.
* Test forw fns with descriptors: check for desc vs kwarg incompatibility.
* Find out about cudnnRNNSetClip_v8.
* Test with Knet.Ops20.
* Command used to test: julia17 --project -e 'using Pkg; Pkg.API.test(; test_args=`--memcheck --jobs=1 cudnn`)'

lib/cudnn/activation.jl

Lines changed: 50 additions & 37 deletions
@@ -1,44 +1,57 @@
-# descriptor
-
-mutable struct ActivationDesc
-    ptr::cudnnActivationDescriptor_t
+"""
+    cudnnActivationForward(x; mode, nanOpt, coef, alpha)
+    cudnnActivationForward(x, d::cudnnActivationDescriptor; alpha)
+    cudnnActivationForward!(y, x; mode, nanOpt, coef, alpha, beta)
+    cudnnActivationForward!(y, x, d::cudnnActivationDescriptor; alpha, beta)
+
+Return the result of the specified elementwise activation operation applied to `x`.
+Optionally `y` holds the result and `d` specifies the operation. `y` should be similar to
+`x` if specified. Keyword arguments `alpha=1, beta=0` can be used for scaling, i.e. `y .=
+alpha*op.(x) .+ beta*y`. The following keyword arguments specify the operation if `d` is
+not given:
+
+* `mode = CUDNN_ACTIVATION_RELU`: Options are SIGMOID, RELU, TANH, CLIPPED_RELU, ELU, IDENTITY
+* `nanOpt = CUDNN_NOT_PROPAGATE_NAN`: NaN propagation policy, the other option is `CUDNN_PROPAGATE_NAN`
+* `coef=1`: When the activation mode is set to CUDNN_ACTIVATION_CLIPPED_RELU, this input specifies the clipping threshold; and when the activation mode is set to CUDNN_ACTIVATION_ELU, this input specifies the α parameter.
+"""
+cudnnActivationForward, cudnnActivationForward!
+
+
+# Public methods
+cudnnActivationForward(x; o...) = cudnnActivationForwardWithDefaults(x; o...)
+cudnnActivationForward!(y, x; o...) = cudnnActivationForwardWithDefaults(x; y, o...)
+cudnnActivationForward(x, d::cudnnActivationDescriptor; o...) = cudnnActivationForwardWithDefaults(x; activationDesc=d, o...)
+cudnnActivationForward!(y, x, d::cudnnActivationDescriptor; o...) = cudnnActivationForwardWithDefaults(x; y, activationDesc=d, o...)
+
+
+# Private method
+function cudnnActivationForwardWithDefaults(
+    x;
+    y = similar(x),
+    mode::cudnnActivationMode_t = CUDNN_ACTIVATION_RELU,
+    nanOpt::cudnnNanPropagation_t = CUDNN_NOT_PROPAGATE_NAN,
+    coef::Real=1,
+    activationDesc::cudnnActivationDescriptor = cudnnActivationDescriptor(mode, nanOpt, Cdouble(coef)),
+    alpha::Real=1,
+    beta::Real=0,
+    xDesc::cudnnTensorDescriptor = cudnnTensorDescriptor(x),
+    yDesc::cudnnTensorDescriptor = xDesc,
+)
+    T = eltype(x)
+    alpha, beta = scalingParameter(T,alpha), scalingParameter(T,beta)
+    cudnnActivationForwardAD(x; activationDesc, alpha, xDesc, beta, yDesc, y)
 end
 
-unsafe_free!(ad::ActivationDesc)=cudnnDestroyActivationDescriptor(ad.ptr)
-
-Base.unsafe_convert(::Type{cudnnActivationDescriptor_t}, ad::ActivationDesc)=ad.ptr
 
-function ActivationDesc(mode, coeff, reluNanOpt=CUDNN_NOT_PROPAGATE_NAN)
-    ad = Ref{cudnnActivationDescriptor_t}()
-    cudnnCreateActivationDescriptor(ad)
-    cudnnSetActivationDescriptor(ad[],mode,reluNanOpt,coeff)
-    this = ActivationDesc(ad[])
-    finalizer(unsafe_free!, this)
-    return this
+# AD method:
+function cudnnActivationForwardAD(x; activationDesc, alpha, xDesc, beta, yDesc, y)
+    cudnnActivationForward(handle(), activationDesc, alpha, xDesc, x, beta, yDesc, y)
+    return y
 end
 
 
-# wrappers
-
-function cudnnActivationForward(x::DenseCuArray{T,N}, y::DenseCuArray{T,N}=x;
-                                mode=CUDNN_ACTIVATION_RELU, # CUDNN_ACTIVATION_IDENTITY will not work
-                                coeff=false, reluNanOpt=CUDNN_NOT_PROPAGATE_NAN, alpha=true,
-                                beta=false) where {T,N}
-    cudnnActivationForward(handle(), ActivationDesc(mode, T(coeff), reluNanOpt),
-                           scalingParameter(T, alpha), TensorDesc(x), x,
-                           scalingParameter(T, beta ), TensorDesc(y), y)
-    return y
-end
-
-function cudnnActivationBackward(x::DenseCuArray{T,N}, dx::DenseCuArray{T,N},
-                                 y::DenseCuArray{T,N}, dy::DenseCuArray{T,N}=dx;
-                                 mode=CUDNN_ACTIVATION_RELU, # CUDNN_ACTIVATION_IDENTITY will not work
-                                 coeff=false, reluNanOpt=CUDNN_NOT_PROPAGATE_NAN, alpha=1,
-                                 beta=false) where {T,N}
-    cudnnActivationBackward(handle(), ActivationDesc(mode, T(coeff), reluNanOpt),
-                            scalingParameter(T, alpha), TensorDesc( y), y,
-                                                        TensorDesc(dy), dy,
-                                                        TensorDesc( x), x,
-                            scalingParameter(T, beta ), TensorDesc(dx), dx)
-    return dx
+# Deprecated:
+function cudnnActivationForward(x::DenseCuArray{T,N}, y::DenseCuArray{T,N}; o...) where {T,N}
+    @warn "`cudnnActivationForward(x,y)` is deprecated, please use one of the methods in `@doc cudnnActivationForward`." maxlog=1
+    cudnnActivationForward!(y, x; o...)
 end
lib/cudnn/batchnorm.jl

Lines changed: 7 additions & 7 deletions
@@ -36,9 +36,9 @@ function cudnnBNForward!(y::DenseCuArray{T}, g::DenseCuArray{T}, b::DenseCuArray
         # warn("eps ",eps," is too small for CuDNN so eps has been assigned the value ", CUDNN_BN_MIN_EPSILON)
         eps = CUDNN_BN_MIN_EPSILON
     end
-    xd = TensorDesc(x)
-    yd = TensorDesc(y)
-    gd = TensorDesc(T, dims)
+    xd = cudnnTensorDescriptor(x)
+    yd = cudnnTensorDescriptor(y)
+    gd = cudnnTensorDescriptor(CUDNN_TENSOR_NCHW, cudnnDataType(T), Cint(length(dims)), dim4(dims,Val(CUDNN_TENSOR_NCHW)))
 
     if training
 
@@ -91,10 +91,10 @@ function cudnnBNBackward!(dg::DenseCuArray{T}, g::DenseCuArray{T}, db::DenseCuAr
                           alpha = T(1), beta = T(0),
                           dalpha = T(1), dbeta = T(0), training = true) where T<:Union{Float32, Float64}
     if training
-        xd = TensorDesc(x)
-        dyd = TensorDesc(dy)
-        dxd = TensorDesc(dx)
-        gd = TensorDesc(T, _wsize(x))
+        xd = cudnnTensorDescriptor(x)
+        dyd = cudnnTensorDescriptor(dy)
+        dxd = cudnnTensorDescriptor(dx)
+        gd = cudnnTensorDescriptor(CUDNN_TENSOR_NCHW, cudnnDataType(T), Cint(length(_wsize(x))), dim4(_wsize(x),Val(CUDNN_TENSOR_NCHW)))
         if cache !== nothing
             mean, ivar = cache.mean, cache.ivar
             info("mean and ivar are fetched from the cache")

lib/cudnn/compat.jl

Lines changed: 0 additions & 21 deletions
This file was deleted.
