Add @autosize
#2078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add @autosize
#2078
@@ -1,47 +1,79 @@
# Shape Inference

To help you generate models in an automated fashion, [`Flux.outputsize`](@ref) lets you
calculate the size returned by layers for a given input size.
This is especially useful for layers like [`Conv`](@ref).
Flux has some tools to help generate models in an automated fashion, by inferring the size
of arrays that layers will receive, without doing any computation.
This is especially useful for convolutional models, where the same [`Conv`](@ref) layer
accepts any size of image, but the next layer may not.

It works by passing a "dummy" array into the model that preserves size information without running any computation.
`outputsize(f, inputsize)` works for all layers (including custom layers) out of the box.
By default, `inputsize` expects the batch dimension,
but you can exclude the batch size with `outputsize(f, inputsize; padbatch=true)` (assuming it to be one).
The higher-level tool is a macro [`@autosize`](@ref) which acts on the code defining the layers,
and replaces each appearance of `_` with the relevant size. This simple example returns a model
with `Dense(845 => 10)` as the last layer:

Using this utility function lets you automate model building for various inputs like so:
```julia
"""
    make_model(width, height, inchannels, nclasses;
               layer_config = [16, 16, 32, 32, 64, 64])
@autosize (28, 28, 1, 32) Chain(Conv((3, 3), _ => 5, relu, stride=2), Flux.flatten, Dense(_ => 10))
```

The input size may be provided at runtime, like `@autosize (sz..., 1, 32) Chain(Conv(`..., but all the
layer constructors containing `_` must be explicitly written out -- the macro sees the code as written.
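
For instance, a minimal sketch of this pattern (the variable `sz` and the layer sizes here are illustrative, not fixed by Flux):

```julia
sz = (28, 28)                             # image size known only at runtime
m = @autosize (sz..., 1, 32) Chain(
        Conv((3, 3), _ => 6, relu),       # _ becomes 1, the channel dimension
        Flux.flatten,
        Dense(_ => 10))                   # _ becomes 26*26*6 = 4056
```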

This macro relies on a lower-level function [`outputsize`](@ref Flux.outputsize), which you can also use directly:

```julia
c = Conv((3, 3), 1 => 5, relu, stride=2)
Flux.outputsize(c, (28, 28, 1, 32))  # returns (13, 13, 5, 32)
```

Create a CNN for a given set of configuration parameters.
The function `outputsize` works by passing a "dummy" array into the model, which propagates through very cheaply.
It should work for all layers, including custom layers, out of the box.
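
For example, a minimal sketch (the function `f` below is illustrative, not part of Flux): `outputsize` also accepts plain functions built from ordinary array operations, since the dummy array behaves like any other array.

```julia
using Flux

f(x) = reshape(x, :, size(x, 4))      # flatten everything except the batch dimension

Flux.outputsize(f, (28, 28, 3, 16))   # (2352, 16), found without any real computation
```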

# Arguments
- `width`: the input image width
- `height`: the input image height
- `inchannels`: the number of channels in the input image
- `nclasses`: the number of output classes
- `layer_config`: a vector of the number of filters for each conv layer
An example of how to automate model building is this:
```jldoctest; output = false, setup = :(using Flux)
"""
function make_model(width, height, inchannels, nclasses;
                    layer_config = [16, 16, 32, 32, 64, 64])
    # construct a vector of conv layers programmatically
    conv_layers = [Conv((3, 3), inchannels => layer_config[1])]
    for (infilters, outfilters) in zip(layer_config, layer_config[2:end])
        push!(conv_layers, Conv((3, 3), infilters => outfilters))
    make_model(width, height, [inchannels, nclasses; layer_config])

Create a CNN for a given set of configuration parameters. Arguments:
- `width`, `height`: the input image size in pixels
- `inchannels`: the number of channels in the input image, default `1`
- `nclasses`: the number of output classes, default `10`
- Keyword `layer_config`: a vector of the number of channels per layer, default `[16, 16, 32, 64]`
"""
function make_model(width, height, inchannels = 1, nclasses = 10;
                    layer_config = [16, 16, 32, 64])
    # construct a vector of layers:
    conv_layers = []
    push!(conv_layers, Conv((5, 5), inchannels => layer_config[1], relu, pad=SamePad()))
    for (inch, outch) in zip(layer_config, layer_config[2:end])
        push!(conv_layers, Conv((3, 3), inch => outch, sigmoid, stride=2))
    end

    # compute the output dimensions for the conv layers
    # use padbatch=true to set the batch dimension to 1
    conv_outsize = Flux.outputsize(conv_layers, (width, height, nchannels); padbatch=true)
    # compute the output dimensions after these conv layers:
    conv_outsize = Flux.outputsize(conv_layers, (width, height, inchannels); padbatch=true)

    # the input dimension to Dense is programmatically calculated from
    # width, height, and nchannels
    return Chain(conv_layers..., Dense(prod(conv_outsize) => nclasses))
    # use this to define appropriate Dense layer:
    last_layer = Dense(prod(conv_outsize) => nclasses)
    return Chain(conv_layers..., Flux.flatten, last_layer)
end

m = make_model(28, 28, 3, layer_config = [9, 17, 33, 65])

Flux.outputsize(m, (28, 28, 3, 42)) == (10, 42) == size(m(randn(Float32, 28, 28, 3, 42)))

# output

true
```

Alternatively, using the macro, the definition of `make_model` could end with:

```
    # compute the output dimensions & construct appropriate Dense layer:
    return @autosize (width, height, inchannels, 1) Chain(conv_layers..., Flux.flatten, Dense(_ => nclasses))
end
```
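
Written this way, the `_` inside `Dense(_ => nclasses)` is filled in automatically, so the explicit `Flux.outputsize` call and the `prod` over its result are no longer needed.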

### Listing

```@docs
Flux.@autosize
Flux.outputsize
```

@@ -147,8 +147,12 @@ outputsize(m::AbstractVector, input::Tuple...; padbatch=false) = outputsize(Chai

## bypass statistics in normalization layers

for layer in (:LayerNorm, :BatchNorm, :InstanceNorm, :GroupNorm)
  @eval (l::$layer)(x::AbstractArray{Nil}) = x
for layer in (:BatchNorm, :InstanceNorm, :GroupNorm) # LayerNorm works fine
  @eval function (l::$layer)(x::AbstractArray{Nil})
    l.chs == size(x, ndims(x)-1) || throw(DimensionMismatch(
      string($layer, " expected ", l.chs, " channels, but got size(x) == ", size(x))))
    x
  end
end
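
# Note: the channel check above makes outputsize (and hence @autosize) throw a
# DimensionMismatch for inconsistent channel counts, rather than letting the Nil
# array pass through silently.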

## fixes for layers that don't work out of the box

@@ -168,3 +172,162 @@ for (fn, Dims) in ((:conv, DenseConvDims),)
    end
  end
end

"""
    @autosize (size...,) Chain(Layer(_ => 2), Layer(_), ...)

Returns the specified model, with each `_` replaced by an inferred number,
for input of the given `size`.

The unknown sizes are usually the second-last dimension of that layer's input,
which Flux regards as the channel dimension.
(A few layers, `Dense` & [`LayerNorm`](@ref), instead always use the first dimension.)
The underscore may appear as an argument of a layer, or inside a `=>`.
It may be used in further calculations, such as `Dense(_ => _÷4)`.

# Examples
```
julia> @autosize (3, 1) Chain(Dense(_ => 2, sigmoid), BatchNorm(_, affine=false))
Chain(
  Dense(3 => 2, σ),                     # 8 parameters
  BatchNorm(2, affine=false),
)

julia> img = [28, 28];

julia> @autosize (img..., 1, 32) Chain(              # size is only needed at runtime
          Chain(c = Conv((3,3), _ => 5; stride=2, pad=SamePad()),
                p = MeanPool((3,3)),
                b = BatchNorm(_),
                f = Flux.flatten),
          Dense(_ => _÷4, relu, init=Flux.rand32),   # can calculate output size _÷4
          SkipConnection(Dense(_ => _, relu), +),
          Dense(_ => 10),
       ) |> gpu                                      # moves to GPU after initialisation
Chain(
  Chain(
    c = Conv((3, 3), 1 => 5, pad=1, stride=2),  # 50 parameters
    p = MeanPool((3, 3)),
    b = BatchNorm(5),                   # 10 parameters, plus 10
    f = Flux.flatten,
  ),
  Dense(80 => 20, relu),                # 1_620 parameters
  SkipConnection(
    Dense(20 => 20, relu),              # 420 parameters
    +,
  ),
  Dense(20 => 10),                      # 210 parameters
)                   # Total: 10 trainable arrays, 2_310 parameters,
                    # plus 2 non-trainable, 10 parameters, summarysize 10.469 KiB.

julia> outputsize(ans, (28, 28, 1, 32))
(10, 32)
```

Limitations:
* While `@autosize (5, 32) Flux.Bilinear(_ => 7)` is OK, something like `Bilinear((_, _) => 7)` will fail.
* While `Scale(_)` and `LayerNorm(_)` are fine (and use the first dimension), `Scale(_,_)` and `LayerNorm(_,_)`
  will fail if `size(x,1) != size(x,2)`.
* RNNs won't work: `@autosize (7, 11) LSTM(_ => 5)` fails, because `outputsize(RNN(3=>7), (3,))` also fails, a known issue.
"""
macro autosize(size, model)
  Meta.isexpr(size, :tuple) || error("@autosize's first argument must be a tuple, the size of the input")
  Meta.isexpr(model, :call) || error("@autosize's second argument must be something like Chain(layers...)")
  ex = _makelazy(model)
  @gensym m
  quote
    $m = $ex
    $outputsize($m, $size)
    $striplazy($m)
  end |> esc
end
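
# Roughly speaking, `@autosize (3,) Chain(Dense(_ => 2))` expands to something like
# the following (a sketch, not the literal expansion):
#
#   m = Chain(LazyLayer("Dense(_ => 2)", x -> Dense(autosizefor(Dense, x) => 2), nothing))
#   outputsize(m, (3,))   # pushes dummy Nil arrays through, materialising each LazyLayer
#   striplazy(m)          # returns the model with each LazyLayer replaced by what it made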

function _makelazy(ex::Expr)
  n = _underscoredepth(ex)
  n == 0 && return ex
  n == 1 && error("@autosize doesn't expect an underscore here: $ex")
  n == 2 && return :($LazyLayer($(string(ex)), $(_makefun(ex)), nothing))
  n > 2 && return Expr(ex.head, ex.args[1], map(_makelazy, ex.args[2:end])...)
end
_makelazy(x) = x

function _underscoredepth(ex::Expr)
  # Meta.isexpr(ex, :tuple) && :_ in ex.args && return 10
  ex.head in (:call, :kw, :(->), :block) || return 0
  ex.args[1] === :(=>) && ex.args[2] === :_ && return 1
  m = maximum(_underscoredepth, ex.args)
  m == 0 ? 0 : m+1
end
_underscoredepth(ex) = Int(ex === :_)
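
# In other words: _underscoredepth reports how deep the nearest `_` sits in an expression,
# and _makelazy wraps any call that uses `_` directly (depth 2, e.g. `Dense(_ => 10)`) in a
# LazyLayer, while recursing into calls such as `Chain(...)` that only contain `_` deeper down.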

function _makefun(ex)
  T = Meta.isexpr(ex, :call) ? ex.args[1] : Type
  @gensym x s
  Expr(:(->), x, Expr(:block, :($s = $autosizefor($T, $x)), _replaceunderscore(ex, s)))
end

"""
    autosizefor(::Type, x)

If an `_` in your layer's constructor, used within `@autosize`, should
*not* mean the 2nd-last dimension, then you can overload this.

For instance `autosizefor(::Type{<:Dense}, x::AbstractArray) = size(x, 1)`
is needed to make `@autosize (2,3,4) Dense(_ => 5)` return
`Dense(2 => 5)` rather than `Dense(3 => 5)`.
"""
autosizefor(::Type, x::AbstractArray) = size(x, max(1, ndims(x)-1))
autosizefor(::Type{<:Dense}, x::AbstractArray) = size(x, 1)
autosizefor(::Type{<:LayerNorm}, x::AbstractArray) = size(x, 1)
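
# A hedged sketch of extending this for a custom layer (`MyEmbedding` is purely illustrative,
# not part of Flux): if its constructor's `_` should pick up the first dimension of the input
# rather than the second-last one, a user could define
#
#   Flux.autosizefor(::Type{<:MyEmbedding}, x::AbstractArray) = size(x, 1)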

_replaceunderscore(e, s) = e === :_ ? s : e
_replaceunderscore(ex::Expr, s) = Expr(ex.head, map(a -> _replaceunderscore(a, s), ex.args)...)

mutable struct LazyLayer
  str::String
  make::Function
  layer
end

@functor LazyLayer

function (l::LazyLayer)(x::AbstractArray, ys::AbstractArray...)
  l.layer === nothing || return l.layer(x, ys...)
  made = l.make(x)  # for something like `Bilinear((_,__) => 7)`, perhaps need `make(xy...)`, later.
  y = made(x, ys...)
  l.layer = made  # mutate after we know that call worked
  return y
end

function striplazy(m)
  fs, re = functor(m)
  re(map(striplazy, fs))
end
function striplazy(l::LazyLayer)
  l.layer === nothing || return l.layer
  error("LazyLayer should be initialised, e.g. by outputsize(model, size), before using striplazy")
end

# Could make LazyLayer usable outside of @autosize, for instance allow Chain(@lazy Dense(_ => 2))?
# But then it will survive to produce weird structural gradients etc.

Could we force users to call

I suppose the other policy would just be to allow these things to survive in the model. As long as you never change it, and don't care about the cost of the But any use outside of

function ChainRulesCore.rrule(l::LazyLayer, x)
  l(x), _ -> error("LazyLayer should never be used within a gradient. Call striplazy(model) first to remove all.")
end
function ChainRulesCore.rrule(::typeof(striplazy), m)
  striplazy(m), _ -> error("striplazy should never be used within a gradient")
end

params!(p::Params, x::LazyLayer, seen = IdSet()) = error("LazyLayer should never be used within params(m). Call striplazy(m) first.")

function Base.show(io::IO, l::LazyLayer)
  printstyled(io, "LazyLayer(", color=:light_black)
  if l.layer == nothing
    printstyled(io, l.str, color=:magenta)
  else
    printstyled(io, l.layer, color=:cyan)
  end
  printstyled(io, ")", color=:light_black)
end

_big_show(io::IO, l::LazyLayer, indent::Int=0, name=nothing) = _layer_show(io, l, indent, name)