Better handling for layers with multiple inputs w/ outputsize #1466

Closed
darsnack opened this issue Jan 14, 2021 · 9 comments · Fixed by #1486

darsnack (Member) commented Jan 14, 2021

Given the variable argument version of Parallel (#1462) and #1009, it seems like we need better support for multiple arguments as layer inputs. Currently, outputsize only understands a single Tuple for inputsize.
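For illustration, here is the single-size-tuple form that works today, versus the kind of multi-input call this issue is asking for (the layers and sizes are made up for the example):

```julia
using Flux

# Today: one model, one size tuple of (features, batchsize).
Flux.outputsize(Dense(10, 5), (10, 32))   # -> (5, 32)

# No supported way yet to describe several inputs, e.g. for the vararg Parallel:
# Flux.outputsize(Parallel(vcat, Dense(10, 4), Dense(20, 4)), (10, 32), (20, 32))
```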

mcabbott (Member) commented Jan 25, 2021

Should be easy to do.

Edit -- The first question here is what set of inputs should be allowed. Parallel treats a tuple of arrays the same as multiple arrays, and the Bilinear PR currently does the same. If that's always true, then something like outputsize(m, (10, 32), (20, 32)) would be enough (with padbatch=true applied to all the inputs?). But if not, then that syntax should make two arrays, while outputsize(m, ((10, 32), (20, 32))) would make a tuple of two arrays.

Of course, Chain only accepts one input. Perhaps that argues for the simpler setup, i.e. just outputsize(m, sizes::Tuple{Vararg{Int}}...), with no tuples of tuples.
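For concreteness, a rough sketch of what such a vararg method might look like, reusing the nil trick from the existing single-input method -- this is only a sketch of the idea with a made-up name (multi_outputsize), not an actual Flux method:

```julia
using Flux
using Flux: nil   # Flux's internal size-inference sentinel (not exported)

# Sketch only: accept any number of size tuples, optionally pad each with a
# batch dimension, fill each with `nil`, and splat the arrays into the model.
function multi_outputsize(m, inputsizes::Tuple{Vararg{Int}}...; padbatch=false)
  xs = map(inputsizes) do s
    s = padbatch ? (s..., 1) : s
    fill(nil, s)
  end
  return size(m(xs...))
end
```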

Maybe another feature to consider is making outputsize give more informative errors. A simple step would be that, given a Chain, it uses try/catch to step along and tells you which layer rejected what size of input. This would slow it down, of course, but only by some nanoseconds per layer; am I correct in thinking that would not be a concern?
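As a rough sketch of that idea (the helper name and error message are made up, and it again assumes the internal nil sentinel is reachable):

```julia
using Flux
using Flux: nil   # internal sentinel, not exported

# Hypothetical helper, not Flux API: walk a Chain layer by layer on a
# nil-filled array and report which layer rejects what size of input.
function outputsize_verbose(c::Chain, inputsize::Tuple; padbatch=false)
  x = fill(nil, padbatch ? (inputsize..., 1) : inputsize)
  for (i, layer) in enumerate(c.layers)
    try
      x = layer(x)
    catch
      error("layer $i ($(summary(layer))) rejected an input of size $(size(x))")
    end
  end
  return size(x)
end
```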

DhairyaLGandhi (Member) commented

Try/catch would be bad for AD, so I'd punt on it for the time being.

mcabbott (Member) commented

How would AD be involved?

DhairyaLGandhi (Member) commented

The codegen for the forward pass would have to look at the try-catch, which would slow down AD.

mcabbott (Member) commented

But outputsize takes and returns a tuple of integers; it doesn't call AD at all:

Flux.jl/src/outputsize.jl, lines 98 to 102 in 8dfe4fa:

```julia
function outputsize(m, inputsize::Tuple; padbatch=false)
  inputsize = padbatch ? (inputsize..., 1) : inputsize
  return size(m(fill(nil, inputsize)))
end
```

Are you suggesting that there's a use case in which people call AD on that, somehow?
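To make that concrete: running a layer on a nil-filled array is a pure shape computation with no gradients anywhere. A quick demo, with an illustrative Dense layer, assuming the internal nil sentinel from the snippet above is reachable as Flux.nil:

```julia
using Flux
using Flux: nil   # internal sentinel, not exported

x = fill(nil, (10, 32))   # a 10×32 array of `nil` values, no real data
y = Dense(10, 5)(x)       # an ordinary forward pass, but every entry stays `nil`
size(y)                   # -> (5, 32); only the shape information survives
```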

DhairyaLGandhi (Member) commented

I was thinking of it as a general construct, where this is embedded into a version of Parallel that can generate the output size on the fly, to be used for downstream model flow.

Sort of like AutoML, but not quite the same thing.

DhairyaLGandhi (Member) commented

I'm imagining it as a utility for automated model building and for optimising the hyperparameter selection.

mcabbott (Member) commented

Well, OK, that sounds like a much more ambitious piece of machinery; maybe open an issue? I guess I'm already bending this issue away from its original intent, but

> making outputsize give more informative errors

is still about the same function, in which there is no AD.

darsnack (Member, Author) commented

Yeah, better to leave AD in outputsize for another issue. For this issue, I was thinking it should treat (x, y), (a, b) the same as ((x, y), (a, b)). That's how the layers work, and AFAIK that's required for compatibility with Chain.
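In other words, something along these lines, where both spellings would be treated identically (the Parallel model and sizes are illustrative, and neither multi-input method exists yet):

```julia
using Flux

m = Parallel(vcat, Dense(10, 4), Dense(20, 4))

# Proposed: both calls build the same two nil-filled arrays internally,
# mirroring how Parallel treats (x, y) and x, y the same.
Flux.outputsize(m, (10, 32), (20, 32))     # several size tuples...
Flux.outputsize(m, ((10, 32), (20, 32)))   # ...or one tuple of size tuples -> (8, 32)
```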
