
Conv layers have trainable stride? #1293

Open
ianfiske opened this issue Jul 24, 2020 · 6 comments

Comments

@ianfiske commented Jul 24, 2020

I'm looking at existing layers to gain an understanding as I build my own. This is confusing me:

julia> Flux.trainable(Conv((2,2), 1=>1))
(σ = identity, weight = Float32[-0.39410466 -0.59730196; 0.8242678 0.7421374], bias = Float32[0.0], stride = (1, 1), pad = (0, 0, 0, 0), dilation = (1, 1))

I wouldn't expect stride, pad, dilation, and sigma to be "trainable". Am I missing something or is this a bug?

@pxl-th (Member) commented Jul 25, 2020

Because the trainable parameters for Conv are defined via @functor, every field of Conv ends up in the trainable parameters. But the non-array fields receive zero gradient during training, so they are never actually updated, if I understand it correctly.

@CarloLucibello (Member)

Since a trainable method is not defined for Conv, it falls back to the generic method

trainable(m) = functor(m)[1]

which in turn returns every field of Conv due to the line

@functor Conv

Notice though that only AbstractArray objects end up in the parameters, so you have

julia> params(Conv((2,2), 1=>1))
Params([Float32[0.12754345 -0.69176644; 0.1930983 -0.04415129], Float32[0.0]])

as expected
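
Putting the two behaviors side by side, a minimal sketch (the comments indicate what each call returns, not exact printed output):

using Flux

c = Conv((2, 2), 1 => 1)

# Falls back to trainable(m) = functor(m)[1], so every field shows up:
Flux.trainable(c)  # (σ = ..., weight = ..., bias = ..., stride = ..., pad = ..., dilation = ...)

# params walks that structure but collects only the AbstractArray leaves:
Flux.params(c)     # Params([weight, bias])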

@CarloLucibello (Member)

Although trainable is only used internally and not exposed to users, it would also be nice to avoid the surprising behavior reported here, so we should be less lazy and explicitly define trainable for each layer in Flux.
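
Such a definition could be as simple as the following sketch (illustrative; the exact form Flux adopts may differ):

using Flux

# Restrict Conv's trainable fields to its arrays; inside Flux this would
# override the generic trainable(m) = functor(m)[1] fallback for Conv.
Flux.trainable(c::Conv) = (weight = c.weight, bias = c.bias)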

@atiyo (Contributor) commented Jan 4, 2021

I am happy to pick this up if it would help.

I could add a specialised trainable for each layer.

But I thought it might be cleaner to change the generic method of trainable to only include AbstractArrays. This would avoid having to constantly write specialised methods as new layers are added.
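
A sketch of that alternative generic method (illustrative only; assuming functor here is the same function the current fallback uses):

using Flux: functor

# Proposed generic fallback: keep only the array-valued fields.
# Note: as the next comment points out, this would also drop child layers
# (e.g. the layers inside a Chain), so it is not sufficient on its own.
trainable(m) = (; (k => v for (k, v) in pairs(functor(m)[1]) if v isa AbstractArray)...)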

@ToucheSir (Member) commented Jan 4, 2021

IIRC trainable can't just return AbstractArray fields because it needs to return network submodules as well. The easier solution is to exclude fields like stride from being functored. This should be possible once FluxML/Functors.jl#7 lands.

@DhairyaLGandhi (Member)

Yeah, we don't need to do anything special here yet.

Maybe set

@functor Conv (weight, bias)

to be extra clear.
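
A self-contained sketch of that idea using a toy layer (MyConv is hypothetical, standing in for Conv):

using Flux

# Toy layer standing in for Conv (illustrative only)
struct MyConv{W,B}
    weight::W
    bias::B
    stride::Tuple{Int,Int}
end

Flux.@functor MyConv (weight, bias)  # functor only these two fields

m = MyConv(randn(Float32, 2, 2, 1, 1), zeros(Float32, 1), (1, 1))

Flux.trainable(m)  # (weight = ..., bias = ...): stride no longer appears
Flux.params(m)     # Params([weight, bias]), as before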
