
Layer normalisation does not work for images #406

Closed
skariel opened this issue Sep 20, 2018 · 5 comments

Comments

@skariel
Contributor

skariel commented Sep 20, 2018

The layer uses the normalise (stateless) function as defined here. This function calculates the mean and std over dims=1, but for images we need dims=(1,2,3), leaving out only the batch dimension. The following function should work:

function normalise(x)
    d = ndims(x) == 4 ? (1, 2, 3) : 1
    mu = Flux.mean(x, dims=d)
    sd = Flux.std(x, dims=d, mean=mu)
    return (x .- mu) ./ sd
end

Also, the type of x in the function signature has to change to allow for images: currently x::AbstractVecOrMat fails for e.g. rand(Float32, 84, 84, 1, 1), since it allows only 1-D or 2-D arrays.
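For reference, a self-contained sanity check of the proposed generalisation (using Statistics.mean/std directly rather than Flux's re-exports; this is a sketch of the idea above, not the actual Flux implementation):

```julia
using Statistics

# Normalise over all non-batch dimensions for 4-D (image) inputs,
# and over dims=1 otherwise, as proposed in the comment above.
function normalise(x::AbstractArray)
    d = ndims(x) == 4 ? (1, 2, 3) : 1
    mu = mean(x; dims=d)
    sd = std(x; dims=d, mean=mu)
    return (x .- mu) ./ sd
end

x = rand(Float32, 84, 84, 1, 2)
y = normalise(x)
# each sample in the batch is now approximately zero-mean, unit-std
```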

@johnnychen94
Contributor

johnnychen94 commented Oct 7, 2018

I think the Julia way of handling images is not to represent them as a 4-D array (Array{Float64,4}) as we usually do in Python or MATLAB, but instead as an array of images, e.g. Array{Array{Gray,2},1}.

For example,

julia> Flux.Data.MNIST.images() |> summary
"60000-element Array{Array{ColorTypes.Gray{FixedPointNumbers.Normed{UInt8,8}},2},1}"

@skariel
Contributor Author

skariel commented Oct 8, 2018

And how do you handle channels in general? Sometimes you have channels that don't represent images. Say I want to feed a stack of 4 frames for Atari reinforcement learning using RGB, so I would need 12 channels. How do you do that?

it could be Array{Float32, 4} of size (84,84,12,batchsize) or... ?

@MikeInnes
Member

We do actually end up using 4-D arrays for this, since that's what the convolutions take (and the format is documented in more detail there).

I suggest we just make dims a keyword argument, with 1 as the default.
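A minimal sketch of what that keyword-argument version could look like (illustrative only, not the actual Flux code; it normalises over whatever dims the caller passes):

```julia
using Statistics

# Hypothetical generalised normalise with a `dims` keyword, defaulting to 1.
function normalise(x::AbstractArray; dims=1)
    mu = mean(x; dims=dims)
    sd = std(x; dims=dims, mean=mu)
    return (x .- mu) ./ sd
end

normalise(rand(Float32, 10, 32))                          # old behaviour: dims=1
normalise(rand(Float32, 84, 84, 3, 4); dims=(1, 2, 3))    # per-sample for image batches
```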

@johnnychen94
Contributor

@skariel Yes, you're right; let me withdraw what I said...

We do use Array{Gray,2} to represent an image and feed it into some filters, but to be clear: in a deep learning network we generally treat everything as a 4-D array and don't know what exactly a channel represents in the middle layers; we hope the network learns the meaning of the data behind it.

So yes, in a general deep learning framework I think it might be more intuitive to convert the input images from Array{Array{Gray,2},1} to Array{Float32,4}. (Note I haven't done any real research project with Flux yet, so this is just a personal opinion.)
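The conversion described above can be sketched as follows (a hypothetical helper, written here for plain numeric matrices to stay self-contained; for Gray{N0f8} images from MNIST, Float32.(img) performs the same element-wise conversion):

```julia
# Stack a vector of H×W matrices into an H×W×1×N array,
# the WHCN layout that Flux's convolutions expect.
function to4d(imgs::Vector{<:AbstractMatrix})
    h, w = size(first(imgs))
    X = Array{Float32}(undef, h, w, 1, length(imgs))
    for (i, img) in enumerate(imgs)
        X[:, :, 1, i] .= Float32.(img)
    end
    return X
end

imgs = [rand(Float32, 28, 28) for _ in 1:8]   # stand-in for MNIST images
X = to4d(imgs)
size(X)  # (28, 28, 1, 8)
```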

@skariel
Contributor Author

skariel commented Oct 9, 2018

@MikeInnes sounds good; also the type would have to change from AbstractVecOrMat to AbstractArray.
