According to the docs, data should be stored in WHCN order (width, height, # channels, batch size). In other words, a 100×100 RGB image would be a 100×100×3×1 array, and a batch of 50 would be a 100×100×3×50 array.
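A minimal sketch of what those shapes look like in practice (the array names here are illustrative, not from any Flux example):

```julia
# WHCN: width × height × channels × batch size
w, h, c, n = 100, 100, 3, 50
img   = rand(Float32, w, h, c, 1)  # a single 100×100 RGB image
batch = rand(Float32, w, h, c, n)  # a batch of 50 such images
size(img)    # (100, 100, 3, 1)
size(batch)  # (100, 100, 3, 50)
```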
But many examples use HWCN, for example:

```julia
# Function to convert the RGB image to Float64 Arrays
function getarray(X)
    Float32.(permutedims(channelview(X), (2, 3, 1)))
end
```
The correct transform should be `Float32.(permutedims(channelview(X), (3, 2, 1)))`, because `channelview(X)` returns a "CHW" array. Likewise, the MNIST example doesn't use any `permutedims`, so it just keeps the wrong "HW" order. Fortunately, some are correct, for example, this one from MLDatasets. The three cases are shown in this gist.
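The two permutations can be compared without pulling in an image library, using a plain array as a stand-in for what `channelview` returns (a C×H×W array); the 100×150 dimensions here are arbitrary:

```julia
# channelview(X) on an H×W image of RGB pixels yields a C×H×W array.
chw = rand(Float32, 3, 100, 150)           # stand-in: 3 channels, 100 high, 150 wide
whc_correct = permutedims(chw, (3, 2, 1))  # W×H×C — what Flux expects
hwc_wrong   = permutedims(chw, (2, 3, 1))  # H×W×C — the "transposed" variant
size(whc_correct)  # (150, 100, 3)
size(hwc_wrong)    # (100, 150, 3)
```

A non-square example makes the mistake visible; on square images both permutations yield the same shape, which is part of why the bug is easy to miss.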
So basically, many examples in fact run models on "transposed" image datasets. The good (or bad?) part is that CNNs are robust enough to cope with this distortion, so we can't detect it from statistics such as accuracy, or even by eye. But those examples suggest a misleading preprocessing pipeline and should be fixed. (To be honest, I'm posting this issue because I was misled by it myself...)
I think FluxML should also document why we use the WHCN order. The only explanation I have is that CUDA.jl uses cuDNN, and cuDNN supports the NCHW order (row-major). If we look at the NCHW memory representation in cuDNN, we can see that it is exactly the same as the memory representation of WHCN in Julia (column-major).
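A small sketch of that layout argument, using only Base Julia (no cuDNN needed): Julia arrays are column-major, so the first dimension varies fastest in memory. A WHCN array therefore walks memory in the order W, then H, then C, then N — the same order a row-major NCHW array does, where W is the last (fastest) axis.

```julia
# Column-major: strides grow from left to right.
a = reshape(collect(1:24), 4, 3, 2, 1)  # W=4, H=3, C=2, N=1 (WHCN)
strides(a)  # (1, 4, 12, 24): W varies fastest, N slowest,
            # matching row-major NCHW, where W is the innermost axis.
```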