Open
Description
Here the printed summary reports 2 non-trainable arrays (110 parameters), but in fact every array is trainable: the same Dense weights appear in two layers, which throws off the counting:
julia> using Flux

julia> let d = Dense(10 => 10)
         Chain(Embedding(10=>10), d, d)
       end
Chain(
  Embedding(10 => 10),                  # 100 parameters
  Dense(10 => 10),                      # 110 parameters
  Dense(10 => 10),                      # 110 parameters
)         # Total: 3 trainable arrays, 210 parameters,
          # plus 2 non-trainable, 110 parameters, summarysize 1.055 KiB.

julia> Flux.destructure(ans)  # length 210, correct
(Float32[1.7007293, 0.66258854, -0.040887665, -1.2084905, -0.53106576 … 0.0, 0.0, 0.0, 0.0, 0.0], Restructure(Chain, ..., 210))
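
For reference, a minimal sketch of the counting I would expect, deduplicating arrays by identity so a layer that appears twice is only counted once. This is not Flux's actual implementation: the NamedTuples below are hypothetical stand-ins for the Embedding and Dense layers, and `count_unique` is a made-up helper name.

```julia
# Hypothetical stand-ins for the layers above: plain NamedTuples holding arrays.
embedding = (weight = randn(Float32, 10, 10),)
d = (weight = randn(Float32, 10, 10), bias = zeros(Float32, 10))
layers = (embedding, d, d)   # `d` is shared, as in the Chain above

# Count arrays and parameters, visiting each array only once by identity (===).
function count_unique(layers)
    seen = Base.IdSet{Any}()          # set membership by object identity
    narrays, nparams = 0, 0
    for layer in layers, x in layer   # iterating a NamedTuple yields its values
        x in seen && continue         # already counted via an earlier layer
        push!(seen, x)
        narrays += 1
        nparams += length(x)
    end
    return narrays, nparams
end

count_unique(layers)  # (3, 210)
```

With this kind of identity check the second `d` contributes nothing, which matches the length 210 that `destructure` reports, and there is nothing left over to label as non-trainable.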