Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Norm Layers, Again #1509
Fix Norm Layers, Again #1509
Changes from 14 commits
2cd59b4
0ef033e
1e67700
0dbeb39
04ca0a8
31f7000
262112a
c61457c
0432b82
77a8e87
d0961a7
b091a80
680e64f
4a26559
1c38029
137acf9
db559c5
f5c3641
fcea841
647dcd9
561c94f
332b13b
30d6542
1d99fd6
e3ae11d
c66202e
d6fac56
16d0b96
e7fe00b
9c01dd2
9abfe0c
a621ef6
e174605
99901a7
9f481e4
e9d89ab
8f3844c
bf34b73
14a6372
d82c3d3
8aadf1e
28521c1
19b91b2
0d4605d
36084e5
3c6f1ce
8f6de19
c525d4f
aa39039
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we should have params that we don't use in practice when
affine=false
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other thought on this: I know we rejected
Affine(BatchNorm(...))
because it is not ergonomically nice. But what about having a field inBatchNorm
calledaffine
that is a function. When the affine kwarg is true, thenaffine = Dense(...)
but when it is false,affine = identity
? In aChain
, the user still see justBatchNorm
andbn.affine
will show aDense
which the user will intuitively understand. We also don't end up storing data we don't need, andtrainable
will automatically be empty.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of like you'd have for an activation function? That does sound pretty appealing, would it make sense to call that field
transform
or something then?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly.
transform
would certainly capture the field accurately, butaffine
might be better here cause people will understand it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well that had been a consideration. Its a bit awkward to also do
γ = reshape(l.γ, affine_shape)
but I think we can have broadcasting take care of the shape as needed byl.λ.(γ .* x̂ .+ β)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you remove this μ, σ² will end up in the params