-
-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rm Flux.Zeros #1882
rm Flux.Zeros #1882
Conversation
@testset "$b1 to $b2" for (b1, b2, be) in ( | ||
(Flux.zeros32, Flux.ones32, Flux.ones32), # Load ones as bias to a model with zeros as bias -> model gets ones as bias | ||
(Flux.ones32, nobias, Flux.zeros32), # Load Zeros as bias to a model with ones as bias-> model gets zeros as bias | ||
(nobias, Flux.ones32, nobias), # Load ones as bias to a model with Zeros as bias-> model bias does not change | ||
) | ||
m1 = dm(b1) | ||
m2 = dm(b2) | ||
loadparams!(m1, b1 == nobias ? weights(m2) : pararray(m2)) | ||
testdense(m1, be) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the tests where I'm not quite sure what's being tested or what the intent is. Does anyone know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intent is if the model already has the bias
as false
(Zeros
), then loading a numeric bias into it should be ignored. The opposite direction is not true, loading a false
(Zeros
) into a numeric bias should zero-out the entries of the numeric bias without disabling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are some special cases that could easily be handled in #1875.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will fiddle a bit. But possibly test these facts in a less awful way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8acad3e adds some tests.
loading a false (Zeros) into a numeric bias should zero-out the entries of the numeric bias without disabling it.
This does not happen on tagged flux, nor with this PR. I've marked it broken.
(The old tests have a comment "Load Zeros as bias to a model with ones as bias-> model gets zeros as bias" but, apparently, they don't actually test that. Which I take as evidence that nobody else can tell what on earth is being tested, or not tested, in these deeply nested special-cased loops. So we should replace them.)
if the model already has the bias as false (Zeros), then loading a numeric bias into it should be ignored.
This is the current behaviour, and is unchanged here. But there is some chance it ought to be an error, at least when the source vector isn't zero.
As you say, these can be adjusted as desired with structural loadparams!
alla #1875.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to make some amends for awful code :)
The tests are indeed testing some edge cases which one might want to error, .e.g loading params from a model without bias into one which has. The behaviour of loadparams!
was a hot topic in the previous iteration of no bias (although I feel it might have been a proxy for some other more ideological issue) and I just tried to test what the code kind of tried to do.
I think I was a bit worried that testing with just a dense layer as the model would mask alot of errors just because the bias happened to be the last parameter in the list of parameters (and therefore easily ignored by zip
). Using a Chain
with multiple dense makes the tests a bit more verbose but I fully admit that I went overboard with helper functions (and "succinct" naming) here.
Furthermore, the test in 8acad3e which is marked as broken fails only because Params does not collect the bias, so I guess one can just replace it with e.g. [m0.weights, m0.bias]
.
I suppose this is all pretty moot if #1875 will anways change the behaviour of loadparams!
(hopefully to make it more fail-fast).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, well thanks for having a thick skin! Somehow I thought these tests were from the beginning of time... they certainly matched the surrounding code style perfectly.
I've changed the test to use a Chain. Now all the mismatched cases are errors, before and after the PR.
Because while Params
claims to be an IdSet of parameters, in fact loadparams!
relies on the order in vector of parameters. I have no idea whether that's a bug or a feature, the whole thing is hopelessly under-specified. Nevertheless, Flux.params
is clearly the public interface here, and using that, mismatched models in loadparams!
gave an error before this PR, and still give an error after it.
You can make it do other things by constructing this internal vector of parameters yourself. You can put numbers in there:
julia> false in Params([[1 2; 3 4], false])
true
which was what #1731 wanted to make an error, since they don't fit with the IdSet picture at all. No doubt you can construct vectors which make loadparams!
do almost anything, or fail at anything. Are we obliged to preserve any of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they certainly matched the surrounding code style perfectly.
Nice save 🙂
You can make it do other things by constructing this internal vector of parameters yourself.
Yeah, this is probably the key point w.r.t this issue. While one can put false
in there, it has no more relevance to "no bias" than putting 13.44
and whether or not this is something that should work in loadparams!
does not belong here (don't have a strong opinion fwiw).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we do this? Maybe it's time.
Codecov Report
@@ Coverage Diff @@
## master #1882 +/- ##
==========================================
+ Coverage 85.91% 86.04% +0.13%
==========================================
Files 19 18 -1
Lines 1434 1419 -15
==========================================
- Hits 1232 1221 -11
+ Misses 202 198 -4
Continue to review full report at Codecov.
|
f2f3c89
to
02f6594
Compare
src/utils.jl
Outdated
|
||
Return a bias parameter for a layer, based on the value given | ||
to the constructor's keyword `bias=bias`. | ||
|
||
* `bias == true` creates a zero vector, of the same type as weights. | ||
* `bias == false` returns `Zeros()`, a special struct which exists only to encode the absence of bias. | ||
* `bias == false` returns `false` now, which is understood by AD to be non-differentiable. | ||
* `bias::AbstractArray` uses the array provided, provided it has the correct size and eltype. If the type is wrong, it will be converted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is a lie, it does not at present fix eltypes. I think I'd prefer it to insist they match, is there a use for differing types which isn't just a mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I edited the words to match the code.
@test sum(m1[1].weight) == 6 # written before error | ||
|
||
# load into a model without bias -- should it ignore the parameter which has no home, or error? | ||
@test_broken Flux.loadparams!(m0, Flux.params(m2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think being strict and erroring is fine. Also not worth thinking too much about this now as we phase out Params
.
Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think it's better to deal with the loadparams!
separately in #1875.
This is the same change as #1407, but easier to replace than to rebase.
This year's motivation for killing Flux.Zeros is that its broadcasting rules are broken for what's needed in #1881. Last year, and the one before... well, there's a long history. The extent of this thing was reduced in #1379.
Closes #1407. Fixes #1392. Fixes #1402. Does not re-break #1332.