Complex numbers alla Flux 1776 #47
Conversation
What's the loss function you are using in the plot?
It's the one from the tests, here https://github.com/FluxML/Optimisers.jl/pull/47/files#diff-9f183063573ccf8f01dd29629bad9986f3faa3ce2fdeeb3f0855eaa1f6450715R214 and the plot commands are in comments there. The same loss is used in the Flux PR's tests.
# The Flux PR had 1e-2 for all. But ADADelta(ρ) needs ρ ≈ 0.9, not small. And it helps to make ε not too small either:
ADAM(1e-2), RMSProp(1e-2), RADAM(1e-2), OADAM(1e-2), ADAGrad(1e-2), ADADelta(0.9, 1e-5), NADAM(1e-2), AdaBelief(1e-2),
This is the fix, BTW.
Good to go?
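For context, a back-of-the-envelope sketch of why those values matter (a hand-written first step, not Optimisers.jl's implementation): ADADelta's squared-update accumulator starts at zero, so the first step is about √ε / √((1-ρ)·g² + ε) times the gradient.

```julia
# Hand-written first ADADelta step, to see the effect of ρ and ε.
# Sketch only; the real rule keeps these accumulators per parameter.
function adadelta_first_step(g; ρ = 0.9, ε = 1e-5)
    acc  = (1 - ρ) * g^2   # running mean of squared gradients, starting from zero
    Δacc = 0.0             # running mean of squared updates, still zero at step 1
    g * sqrt(Δacc + ε) / sqrt(acc + ε)   # the update subtracted from the parameter
end

adadelta_first_step(1.0)                       # ≈ 0.01 with ρ = 0.9, ε = 1e-5
adadelta_first_step(1.0; ρ = 1e-2, ε = 1e-8)   # ≈ 1e-4, barely moves
```

With ρ small the accumulator jumps straight to ≈ g², and with ε tiny the numerator nearly vanishes, which matches the symptom of only the last few digits of the loss decreasing.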
test/rules.jl
LOG = Dict()

loggradient(o) = (f, xs...) -> begin
    y, dxs = Zygote.withgradient(f, xs...)
    push!(get!(() -> Float32[], LOG, name(o)), y)
    dxs  # save the loss, return the gradient
end
This (LOG) doesn't seem to be in use anywhere, is it still necessary?
It just makes debugging easier if you can plot things from the tests you just ran. It's not strictly necessary, but it also doesn't really get in the way, I think.
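For example, something like this after a test run (a rough sketch; the Plots.jl calls are my assumption, the actual plot commands live in comments in test/rules.jl):

```julia
using Plots

plt = plot(; yscale = :log10, xlabel = "step", ylabel = "loss")
for (k, ys) in LOG        # one loss trajectory per optimiser name
    plot!(plt, ys; label = k)
end
plt
```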
Fair enough. A comment with what you just described would help then.
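Something like this, perhaps (my wording, not the PR's):

```julia
# Collect the loss at every step, keyed by optimiser name, so trajectories
# can be plotted after a test run. Purely a debugging aid; nothing in the
# tests reads from it.
LOG = Dict()
```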
This copies FluxML/Flux.jl#1776, but doesn't work. Why? Xref #38.
Edit: Now this is on top of #48. The original tests pass, but ADADelta only just passes: only the last few digits of the loss decrease. That's true in Flux too. Might be a bug somewhere? Or just because the parameters chosen are strange ones.
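To make that concrete, a minimal sketch of the kind of complex-parameter check involved (my own illustration, assuming the setup/update API; the PR's real loss and plot commands are in test/rules.jl):

```julia
using Optimisers, Zygote

target = 1 + 2im
loss(x) = sum(abs2, x .- target)   # real-valued loss of a complex parameter

function train(rule, x; steps = 1000)
    st = Optimisers.setup(rule, x)
    for _ in 1:steps
        g, = Zygote.gradient(loss, x)
        st, x = Optimisers.update(st, x, g)
    end
    x
end

x = train(ADAM(1e-2), randn(ComplexF64, 4))
loss(x)   # should be ≈ 0 once complex gradients are handled correctly
```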