-
-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add map/broadcast/algebra/iteration/dict interface for Grads #902
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for taking care of this! My comment is that since Grads
behaves like a Dict
, I think we should limit this to operations involving only Grads
(where we can verify the key set equality completely).
I'm trying to figure out how to handle |
I decided to materialize |
ok, I think I'm happy with the design and implementation, should be ready |
There is an argument to be made that Broadcast just seeing value would make sense to me. But also access to keys is useful. scaled_g = map(g) do k, v
if is_gaussian_layer(k)
0.01*v
else
v
end
end This wouldn't quite match what @darsnack proposed here FluxML/Flux.jl#707 (comment) Although: |
Consider that keys are parameters, not layers, so I don't think you can detect if The use cases I can think of, such as clamping, make only use of values, so value-based map and iteration seem more convenient, and in any case I'm not strongly against pairs iteration and map though, just seems less convenient to me but I'm sure it would work just fine. Maybe @andyferris could give some advice here |
Ok, I am sold, I trust you have thought this through. |
I feel like it would be surprising if The main issue for me though is that the ordering of the pairs is not guaranteed across One thing I will suggest is that maybe we extend the current binary operation broadcast between a number and a |
One other option is to allow broadcasting not just with other |
This a bit tricky, because
handling Edit. Actually, we can extend support to AbstractDict |
My bad, that was a poor example. I just meant that if broadcasted(f, a::Number, gs::Grads) = map(x -> f(a, x), gs)
broadcasted(f, gs::Grads, a::Number) = map(x -> f(x, a), gs) it would work. Probably people only care about |
@darsnack should we generalize this to |
@@ -185,7 +188,7 @@ function copy!(x::AbstractVector, gs::Grads) | |||
x | |||
end | |||
|
|||
broadcasted(f, gss::Grads...) = map(f, gss...) | |||
broadcasted(f, gs::Grads, gss::ADictOrGrads...) = map(f, gs, gss...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would want this
broadcasted(f, gs::Grads, gss::ADictOrGrads...) = map(f, gs, gss...) | |
broadcasted(f, gs::ADictOrGrads, gss::ADictOrGrads...) = map(f, gs, gss...) |
But that would overload broadcasting for the case where gs
and gss
are just Dict
s (no Grads
). Don't know a quick way around it but unfortunate that the first arg must be a Grad
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, I don't know either, unless we start adding a bunch of definitions like
broadcasted(f, gs1::AbstractDict, gs2::Grads, gss::ADictOrGrads...) = map(f, gs1, gs2, gss...)
and the corresponding ones for map
, but this seems pretty annoying and I would leave it to future PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is broadcasted(f, gss::Vararg{Union{AbstractDict, Grads}})
different from broadcasted(f, gss::Vararg{<:Union{AbstractDict, Grads}})
? I think the first forces the union type. Let me double check.
It does not 😞. Yeah I think leave this for another PR unless someone knows a quick fix.
We could. It seems unlikely that every parameter in the |
actually, this case is problematic, since if |
I totally agree and just forgot to write the |
In the end, I allowed broadcasted binary operations involved numerical types. I think we are done. One little regret is that we now support gs .+ Dict(w => rand(2), b => nothing) but it would nice to relax the gs .+ Dict(w => rand(2)) Doesn't seem easy to do with current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really great! Thanks for the effort.
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
bors r+ |
Build succeeded: |
I’m not sure I can comment specifically on I particularly agree with:
This is a really good point. The other nice one under default As to differences between |
Fix FluxML/Flux.jl#707