Fixed the spectral normalization #115
Conversation
Can you give a simple usage example for this, and/or a general idea of how it should be used?
Hi Mike,
this is a regularization technique described in this paper
https://arxiv.org/abs/1705.10941
and it should be a drop-in replacement for weight decay. The crucial difference from ordinary weight decay is that it regularizes the Lipschitz constant of the final network, which seems to be important, for example, for training GANs.
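To spell out the math (my reading of the paper; notation below is mine): the regularizer penalizes the squared largest singular value of each weight matrix, and its gradient is a rank-one matrix built from the leading singular vectors, which is exactly the term added to `p.Δ` in the code below.

```latex
% Spectral norm regularization (Yoshida & Miyato, 2017):
% penalize the largest singular value \sigma(W) of each weight matrix W.
R(W) = \frac{\lambda}{2}\,\sigma(W)^2,
\qquad
\frac{\partial R}{\partial W} = \lambda\,\sigma(W)\,u_1 v_1^\top,
% where u_1, v_1 are the leading left/right singular vectors of W,
% estimated cheaply by one step of power iteration per update.
```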
https://openreview.net/forum?id=B1QRgziT-&noteId=SJok1XB-f
My latest implementation looks as follows:
```julia
"""
Spectral norm regularization as proposed in
"Spectral Norm Regularization for Improving the Generalizability of Deep Learning",
Yuichi Yoshida, Takeru Miyato, 2017
https://arxiv.org/pdf/1705.10941.pdf
"""
function spectral(p::Flux.Optimise.Param, λ::Real)
    # Only 2-d weight matrices have a spectral norm to regularize.
    if ndims(p.x) != 2
        return () -> nothing
    end
    n, m = size(p.x)
    # Persistent power-iteration vectors, reused across calls.
    u = similar(p.x, n); u .= randn(n)
    v = similar(p.x, m); v .= randn(m)
    function ()
        # One power-iteration step to track the leading singular pair.
        u .= p.x * v
        v .= (u' * p.x)'
        σ = norm(v) / norm(u)   # estimate of the largest singular value
        v ./= norm(v)
        u ./= norm(u)
        # Gradient of (λ/2)σ² is λ σ u v'.
        p.Δ .+= λ * σ * u * v'
        nothing
    end
end

# Standalone power iteration, mainly for testing.
function spectralnorm(A, iters = 1000)
    n, m = size(A)
    u = similar(A, n); u .= randn(n)
    v = similar(A, m); v .= randn(m)
    for _ in 1:iters
        v ./= norm(v)
        u ./= norm(u)
        u .= A * v
        v .= (u' * A)'
    end
    norm(v) / norm(u)
end

SpectralADAM(ps, η = 0.001; β1 = 0.9, β2 = 0.999, ϵ = 1e-8, λ = 0) =
    Flux.Optimise.optimiser(ps,
        p -> Flux.Optimise.adam(p; η = η, β1 = β1, β2 = β2, ϵ = ϵ),
        p -> spectral(p, λ),
        p -> Flux.Optimise.descent(p, 1))

# Unit test (A must be square for `eig`):
# A = randn(5, 5)
# A = A + A'
# maximum(abs.(eig(A)[1])) - spectralnorm(A)   # should be ≈ 0
```
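As a minimal usage sketch for the `spectralnorm` estimator (restated here so it runs on its own under current Julia with `LinearAlgebra`; the `SpectralADAM` wiring above depends on the old `Flux.Optimise` API, which I can't easily demo standalone):

```julia
using LinearAlgebra  # norm, svdvals

# Same power iteration as `spectralnorm` above, restated so this
# snippet is self-contained.
function spectralnorm(A, iters = 1000)
    n, m = size(A)
    u, v = randn(n), randn(m)
    for _ in 1:iters
        v ./= norm(v)
        u ./= norm(u)
        u .= A * v      # u tracks the leading left singular vector
        v .= A' * u     # v tracks the leading right singular vector
    end
    norm(v) / norm(u)   # converges to the largest singular value
end

A = randn(20, 10)
σ = spectralnorm(A)
# Compare against the exact spectral norm:
isapprox(σ, maximum(svdvals(A)); rtol = 1e-3)
```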
but I confess that the results I am getting are very weird. I thought it would be good to have it as part of Flux, at least for the sake of completeness.
Best wishes,
Tomas
@pevnak are you still interested in pursuing this? I see the authors released a follow-up paper at https://arxiv.org/abs/1802.05957.
Bump on this @pevnak. If this is too far in the rearview mirror, I'd suggest we open an issue and close this PR. That way it's clear what work is up for grabs.
This type of normalization doesn't seem to be used in current practice; not worth opening an issue unless someone is interested in it.