Not sure if this is the right way to go about it, so I'd like to ask what you all think: would it make sense to add some adjoints for regularizers, and/or attach them to specific layers?
e.g.:
```julia
using Zygote: @adjoint

# add the L1 subgradient λ .* sign.(x) to the incoming gradient, and zero it
# out wherever |x| is below the threshold λ
L1(Δ, x, λ) = (Δ .+ (λ .* sign.(x))) .* (abs.(x) .> λ)

# identity in the forward pass; the adjoint substitutes the modified gradient
L1hook(x, λ) = x
@adjoint L1hook(x, λ) = x, Δ -> (L1(Δ, x, λ), nothing)
```
Note: this isn't a perfect lasso representation (I think it's missing a term based on the optimizer's learning rate), but it's a quick demonstrative hack and works if one is mindful of the magnitude of their independent variables.
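As a quick sanity check of the hook in isolation (the weight matrix and λ below are made-up values, not from any real model), differentiating through `L1hook` with Zygote shows the modified gradient:

```julia
using Zygote

W = [0.5 -0.05; 0.01 -2.0]
λ = 0.1

# sum(L1hook(W, λ)) would have an all-ones gradient without the hook; with it,
# surviving entries pick up the λ .* sign.(W) term and sub-threshold entries
# are zeroed, so g ≈ [1.1 0.0; 0.0 0.9]
g = gradient(w -> sum(L1hook(w, λ)), W)[1]
```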
```julia
# sketch: a Dense-like layer that routes its weight matrix through the hook
L1Dense(...)
    σ.(L1hook(W, λ) * x .+ b)
end
```
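Filled out a bit, such a layer might look something like this; the `L1Dense` name, its fields, and the constructor defaults are placeholders of mine rather than an existing Flux API, and it assumes the `L1hook` defined above:

```julia
using Flux

# hypothetical Dense-like layer with a built-in L1 hook; name and fields are
# illustrative only
struct L1Dense{M,V,F,T}
    W::M
    b::V
    σ::F
    λ::T
end

L1Dense(in::Integer, out::Integer, σ = identity; λ = 0.01f0) =
    L1Dense(Flux.glorot_uniform(out, in), zeros(Float32, out), σ, λ)

Flux.@functor L1Dense (W, b)   # only W and b are trainable parameters

# forward pass: same as Dense, except W passes through L1hook
(a::L1Dense)(x) = a.σ.(L1hook(a.W, a.λ) * x .+ a.b)
```

In principle it could then drop into a `Chain` like any other layer, e.g. `Chain(L1Dense(10, 5, relu; λ = 0.05f0), Dense(5 => 1))`, with the regularization applied entirely through the backward pass.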
For L2 it's not as big of a deal, but maybe there are other cases where baking this kind of capability into some layers is worthwhile? Open to feedback, and willing to make a PR if it's deemed a reasonable suggestion.
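For reference, the L2 version of the hook would be even simpler, since the penalty gradient is just `λ .* x` with no thresholding; another small sketch:

```julia
using Zygote: @adjoint

# weight-decay analogue of L1hook: identity forward pass, with the backward
# pass picking up the L2 penalty gradient λ .* x
L2hook(x, λ) = x
@adjoint L2hook(x, λ) = x, Δ -> (Δ .+ λ .* x, nothing)
```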