Merge #1150
1150: generalize and homogenize losses r=CarloLucibello a=CarloLucibello

In order to enforce some consistency in the loss interface, this PR does the following:

- adds to every loss an `agg` keyword, which defaults to the function `mean`. This selects the aggregation applied to the elementwise losses (typically `mean` or `sum`); use `identity` for no aggregation.
- adds a `dims` keyword where meaningful.
- fixes other small inconsistencies among the losses.

For instance, the crossentropy definition becomes
```julia
function crossentropy(ŷ, y; dims=1, agg=mean, ϵ=eps(eltype(ŷ)))
    agg(.-sum(y .* log.(ŷ .+ ϵ); dims=dims))
end
```
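
With this signature the reduction is chosen at the call site. A minimal usage sketch (shapes and values are illustrative only; `softmax` and `onehotbatch` are just used to build a well-formed input):

```julia
using Flux
using Flux: crossentropy, onehotbatch

ŷ = softmax(rand(Float32, 10, 3))           # 10 classes along dims=1 (the default), batch of 3
y = Float32.(onehotbatch([1, 4, 7], 1:10))  # one-hot targets as a plain Float32 matrix

crossentropy(ŷ, y)                # scalar: mean over the batch (default `agg=mean`)
crossentropy(ŷ, y, agg=sum)       # scalar: summed instead of averaged
crossentropy(ŷ, y, agg=identity)  # 1×3 row of per-sample losses (no aggregation)
```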


Co-authored-by: CarloLucibello <carlo.lucibello@gmail.com>
bors[bot] and CarloLucibello authored Jul 1, 2020
2 parents 5d93bc7 + b81552a commit 822f13c
Showing 15 changed files with 231 additions and 234 deletions.
23 changes: 15 additions & 8 deletions NEWS.md
@@ -1,12 +1,12 @@
# v0.11
* Add [kaiming initialization](https://arxiv.org/abs/1502.01852) methods: `kaiming_uniform` and `kaiming_normal` [https://github.com/FluxML/Flux.jl/pull/1243]
* Change to `DataLoader`'s constructor [https://github.com/FluxML/Flux.jl/pull/1152]
* Use `DataLoader` with `NamedTuple`s, so that tensors can be accessed by name [https://github.com/FluxML/Flux.jl/pull/1221].
* Error if Dense layers weights and biases are not arrays [https://github.com/FluxML/Flux.jl/pull/1218].
* Add `Adaptive Pooling` in Flux layers [https://github.com/FluxML/Flux.jl/pull/1239].
* Optimistic ADAM (OADAM) optimizer for adversarial training [https://github.com/FluxML/Flux.jl/pull/1246].

# v0.10.5
* Add [kaiming initialization](https://arxiv.org/abs/1502.01852) methods: [kaiming_uniform and kaiming_normal](https://github.com/FluxML/Flux.jl/pull/1243)
* Use `DataLoader` with `NamedTuple`s, so that tensors can be accessed [by name](https://github.com/FluxML/Flux.jl/pull/1221).
* Error if Dense layers weights and biases are [not arrays](https://github.com/FluxML/Flux.jl/pull/1218).
* Add [Adaptive Pooling](https://github.com/FluxML/Flux.jl/pull/1239) in Flux layers.
* Change to `DataLoader`'s [constructor](https://github.com/FluxML/Flux.jl/pull/1152)
* Uniform loss [interface](https://github.com/FluxML/Flux.jl/pull/1150)
* Optimistic ADAM (OADAM) optimizer for [adversarial training](https://github.com/FluxML/Flux.jl/pull/1246).
* Add option for [same padding](https://github.com/FluxML/Flux.jl/pull/901) to conv and pooling layers by setting `pad=SamePad()`.
* Added option to set `bias` to [Flux.Zeros](https://github.com/FluxML/Flux.jl/pull/873) to exclude `bias` from being trained.
* Added `GlobalMaxPool` and `GlobalMeanPool` [layers](https://github.com/FluxML/Flux.jl/pull/950) for performing global pooling operations.
@@ -16,21 +16,28 @@
* Testing suite improvements now test for gradients of all layers along with GPU support.
* Functors have now moved to [Functors.jl](https://github.com/FluxML/Flux.jl/pull/1174) to allow for their use outside of Flux.
* Added [helper functions](https://github.com/FluxML/Flux.jl/pull/873) `Flux.convfilter` and `Flux.depthwiseconvfilter` to construct weight arrays for convolutions outside of layer constructors so as to not have to depend on the default layers for custom implementations.
* and many more fixes and additions...

# v0.10.1 - v0.10.4

See GitHub's releases.

# v0.10.0

* The default AD engine has switched from [Tracker to Zygote.jl](https://github.com/FluxML/Flux.jl/pull/669)
- The dependency on Tracker.jl has been removed.
- This means Flux now does not depend on using a specialised `TrackedArray` type, and can be used with normal Array implementations directly.
- Tracker compatibility is maintained in most common cases, but Zygote will be the preferred AD backend for Flux from now on.
* The CUDNN wrappers have been [moved from Flux into CuArrays](https://github.com/FluxML/Flux.jl/pull/874), to allow for better supporting the CUDA backend, and improve user experience, not to mention making Flux lean.
* `*crossentropy` functions now [work as expected with CuArrays](https://github.com/FluxML/Flux.jl/pull/926). [PR for binarycrossentropy](https://github.com/FluxML/Flux.jl/pull/940).
* `*crossentropy` functions now [work as expected with CuArrays](https://github.com/FluxML/Flux.jl/pull/926). [PR for bce_loss](https://github.com/FluxML/Flux.jl/pull/940).
* Added [clearer docs](https://github.com/FluxML/Flux.jl/pull/904) around training and the Optimiser interface.
* [Layer initialisations](https://github.com/FluxML/Flux.jl/pull/937) have been improved with a clearer API on how to extend it for other purposes.
* [Better messaging around CUDA availability](https://github.com/FluxML/Flux.jl/pull/924), with hooks to initialize the GPU as default where possible.
* `@treelike` has been formalised as a [functor](https://github.com/FluxML/Flux.jl/pull/865), with an effective deprecation.
* `testmode!` is deprecated in favour of [istraining](https://github.com/FluxML/Flux.jl/pull/669)

# v0.9.0

* [Depthwise convolutional layer API changes](https://github.com/FluxML/Flux.jl/pull/756) from `in => mult` channel specification to `in => out` channel specification, and deprecates implicit `out` constructor.
* New [SkipConnection](https://github.com/FluxML/Flux.jl/pull/446), which can be used to train residual neural network architectures.
* New [RADAM](https://github.com/FluxML/Flux.jl/pull/842) optimiser.
3 changes: 2 additions & 1 deletion docs/make.jl
@@ -8,8 +8,9 @@ makedocs(modules=[Flux, NNlib],
"Building Models" =>
["Basics" => "models/basics.md",
"Recurrence" => "models/recurrence.md",
"Regularisation" => "models/regularisation.md",
"Model Reference" => "models/layers.md",
"Loss Functions" => "models/losses.md",
"Regularisation" => "models/regularisation.md",
"Advanced Model Building" => "models/advanced.md",
"NNlib" => "models/nnlib.md"],
"Handling Data" =>
20 changes: 1 addition & 19 deletions docs/src/models/layers.md
@@ -73,22 +73,4 @@ Many normalisation layers behave differently under training and inference (testing)
```@docs
Flux.testmode!
trainmode!
```

## Cost Functions
```@docs
Flux.mae
Flux.mse
Flux.msle
Flux.huber_loss
Flux.crossentropy
Flux.logitcrossentropy
Flux.binarycrossentropy
Flux.logitbinarycrossentropy
Flux.kldivergence
Flux.poisson
Flux.hinge
Flux.squared_hinge
Flux.dice_coeff_loss
Flux.tversky_loss
```
```
40 changes: 40 additions & 0 deletions docs/src/models/losses.md
@@ -0,0 +1,40 @@
## Loss Functions

Flux provides a large number of common loss functions used for training machine learning models.

Loss functions for supervised learning typically expect as inputs a target `y` and a prediction `ŷ`.
In Flux's convention, the order of the arguments is the following:

```julia
loss(ŷ, y)
```

Most loss functions in Flux have an optional argument `agg`, denoting the type of aggregation performed over the
batch:

```julia
loss(ŷ, y) # defaults to `mean`
loss(ŷ, y, agg=sum) # use `sum` for reduction
loss(ŷ, y, agg=x->sum(x, dims=2)) # partial reduction
loss(ŷ, y, agg=x->mean(w .* x)) # weighted mean
loss(ŷ, y, agg=identity) # no aggregation.
```
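
As a concrete sketch (values are random and purely illustrative), the same pattern with `mse`:

```julia
using Statistics: mean
using Flux: mse

ŷ, y = rand(Float32, 5, 4), rand(Float32, 5, 4)  # 4 samples with 5 outputs each

mse(ŷ, y)                            # scalar: mean over all elements (default)
mse(ŷ, y, agg=sum)                   # scalar: total squared error
mse(ŷ, y, agg=x -> mean(x, dims=1))  # 1×4 row: one loss per sample
```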

### Losses Reference

```@docs
Flux.mae
Flux.mse
Flux.msle
Flux.huber_loss
Flux.crossentropy
Flux.logitcrossentropy
Flux.bce_loss
Flux.logitbce_loss
Flux.kldivergence
Flux.poisson_loss
Flux.hinge_loss
Flux.squared_hinge_loss
Flux.dice_coeff_loss
Flux.tversky_loss
```
17 changes: 9 additions & 8 deletions docs/src/models/regularisation.md
@@ -7,9 +7,10 @@ add the result to the overall loss.
For example, say we have a simple regression.

```julia
using Flux: crossentropy
using Flux
using Flux: logitcrossentropy
m = Dense(10, 5)
loss(x, y) = crossentropy(softmax(m(x)), y)
loss(x, y) = logitcrossentropy(m(x), y)
```

We can regularise this by taking the (L2) norm of the parameters, `m.W` and `m.b`.
@@ -18,19 +19,19 @@ We can regularise this by taking the (L2) norm of the parameters, `m.W` and `m.b`.
using LinearAlgebra

penalty() = norm(m.W) + norm(m.b)
loss(x, y) = crossentropy(softmax(m(x)), y) + penalty()
loss(x, y) = logitcrossentropy(m(x), y) + penalty()
```

When working with layers, Flux provides the `params` function to grab all
parameters at once. We can easily penalise everything with `sum(norm, params)`.
parameters at once. We can easily penalise everything with `sum`:

```julia
julia> params(m)
julia> Flux.params(m)
2-element Array{Any,1}:
param([0.355408 0.533092; 0.430459 0.171498])
param([0.0, 0.0, 0.0, 0.0, 0.0])

julia> sum(norm, params(m))
julia> sum(norm, Flux.params(m))
26.01749952921026
```

@@ -40,9 +41,9 @@ Here's a larger example with a multi-layer perceptron.
m = Chain(
Dense(28^2, 128, relu),
Dense(128, 32, relu),
Dense(32, 10), softmax)
Dense(32, 10))

loss(x, y) = crossentropy(m(x), y) + sum(norm, params(m))
loss(x, y) = logitcrossentropy(m(x), y) + sum(norm, Flux.params(m))

loss(rand(28^2), rand(10))
```
1 change: 1 addition & 0 deletions src/Flux.jl
@@ -35,6 +35,7 @@ include("onehot.jl")
include("functor.jl")

include("layers/stateless.jl")
include("layers/losses.jl")
include("layers/basic.jl")
include("layers/conv.jl")
include("layers/recurrent.jl")
9 changes: 7 additions & 2 deletions src/deprecations.jl
@@ -1,2 +1,7 @@
@deprecate param(x) x
@deprecate data(x) x
# v0.11 deprecations
@deprecate poisson poisson_loss
@deprecate hinge hinge_loss
@deprecate squared_hinge squared_hinge_loss
@deprecate binarycrossentropy(ŷ, y) bce_loss(ŷ, y, agg=identity)
@deprecate logitbinarycrossentropy(ŷ, y) logitbce_loss(ŷ, y, agg=identity)
@deprecate normalise(x) normalise(x, dims=1)
2 changes: 1 addition & 1 deletion src/layers/basic.jl
@@ -100,7 +100,7 @@ Dense(5, 2)
julia> d(rand(5))
2-element Array{Float32,1}:
-0.16210233
0.12311903
0.123119034
```
"""
struct Dense{F,S<:AbstractArray,T<:AbstractArray}
Empty file added src/layers/losses.jl
Empty file.
2 changes: 1 addition & 1 deletion src/layers/normalise.jl
@@ -113,7 +113,7 @@ LayerNorm(h::Integer) =

@functor LayerNorm

(a::LayerNorm)(x) = a.diag(normalise(x))
(a::LayerNorm)(x) = a.diag(normalise(x, dims=1))

function Base.show(io::IO, l::LayerNorm)
print(io, "LayerNorm(", length(l.diag.α), ")")