
Conversation

@scimas (Contributor) commented Aug 2, 2020

29832ac broke ADADelta; this PR simply reverses that change and adds comments warning not to move the epsilons around. Fixes #1158

The test set probably needs more work than a fix specific to this case, so I'm not adding tests here.
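For context, here is a minimal sketch of the ADADelta step (my own illustration, not the exact Flux source) showing why the epsilon placement matters: Δacc starts at zero, so a scaling factor written as √Δacc / (√acc + ϵ) has a zero numerator on the first step, the update is zero, Δacc never grows, and the parameters never move. Keeping ϵ inside both square roots avoids that degenerate fixed point.

# Illustration only, not the Flux implementation; the point is where ϵ sits.
function adadelta_step!(x, Δ, acc, Δacc; ρ = 0.9, ϵ = 1e-8)
    @. acc = ρ * acc + (1 - ρ) * Δ^2
    @. Δ *= √(Δacc + ϵ) / √(acc + ϵ)   # ϵ inside both roots, not outside the numerator's √
    @. Δacc = ρ * Δacc + (1 - ρ) * Δ^2
    @. x -= Δ
    return x
end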

PR Checklist

  • Tests are added/fixed
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @dhairyagandhi96 (for API changes).

29832ac broke ADADelta; simply reverse its change and add comments saying
not to move the epsilons around.
@CarloLucibello (Member)

Thanks! Can we fix the tests so that they would have caught this?

@scimas (Contributor, Author) commented Aug 3, 2020

I kind of found why the test isn't working, but I have no idea how it is happening.

@testset "Optimise" begin
  w = randn(10, 10)
  @testset for opt in [ADAMW(), ADAGrad(0.1), AdaMax(), ADADelta(0.9), AMSGrad(),
                       NADAM(), RADAM(), Descent(0.1), ADAM(), OADAM(), Nesterov(), RMSProp(),
                       Momentum()]
    w′ = randn(10, 10)
    loss(x) = Flux.Losses.mse(w*x, w′*x)
    for t = 1:10^5

If you add an @info loss(rand(10)) right after the loss function definition and before the for loop, it prints a loss of 0.0, which means the test set hasn't actually been testing anything for who knows how long.
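For illustration, this is roughly where that diagnostic goes (my own sketch, not code from the PR):

using Flux

w  = randn(10, 10)
w′ = randn(10, 10)                 # inside the @testset for loop this ends up identical to w
loss(x) = Flux.Losses.mse(w*x, w′*x)
@info loss(rand(10))               # prints 0.0 in the broken setup, non-zero otherwise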

If you remove the @testset from @testset for opt in [...], it works correctly: it reports a non-zero loss and that ADADelta is failing. I haven't checked the other optimisers yet. Also, I don't know whether removing the @testset is the right way forward.

@scimas (Contributor, Author) commented Aug 4, 2020

Alright, figured it out. Every @testset execution resets the global RNG's state, which means w == w′ at all times. I will add explicit seed-setting statements and see whether the tests pass.

The `@testset` macro resets the global RNG's state after every body
execution, which means the true parameters `w` and `w′` were always
getting the same values, so there was nothing to optimise and the
tests would always pass. Fixed that by explicitly setting different
seeds.
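A rough sketch of what that fix looks like (shortened optimiser list and arbitrary seed values of my own, not the exact PR diff):

using Flux, Random, Test

@testset "Optimise" begin
  Random.seed!(84)
  w = randn(10, 10)
  @testset for opt in [Descent(0.1), ADAM()]   # shortened list, for illustration only
    Random.seed!(42)                            # a different seed, so w′ ≠ w despite the RNG reset
    w′ = randn(10, 10)
    loss(x) = Flux.Losses.mse(w*x, w′*x)
    @test loss(rand(10)) > 0                    # would have flagged the w == w′ degeneracy
  end
end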
@DhairyaLGandhi (Member) commented Aug 4, 2020

Did you verify/cross-check locally whether any other optimisers are affected by this?

@scimas (Contributor, Author) commented Aug 4, 2020

Are you asking about the tests not doing anything, or about other optimisers being affected by epsilon-like changes?

The tests should be fixed, I think. All of the optimisers are being tested in the same @testset, so the fix should apply to all of them.

I haven't checked other optimisers for algorithmic correctness. That would be next.

@scimas (Contributor, Author) commented Aug 4, 2020

Looks like OADAM might be broken; the test failed only for that one.

@scimas (Contributor, Author) commented Aug 4, 2020

The OADAM calculations look fine to me. It does reduce the loss to a certain extent, just not as much as the @test expects (OADAM takes it to ~0.025; the test expects < 0.01). Even giving it 10 times more iterations doesn't show much improvement. Maybe OADAM is just not good at "optimising" random numbers (wild guess; I'm not an optimisation theory expert)?

Perhaps changing the test to just check that the loss has decreased would be a good idea (as opposed to checking it has reduced below a hardcoded value of 0.01). Please provide other suggestions too.

Also, should that be a separate PR? This isn't specific to fixing ADADelta anymore; that is already fixed.
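For concreteness, a sketch of the relative check suggested above (my own names and iteration count, not the test code in this PR):

using Flux, Test
using Flux.Optimise: update!

w, w′ = randn(10, 10), randn(10, 10)
loss(x) = Flux.Losses.mse(w*x, w′*x)
θ = Flux.params(w′)
opt = OADAM()
initial_loss = loss(rand(10))
for t = 1:10^4
  x = rand(10)
  gs = gradient(() -> loss(x), θ)
  update!(opt, θ, gs)
end
@test loss(rand(10)) < initial_loss   # "has the loss decreased" rather than "< 0.01"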

@DhairyaLGandhi (Member)

It might mean that the default parameters of the optimiser aren't ideal.

@scimas (Contributor, Author) commented Aug 4, 2020

That is a good possibility. The paper itself does not suggest any default parameters; it only states which ones the authors used in their experiments. On top of that, neither PyTorch nor TensorFlow has implemented OADAM, so that avenue for finding established defaults is closed too.

@scimas (Contributor, Author) commented Aug 5, 2020

Increasing the learning rate of OADAM by a factor of 10 is enough to make it pass the test. Should we change the default parameter value or only increase it in the test?
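Purely for illustration, with made-up numbers (the actual old and new defaults are not quoted in this thread), the two options look like this:

# Hypothetical η values, illustration only; the real defaults live in the Flux source.
opt_from_default = OADAM()       # option 1: bump the default η inside the constructor itself
opt_in_test_only = OADAM(0.001)  # option 2: pass a 10× larger η only in the test's optimiser list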

@DhairyaLGandhi (Member)

The default one would be best, thanks!

To make it pass the Optimise testset.
@scimas changed the title from "Fix ADADelta doing nothing" to "Fix ADADelta calculations and broken tests not catching the problems" on Aug 5, 2020
@scimas (Contributor, Author) commented Aug 5, 2020

Ready for review

@DhairyaLGandhi (Member)

Thanks! Lgtm

bors r+

@bors bot (Contributor) commented Aug 6, 2020

Build succeeded:

bors bot merged commit 2ea0d35 into FluxML:master on Aug 6, 2020
@scimas deleted the adadeltafix branch on August 6, 2020 at 22:34

Development

Successfully merging this pull request may close these issues.

ADADelta not training parameters
