
Conversation

@scimas (Contributor) commented Aug 2, 2020

29832ac broke ADADelta; this PR simply reverses that change and adds comments warning not to move the epsilons around. Fixes #1158

The test set probably needs more work than a fix specific to this case, so I'm not adding tests here.
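For context, here is a minimal sketch of the ADADelta step (my own illustration, not the exact Flux source) showing why the epsilon placement matters: Δacc starts at zero, so a scaling factor written as √Δacc / (√acc + ϵ) has a zero numerator on the first step, the update is zero, Δacc never grows, and the parameters never move. Keeping ϵ inside both square roots avoids that degenerate fixed point.

# Illustration only, not the Flux implementation; the point is where ϵ sits.
function adadelta_step!(x, Δ, acc, Δacc; ρ = 0.9, ϵ = 1e-8)
    @. acc = ρ * acc + (1 - ρ) * Δ^2
    @. Δ *= √(Δacc + ϵ) / √(acc + ϵ)   # ϵ inside both roots, not outside the numerator's √
    @. Δacc = ρ * Δacc + (1 - ρ) * Δ^2
    @. x -= Δ
    return x
end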

PR Checklist

  • Tests are added/fixed
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @dhairyagandhi96 (for API changes).

29832ac broke ADADelta; simply reverse its change and add comments saying
not to move the epsilons around.
@CarloLucibello (Member)

Thanks! Can we fix the tests so that they would have caught this?

@scimas (Contributor, Author) commented Aug 3, 2020

I kind of found why the test isn't working, but I have no idea how it is happening.

@testset "Optimise" begin
  w = randn(10, 10)
  @testset for opt in [ADAMW(), ADAGrad(0.1), AdaMax(), ADADelta(0.9), AMSGrad(),
                       NADAM(), RADAM(), Descent(0.1), ADAM(), OADAM(), Nesterov(), RMSProp(),
                       Momentum()]
    w′ = randn(10, 10)
    loss(x) = Flux.Losses.mse(w*x, w′*x)
    for t = 1:10^5

If you add an @info loss(rand(10)) right after the loss function definition and before the for loop, it prints a loss of 0.0, which means the test set hasn't actually been testing anything for who knows how long.
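For illustration, this is roughly where that diagnostic goes (my own sketch, not code from the PR):

using Flux

w  = randn(10, 10)
w′ = randn(10, 10)                 # inside the @testset for loop this ends up identical to w
loss(x) = Flux.Losses.mse(w*x, w′*x)
@info loss(rand(10))               # prints 0.0 in the broken setup, non-zero otherwise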

If you remove the @testset from @testset for opt in [...], it works correctly: it reports a non-zero loss and that ADADelta is failing. I haven't checked the other optimisers yet. Also, I don't know whether removing the @testset is the right way forward.

@scimas (Contributor, Author) commented Aug 4, 2020

Alright, figured it out. Every @testset execution resets the global RNG's state, which means w == w′ at all times. I will add explicit seed-setting statements and see whether the tests pass.

The `@testset` macro resets the global RNG's state after every body
execution, which means the true parameters `w` and `w′` were always
getting the same values, so there was nothing to optimise and the
tests would always pass. Fixed that by explicitly setting different
seeds.
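A rough sketch of what that fix looks like (shortened optimiser list and arbitrary seed values of my own, not the exact PR diff):

using Flux, Random, Test

@testset "Optimise" begin
  Random.seed!(84)
  w = randn(10, 10)
  @testset for opt in [Descent(0.1), ADAM()]   # shortened list, for illustration only
    Random.seed!(42)                            # a different seed, so w′ ≠ w despite the RNG reset
    w′ = randn(10, 10)
    loss(x) = Flux.Losses.mse(w*x, w′*x)
    @test loss(rand(10)) > 0                    # would have flagged the w == w′ degeneracy
  end
end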
@DhairyaLGandhi (Member) commented Aug 4, 2020

Did you verify/cross-check locally whether any other optimisers are affected by this?

@scimas (Contributor, Author) commented Aug 4, 2020

Are you asking about the tests not doing anything, or about other optimisers being affected by epsilon-like changes?

The tests should be fixed, I think. All of the optimisers are being tested in the same @testset, so the fix should apply to all of them.

I haven't checked other optimisers for algorithmic correctness. That would be next.

@scimas (Contributor, Author) commented Aug 4, 2020

Looks like OADAM might be broken; the test failed only for that one.

@scimas (Contributor, Author) commented Aug 4, 2020

The OADAM calculations look fine to me. It does reduce the loss to a certain extent, just not as much as the @test expects (OADAM takes it to ~0.025; the test expects < 0.01). Even giving it 10 times more iterations doesn't show much improvement. Maybe OADAM is just not good at "optimising" random numbers (wild guess; I'm not an optimisation theory expert)?

Perhaps changing the test to just check that the loss has decreased would be a good idea (as opposed to checking it has reduced below a hardcoded value of 0.01). Please provide other suggestions too.

Also, should that be a separate PR? This isn't specific to fixing ADADelta anymore; that is already fixed.
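For concreteness, a sketch of the relative check suggested above (my own names and iteration count, not the test code in this PR):

using Flux, Test
using Flux.Optimise: update!

w, w′ = randn(10, 10), randn(10, 10)
loss(x) = Flux.Losses.mse(w*x, w′*x)
θ = Flux.params(w′)
opt = OADAM()
initial_loss = loss(rand(10))
for t = 1:10^4
  x = rand(10)
  gs = gradient(() -> loss(x), θ)
  update!(opt, θ, gs)
end
@test loss(rand(10)) < initial_loss   # "has the loss decreased" rather than "< 0.01"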

@DhairyaLGandhi (Member)

It might mean that the default parameters of the optimiser aren't ideal.

@scimas (Contributor, Author) commented Aug 4, 2020

That is a good possibility. The paper itself does not suggest any default parameters; it only states which ones the authors used in their experiments. On top of that, neither PyTorch nor TensorFlow has implemented OADAM, so that avenue for finding established defaults is closed too.

@scimas (Contributor, Author) commented Aug 5, 2020

Increasing the learning rate of OADAM by a factor of 10 is enough to make it pass the test. Should we change the default parameter value or only increase it in the test?
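Purely for illustration, with made-up numbers (the actual old and new defaults are not quoted in this thread), the two options look like this:

# Hypothetical η values, illustration only; the real defaults live in the Flux source.
opt_from_default = OADAM()       # option 1: bump the default η inside the constructor itself
opt_in_test_only = OADAM(0.001)  # option 2: pass a 10× larger η only in the test's optimiser list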

@DhairyaLGandhi (Member)

The default one would be best, thanks!

To make it pass the Optimise testset.
@scimas changed the title from "Fix ADADelta doing nothing" to "Fix ADADelta calculations and broken tests not catching the problems" on Aug 5, 2020
@scimas (Contributor, Author) commented Aug 5, 2020

Ready for review

@DhairyaLGandhi (Member)

Thanks! Lgtm

bors r+

@bors bot (Contributor) commented Aug 6, 2020

Build succeeded:

bors bot merged commit 2ea0d35 into FluxML:master on Aug 6, 2020
@scimas deleted the adadeltafix branch on August 6, 2020 at 22:34

Development

Successfully merging this pull request may close these issues.

ADADelta not training parameters
