Implementation of MoMo algorithm #721
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Regarding the second bullet in the original post, I think I have now solved this by using …
Thanks a lot @fabian-sp! This looks great. I'll make a more thorough review once I get the vanilla Polyak SGD working (#718) :-)
Co-authored-by: Fabian Pedregosa <pedregosa@google.com>
hey @fabian-sp, so sorry for the huge delay on this ... Merging the Polyak step-size highlighted some subtle issues that ended up requiring fixes all the way down in pytype ... anyway, we finally got it merged, and now we can focus on MoMo! A few high-level comments before we do a detailed review:
Thanks for all the work! 🙏🏼
Thanks @fabianp, I merged the files. Somehow, after updating to the latest main, if I run the … locally, it fails.
Do you know how to fix this? Maybe some old installation of the optax package, but I thought the test script would install into a new environment anyhow?
you might need to uninstall optax
but yeah, it's strange ...
The GitHub Actions tests are failing now; not sure why, but probably because of changes made in the rest of the package since then?
Okay, so the issue was that Momo needs the loss function value in the update (like Polyak SGD). This seems to be incompatible with the common tests, so I removed it from there for now, and the tests pass locally. Depending on how you solved this for Polyak SGD, we can do the same for Momo. @fabianp @vroulet
so we basically wrote an if/else for polyak_sgd: https://github.com/google-deepmind/optax/blob/main/optax/_src/alias_test.py
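For readers following along, a minimal hypothetical sketch of that kind of special-casing (names and the dummy loss value are illustrative, not the actual alias_test.py code):

```python
import jax.numpy as jnp

def run_one_update(opt_name, opt, grads, state, params):
  # Optimizers in the polyak_sgd family need the current loss value
  # passed to update(); everything else takes the standard arguments.
  if opt_name in ('polyak_sgd', 'momo'):  # hypothetical name list
    extra = {'value': jnp.array(1.0)}  # nonzero dummy loss value
  else:
    extra = {}
  return opt.update(grads, state, params, **extra)
```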
Okay, from my side the changes you requested should be implemented. A minor question: for Polyak-SGD you call the loss value argument `value`; should I use the same name here?
yes, please use `value` so it's consistent with the rest of the optimizers in optax
Thanks again for the contribution! Some minor comments here
optax/contrib/momo.py
    learning_rate: base.ScalarOrSchedule = 1e-2,
    betas: tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-8,
    lb: float = 0.0,
how about calling this `f_min`, as with `polyak_sgd`? https://optax.readthedocs.io/en/latest/api/optimizers.html#optax.polyak_sgd
I would prefer the term lower bound, as this is closer to how we describe this quantity in the paper (strictly speaking, you don't need the optimal value for deriving MoMo, only a lower bound). But I see your point about consistent naming...
ok, but let's at least name it `lower_bound`? `lb` is not very descriptive
sure, can do that. Just FYI that I have now also added an option where the lower bound is estimated on the fly, which will make some variable names a bit lengthy :) but at least the function argument should have a descriptive name
yes exactly, having descriptive names in the function signature is the most important. It's fine if private variables have more cryptic names.
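To make the naming discussion concrete, a hypothetical sketch of how the public signature could end up looking; apart from `lower_bound`, the parameter names and defaults here are assumptions, not the merged API:

```python
from optax._src import base

def momo(
    learning_rate: base.ScalarOrSchedule = 1e-2,
    beta: float = 0.9,                # momentum parameter (assumed name)
    lower_bound: float = 0.0,         # descriptive public name replacing `lb`
    adapt_lower_bound: bool = False,  # estimate the lower bound on the fly
) -> base.GradientTransformationExtraArgs:
  ...
```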
  count = jnp.zeros([], jnp.int32)
  return MomoAdamState(exp_avg, exp_avg_sq, barf, gamma, count)


def update_fn(
have you considered writing this optimizer as a chain using the existing `scale_by_adam`? If possible, it would probably result in much shorter (and more reusable) code.
For computing the adaptive learning rate, we need to compute the Adam EMAs and some other quantities based on them. So I thought it would be best to have everything in one function to avoid duplicate computations. But I might be wrong here...
No, I think what you say makes sense
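For context, the chain-based route under discussion would look roughly like the sketch below (the function name is hypothetical). As noted, it falls short for MoMo-Adam: the adaptive step size is computed from the Adam EMAs themselves, which `optax.scale_by_adam` keeps in its internal state rather than exposing between chain stages.

```python
import optax

def momo_adam_via_chain(learning_rate: float, b1: float = 0.9,
                        b2: float = 0.999, eps: float = 1e-8):
  # A plain Adam-style chain; MoMo additionally needs the EMAs to set the
  # step size, so a single fused update_fn was used in the PR instead.
  return optax.chain(
      optax.scale_by_adam(b1=b1, b2=b2, eps=eps),
      optax.scale_by_learning_rate(learning_rate),
  )
```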
Co-authored-by: Fabian Pedregosa <pedregosa@google.com>
Somehow, after applying the formatting suggestions of @fabianp, one test now fails (the injection test for …). I will also now implement the adaptive lower-bound estimation that we proposed in the paper.
Thanks! For the injection test, you might want to try converting the elements of the state into arrays. I vaguely recall having similar problems and that this solved the issue.
Thanks, I did this. But the problem was solved by using a loss value other than zero here: optax/contrib/_common_test.py, line 117 (commit 97128c1).
With zero, it takes no step (as the Polyak step size is zero), and so it compares numerically zero values against each other, which seems to fail.
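For reference, the plain Polyak step size (the MoMo variants add momentum averaging on top; see the paper) makes the failure mode clear:

$$\gamma_k = \frac{f(x_k) - f^\star}{\lVert \nabla f(x_k) \rVert^2}$$

With the test's loss value $f(x_k)$ set to zero and $f^\star = 0$, the numerator vanishes, $\gamma_k = 0$, and the test compares two identical iterates.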
FYI, there are some test failures triggered by the last jax release that are unrelated to this PR (so don't worry about that for now; we're working on fixing them in parallel)
Anything left for me to do, or are you gonna merge after fixing the upstream bug?
Can you update your branch from master? Hopefully the tests will then run again
hey Fabian, sorry for the delay on this one. I was about to merge yesterday, but I noticed what seemed to me like duplicated code between test_momo.py and _common_test.py. In particular, it seems that test_momo.py tests on the same parabola and Rosenbrock function as _common_test.py. Am I missing something?
It's now merged without test_momo.py (it was throwing some errors on our internal tests). Let me know how important it was and we can add it back in a follow-up PR.
Hi Fabian, thanks for the final checks. test_momo.py indeed had the initial tests, but later I switched to the common test structure of _common_test.py, so deleting it should be fine. Thanks again for the interest in MoMo and the final push! 📦
At the suggestion of @fabianp, I implemented the MoMo algorithm. MoMo is essentially a Polyak step size for SGD with momentum and for Adam (see https://arxiv.org/abs/2305.07583).
The Rosenbrock and least squares tests are passing locally.
I still have a few questions, as this is the first time I am implementing something in Optax:
The loss value needs to be passed as an additional argument to `update_fn`. I named this argument `loss` and adapted the tests. But maybe you have a convention for how something like this should be handled (a usage sketch follows below).
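To illustrate the question, a hedged usage sketch, assuming the keyword ends up being `value` (as for `optax.polyak_sgd`) and the entry point `optax.contrib.momo`; both are assumptions at this point in the discussion:

```python
import jax
import jax.numpy as jnp
import optax

def loss_fn(params, x, y):
  return jnp.mean((x @ params - y) ** 2)

x, y = jnp.ones((8, 3)), jnp.ones(8)
params = jnp.zeros(3)

opt = optax.contrib.momo(learning_rate=1e-2)  # assumed entry point
state = opt.init(params)

# The loss value is passed alongside the gradients to the update.
value, grads = jax.value_and_grad(loss_fn)(params, x, y)
updates, state = opt.update(grads, state, params, value=value)
params = optax.apply_updates(params, updates)
```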