
AdamaxOptimizer#applyGradients fails when the gradient's order changes #8379

Open

benoitkoenig opened this issue Sep 14, 2024 · 5 comments

Labels: type:bug (Something isn't working), type:build/install

@benoitkoenig

benoitkoenig commented Sep 14, 2024


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js):
    • Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    • Ubuntu 22.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow.js installed from (npm or script link):
    • npm
  • TensorFlow.js version (use command below):
    • @tensorflow/tfjs-node-gpu@4.20.0
  • Browser version:
    • N/A, runs on Node v18.18.0
  • Tensorflow.js Converter Version:

Describe the current behavior

I have a NamedVariableMap of gradients that I want to apply to my models. When running AdamaxOptimizer#applyGradients, I sometimes get the error Invalid TF_Status: 3 - required broadcastable shapes. When logging Object.keys(gradients), I noticed that the keys are not always sorted in the same way. This could explain the issue, as the implementation of AdamaxOptimizer#applyGradients relies on the order of Object.keys(variableGradients).
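
As an illustration of how this can happen (the variable names below are hypothetical), the enumeration order of a plain JavaScript object follows insertion order for string keys, so two maps holding the same gradients can be enumerated differently:

import * as tf from "@tensorflow/tfjs-node-gpu";

// Illustrative only: Object.keys follows insertion order for string keys,
// so the same gradients inserted in a different order enumerate differently.
const actorGrad = tf.zeros([2, 2, 1, 2]);
const criticGrad = tf.zeros([2, 2, 2, 1]);

const fromWorker1 = { "actor/kernel": actorGrad, "critic/kernel": criticGrad };
const fromWorker2 = { "critic/kernel": criticGrad, "actor/kernel": actorGrad };

console.log(Object.keys(fromWorker1)); // [ 'actor/kernel', 'critic/kernel' ]
console.log(Object.keys(fromWorker2)); // [ 'critic/kernel', 'actor/kernel' ]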

Describe the expected behavior

AdamaxOptimizer#applyGradients should not rely on the keys arriving in a consistent order, especially when called with a NamedVariableMap, since the key order then depends on Object.keys.

Standalone code to reproduce the issue

My use case requires worker threads, which makes it hard to reproduce. Let me know if you want me to write a reproduction repo for this.

Other info / logs

To explain my use case: I am training two models in an actor-critic experiment. The gradients for both the actor and the critic are computed within the same call to optimizer#computeGradients. Since Node.js is mono-threaded, I generate the gradients in three distinct worker threads and periodically send them to the main thread to update a centralized copy of the model. For each model taken individually, the weights always appear in the same order; however, sometimes the weights of the actor come first and sometimes the weights of the critic do. This bug always arises at the start of the training, but never on the first call to optimizer#applyGradients, which indicates that the order of the gradients is consistent per thread; the issue only arises when one of the threads has a different order than the others.

@benoitkoenig benoitkoenig added the type:bug Something isn't working label Sep 14, 2024
@benoitkoenig
Author

Note: As a workaround, I simply replaced my NamedVariableMap with a NamedTensor[] to which I apply .sort((a, b) => a.name.localeCompare(b.name)). This has apparently fixed the issue.
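
For reference, a minimal sketch of that workaround (the helper name applyGradientsSorted is mine, not part of tfjs):

import * as tf from "@tensorflow/tfjs-node-gpu";

// Sketch of the workaround: turn the gradient map into a NamedTensor[]
// sorted by variable name so every call sees the gradients in the same order.
function applyGradientsSorted(
  optimizer: tf.Optimizer,
  grads: { [name: string]: tf.Tensor },
): void {
  const sorted = Object.keys(grads)
    .sort((a, b) => a.localeCompare(b))
    .map((name) => ({ name, tensor: grads[name] }));
  optimizer.applyGradients(sorted);
}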

@shmishra99
Contributor

shmishra99 commented Sep 21, 2024

Hi @benoitkoenig ,

We are pleased to hear that your issue has been resolved. Please consider closing this issue. If you encounter any further difficulties, please feel free to raise a new issue.

Thank You!!

@benoitkoenig
Author

Hello @shmishra99 and thanks for your answer.

The issue is not resolved ^^' My second comment points out that a workaround is possible, at least in my case. However, the current behavior, which is that the Adamax optimizer fails if the gradients are not consistently passed in the same order, still seems like a bug to me.

Let me know if I can help fix this. I've checked the code and would be happy to submit a pull request if that can help :-)

@shmishra99
Contributor

Hi @benoitkoenig ,

Thank you for expressing your interest in contributing to tfjs. Could you please share a small, reproducible code snippet? This will help me to verify the behavior from my end.

Thank you!

@benoitkoenig
Author

benoitkoenig commented Sep 25, 2024

Hi @shmishra99,

Here is a code snippet to reproduce the issue:

import * as tf from "@tensorflow/tfjs-node-gpu";

const model = tf.sequential({
  layers: [
    tf.layers.inputLayer({
      inputShape: [2, 2, 1],
    }),
    tf.layers.conv2d({
      filters: 2,
      padding: "same",
      kernelSize: 2,
      kernelInitializer: "zeros",
      kernelConstraint: tf.constraints.nonNeg(),
      useBias: false,
    }),
    tf.layers.conv2d({
      filters: 1,
      kernelSize: 2,
      kernelInitializer: "zeros",
      kernelConstraint: tf.constraints.nonNeg(),
      useBias: false,
    }),
  ],
});

const optimizer = new tf.AdamaxOptimizer(0.1, 0.9, 0.999);

const { grads } = optimizer.computeGradients(() =>
  (model.predict(tf.ones([1, 2, 2, 1])) as tf.Tensor).mean(),
);

optimizer.applyGradients(grads);

console.log("Gradients are applied the first time");

/** By removing the first entry in grads, the gradients are no longer sorted in the same order as during the first call to `applyGradients` */
const keyToRemove = Object.keys(grads)[0];
const { [keyToRemove]: _, ...grads2 } = grads;

/** The following line throws an error: shape of the new value (2,2,2,2) and previous value (2,2,1,2) must match */
optimizer.applyGradients(grads2);

AdamaxOptimizer#applyGradients relies on Object.keys(variableGradients). Consequently, if Object.keys(variableGradients) does not yield the same keys in the same order every time it is called, the optimizer mixes up the gradients and throws a mismatching-shapes error.

This situation happens in my scenario where I am training an actor-critic asynchronously on multiple threads: sometimes the weights of the actor come first, and sometimes the weights of the critic do.

I offered to open a PR to fix this: my idea is to update the Adamax optimizer so that accumulatedWeightedInfNorm and accumulatedFirstMoment are indexed by gradient key rather than by position.
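
To illustrate the direction I have in mind (a rough sketch, not the actual tfjs source; the class and field names are simplified):

import * as tf from "@tensorflow/tfjs-node-gpu";

// Rough sketch: keep the optimizer slots in maps keyed by variable name,
// so lookups no longer depend on the position of a key in Object.keys().
class AdamaxLikeOptimizer {
  private accumulatedFirstMoment: { [name: string]: tf.Variable } = {};
  private accumulatedWeightedInfNorm: { [name: string]: tf.Variable } = {};

  applyGradients(variableGradients: { [name: string]: tf.Tensor }): void {
    for (const name of Object.keys(variableGradients)) {
      const gradient = variableGradients[name];
      // Create the slots lazily the first time a variable is seen, so later
      // calls may pass the gradients in any order (or even a subset of them).
      if (this.accumulatedFirstMoment[name] == null) {
        this.accumulatedFirstMoment[name] = tf.variable(tf.zerosLike(gradient), false);
        this.accumulatedWeightedInfNorm[name] = tf.variable(tf.zerosLike(gradient), false);
      }
      // ...perform the Adamax update using the slots looked up by `name`...
    }
  }
}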

Thank you for your time,

Benoît
