
AdamaxOptimizer#applyGradients fails when the gradient's order changes #8379

Open

benoitkoenig opened this issue Sep 14, 2024 · 5 comments

Labels: type:bug (Something isn't working), type:build/install

@benoitkoenig

benoitkoenig commented Sep 14, 2024


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js):
    • Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    • Ubuntu 22.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow.js installed from (npm or script link):
    • npm
  • TensorFlow.js version (use command below):
    • @tensorflow/tfjs-node-gpu@4.20.0
  • Browser version:
    • N/A, runs on Node v18.18.0
  • Tensorflow.js Converter Version:

Describe the current behavior

I have a NamedVariableMap of gradients that I want to apply to my models. When running AdamaxOptimizer#applyGradients, I sometimes get the error Invalid TF_Status: 3 - required broadcastable shapes. When logging Object.keys(gradients), I noticed that the keys are not always sorted in the same way. This could explain the issue, as the implementation of AdamaxOptimizer#applyGradients relies on the order of Object.keys(variableGradients).
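
As an illustration of how this can happen (the variable names below are hypothetical), the enumeration order of a plain JavaScript object follows insertion order for string keys, so two maps holding the same gradients can be enumerated differently:

import * as tf from "@tensorflow/tfjs-node-gpu";

// Illustrative only: Object.keys follows insertion order for string keys,
// so the same gradients inserted in a different order enumerate differently.
const actorGrad = tf.zeros([2, 2, 1, 2]);
const criticGrad = tf.zeros([2, 2, 2, 1]);

const fromWorker1 = { "actor/kernel": actorGrad, "critic/kernel": criticGrad };
const fromWorker2 = { "critic/kernel": criticGrad, "actor/kernel": actorGrad };

console.log(Object.keys(fromWorker1)); // [ 'actor/kernel', 'critic/kernel' ]
console.log(Object.keys(fromWorker2)); // [ 'critic/kernel', 'actor/kernel' ]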

Describe the expected behavior

AdamaxOptimizer#applyGradients should not rely on the keys arriving in a consistent order, especially when called with a NamedVariableMap, since the key order then depends on Object.keys.

Standalone code to reproduce the issue

My use case requires worker threads, which makes it hard to reproduce. Let me know if you want me to write a reproduction repo for this.

Other info / logs

To explain my use case: I am training two models in an actor-critic experiment. The gradients for both the actor and the critic are computed within the same call to optimizer#computeGradients. Since Node.js is mono-threaded, I generate the gradients in three distinct worker threads and periodically send them to the main thread to update a centralized copy of the model. For each model taken individually, the weights always appear in the same order; however, sometimes the weights of the actor come first and sometimes the weights of the critic do. This bug always arises at the start of the training, but never on the first call to optimizer#applyGradients, which indicates that the order of the gradients is consistent per thread; the issue only arises when one of the threads has a different order than the others.

@benoitkoenig benoitkoenig added the type:bug Something isn't working label Sep 14, 2024
@benoitkoenig
Author

Note: As a workaround, I simply replaced my NamedVariableMap with a NamedTensor[] to which I apply .sort((a, b) => a.name.localeCompare(b.name)). This has apparently fixed the issue.
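
For reference, a minimal sketch of that workaround (the helper name applyGradientsSorted is mine, not part of tfjs):

import * as tf from "@tensorflow/tfjs-node-gpu";

// Sketch of the workaround: turn the gradient map into a NamedTensor[]
// sorted by variable name so every call sees the gradients in the same order.
function applyGradientsSorted(
  optimizer: tf.Optimizer,
  grads: { [name: string]: tf.Tensor },
): void {
  const sorted = Object.keys(grads)
    .sort((a, b) => a.localeCompare(b))
    .map((name) => ({ name, tensor: grads[name] }));
  optimizer.applyGradients(sorted);
}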

@shmishra99
Contributor

shmishra99 commented Sep 21, 2024

Hi @benoitkoenig ,

We are pleased to hear that your issue has been resolved. Please consider closing this issue. If you encounter any further difficulties, please feel free to raise a new issue.

Thank You!!

@benoitkoenig
Author

Hello @shmishra99 and thanks for your answer.

The issue is not resolved ^^' My second comment points out that a workaround is possible, at least in my case. However, the current behavior, which is that the Adamax optimizer fails if the gradients are not consistently passed in the same order, still seems like a bug to me.

Let me know if I can help fix this. I've checked the code and would be happy to submit a pull request if that can help :-)

@shmishra99
Contributor

Hi @benoitkoenig ,

Thank you for expressing your interest in contributing to tfjs. Could you please share a small, reproducible code snippet? This will help me to verify the behavior from my end.

Thank you!

@benoitkoenig
Author

benoitkoenig commented Sep 25, 2024

Hi @shmishra99,

Here is a code snippet to reproduce the issue:

import * as tf from "@tensorflow/tfjs-node-gpu";

const model = tf.sequential({
  layers: [
    tf.layers.inputLayer({
      inputShape: [2, 2, 1],
    }),
    tf.layers.conv2d({
      filters: 2,
      padding: "same",
      kernelSize: 2,
      kernelInitializer: "zeros",
      kernelConstraint: tf.constraints.nonNeg(),
      useBias: false,
    }),
    tf.layers.conv2d({
      filters: 1,
      kernelSize: 2,
      kernelInitializer: "zeros",
      kernelConstraint: tf.constraints.nonNeg(),
      useBias: false,
    }),
  ],
});

const optimizer = new tf.AdamaxOptimizer(0.1, 0.9, 0.999);

const { grads } = optimizer.computeGradients(() =>
  (model.predict(tf.ones([1, 2, 2, 1])) as tf.Tensor).mean(),
);

optimizer.applyGradients(grads);

console.log("Gradients are applied the first time");

/** By removing the first entry in grads, the gradients are no longer sorted in the same order as during the first call to `applyGradients` */
const keyToRemove = Object.keys(grads)[0];
const { [keyToRemove]: _, ...grads2 } = grads;

/** The following line throws an error: shape of the new value (2,2,2,2) and previous value (2,2,1,2) must match */
optimizer.applyGradients(grads2);

AdamaxOptimizer#applyGradients relies on Object.keys(variableGradients). Consequently, if Object.keys(variableGradients) does not yield the same keys in the same order every time it is called, the optimizer mixes up the gradients and throws a mismatching-shapes error.

This situation happens in my scenario where I am training an actor-critic asynchronously on multiple threads: sometimes the weights of the actor come first, and sometimes the weights of the critic do.

I offered to open a PR to fix this: my idea is to update the Adamax optimizer so that accumulatedWeightedInfNorm and accumulatedFirstMoment are indexed by gradient key rather than by position.
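
To illustrate the direction I have in mind (a rough sketch, not the actual tfjs source; the class and field names are simplified):

import * as tf from "@tensorflow/tfjs-node-gpu";

// Rough sketch: keep the optimizer slots in maps keyed by variable name,
// so lookups no longer depend on the position of a key in Object.keys().
class AdamaxLikeOptimizer {
  private accumulatedFirstMoment: { [name: string]: tf.Variable } = {};
  private accumulatedWeightedInfNorm: { [name: string]: tf.Variable } = {};

  applyGradients(variableGradients: { [name: string]: tf.Tensor }): void {
    for (const name of Object.keys(variableGradients)) {
      const gradient = variableGradients[name];
      // Create the slots lazily the first time a variable is seen, so later
      // calls may pass the gradients in any order (or even a subset of them).
      if (this.accumulatedFirstMoment[name] == null) {
        this.accumulatedFirstMoment[name] = tf.variable(tf.zerosLike(gradient), false);
        this.accumulatedWeightedInfNorm[name] = tf.variable(tf.zerosLike(gradient), false);
      }
      // ...perform the Adamax update using the slots looked up by `name`...
    }
  }
}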

Thank you for your time,

Benoît
