Implement Gradient Accumulation #2
base: ga
Conversation
What's your distribution strategy?

For training my model I use …

Does the bug happen on a single GPU?

Your code works for me on CPU and on a single GPU, but not on multiple GPUs.

Could you debug my code with …?

Add a test case for embedding.

Your code seems to work with …; I added a test for the embedding layer.
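As a rough illustration of what such a test has to cover (the model below is illustrative, not the one from this PR): gradients of an `Embedding` layer come back as `tf.IndexedSlices` rather than dense tensors, which is the special case the accumulator must handle.

```python
import tensorflow as tf

# Illustrative model, not the one from this PR: any model with an
# Embedding layer will do, since embedding lookups produce sparse gradients.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10, output_dim=4),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])

x = tf.constant([[1, 2, 3]])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(model(x))
grads = tape.gradient(loss, model.trainable_variables)

# The Embedding layer's gradient arrives as tf.IndexedSlices, not a dense tensor.
assert isinstance(grads[0], tf.IndexedSlices)
```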
```python
if isinstance(grad, tf.IndexedSlices):
    # Not sure why e.g. the Embedding layer requires an additional dimension here
    grad_indices = grad.indices[..., None] if len(grad.indices.shape) < 2 else grad.indices
```
This is just weird 🤷♂️
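One possible explanation for the extra dimension (an assumption, since the full accumulator code isn't shown in this thread): if the accumulated gradient is applied with `tf.tensor_scatter_nd_add`, that op expects row indices of shape `[N, 1]`, while `IndexedSlices.indices` is a flat `[N]` vector. A minimal sketch:

```python
import tensorflow as tf

# Hypothetical accumulator with the same shape as a 10x4 embedding matrix.
accum = tf.Variable(tf.zeros([10, 4]))

# A sparse gradient as produced by an Embedding layer: `values` holds the
# gradient rows, `indices` (shape [N]) holds the looked-up row ids.
grad = tf.IndexedSlices(
    values=tf.ones([3, 4]),
    indices=tf.constant([1, 5, 5]),
    dense_shape=tf.constant([10, 4]),
)

# tf.tensor_scatter_nd_add indexes rows via indices of shape [N, 1],
# hence the [..., None] expansion in the diff above.
grad_indices = grad.indices[..., None] if len(grad.indices.shape) < 2 else grad.indices
accum.assign(tf.tensor_scatter_nd_add(accum, grad_indices, grad.values))
```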
```python
return dataset

strategy = tf.distribute.MirroredStrategy(test_utils.gpus_for_testing())
strategy = tf.distribute.get_strategy()
```
@fsx950223 I noticed that I used the wrong strategy here.
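For context on why the strategy choice matters (the device list below is an assumption for illustration): `tf.distribute.get_strategy()` returns the default single-replica strategy unless it is called inside an existing strategy scope, so a test built on it never actually exercises the multi-GPU path.

```python
import tensorflow as tf

# Outside any scope, get_strategy() is the default single-replica strategy,
# so nothing is mirrored across devices.
default = tf.distribute.get_strategy()
print(default.num_replicas_in_sync)  # 1

# To really run the multi-GPU code path, the test needs an explicit
# MirroredStrategy over the test devices (assuming two GPUs are available;
# the diff uses test_utils.gpus_for_testing() for this).
strategy = tf.distribute.MirroredStrategy(["/gpu:0", "/gpu:1"])
print(strategy.num_replicas_in_sync)  # 2
```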
It's a good idea to use two pointers to accumulate gradients. It would be better to clean up the code.

What do you mean? By the way, do we need this PR or should I close it?

Because of the Google CLA issue, you should open another PR without my commits.
@fsx950223 want to discuss here?