Microbatching Support #655
Comments
Thanks @shs037 for bringing this to the table! We currently do not have any plan to support this feature, considering its limited use case inside Meta. However, I am happy to discuss the implementation if you want to contribute a PR. One quick idea is to make changes in the optimizer function: instead of clipping each per-sample gradient, we first average the gradients within each microbatch and then clip the averages.
Thanks a lot! Is it basically like changing a few lines in the function you linked?
Yeah, I think a hacky solution (without a very careful interface design) should require minimal changes. "self.grad_samples" (the per-sample gradients) is a tensor of shape batch_size × #parameters. You just need to divide it into several microbatches and take the average for each microbatch. You will probably also need to change "scale_grad" (https://github.com/pytorch/opacus/blob/main/opacus/optimizers/optimizer.py#L441) to make sure the scaling is correct.
This approach might be problematic if you have multiple mini-batches between two optimizer steps, but I believe that is a very rare situation.
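The average-then-clip idea from the comments above can be sketched roughly as follows. This is a minimal, framework-free illustration using plain Python lists, not Opacus code: the function name `microbatch_clip_and_sum`, its parameters, and the `1e-6` stabilizer in the clip factor are assumptions made for the example. Noise addition is left out entirely.

```python
import math


def microbatch_clip_and_sum(grad_samples, microbatch_size, max_grad_norm):
    """Average per-sample gradients within each microbatch, clip each average,
    and sum the clipped averages.

    grad_samples: list of per-sample gradients, each a flat list of floats
                  (hypothetical stand-in for a batch_size x #parameters tensor).
    Returns (summed_clipped_gradient, num_microbatches).
    """
    batch_size = len(grad_samples)
    if batch_size == 0 or batch_size % microbatch_size != 0:
        raise ValueError("batch size must be a positive multiple of microbatch_size")

    num_params = len(grad_samples[0])
    summed = [0.0] * num_params

    for start in range(0, batch_size, microbatch_size):
        chunk = grad_samples[start:start + microbatch_size]

        # Step 1: average the per-sample gradients inside this microbatch.
        avg = [sum(g[j] for g in chunk) / microbatch_size for j in range(num_params)]

        # Step 2: clip the averaged gradient to L2 norm <= max_grad_norm.
        norm = math.sqrt(sum(v * v for v in avg))
        clip_factor = min(1.0, max_grad_norm / (norm + 1e-6))  # 1e-6 is an assumed stabilizer

        # Step 3: accumulate the clipped microbatch gradient.
        for j in range(num_params):
            summed[j] += avg[j] * clip_factor

    return summed, batch_size // microbatch_size


# Usage: four samples, two microbatches of size 2, clipping norm 1.0.
grads = [[3.0, 4.0], [3.0, 4.0], [0.0, 0.0], [0.6, 0.0]]
summed, num_microbatches = microbatch_clip_and_sum(grads, microbatch_size=2, max_grad_norm=1.0)
```

Since each clipped unit is now a microbatch rather than a single sample, the final averaging would presumably divide by `num_microbatches` instead of the batch size, which is likely the kind of adjustment to "scale_grad" mentioned above.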
🚀 Feature
Support microbatch size > 1, i.e., clipping multiple (instead of one) gradients.
Motivation
We want to experiment with microbatch size > 1 for some training tasks.
(I understand that microbatch size > 1 may not improve memory / computation efficiency. This ask is more about algorithm / utility.)
Pitch
A `num_microbatches` parameter in `make_private`, similar to TF Privacy.