Augmentation Multiplicity #455
Comments
@timudk Thank you for bringing this up!
Just a note that we have implemented this in an internal codebase by making various tweaks to Opacus. This involves changing the layout of the clipping part, but the privacy accounting shouldn't change since each 'user' still contributes one (averaged) gradient.
@lxuechen Any plans to make the codebase public anytime soon?
@timudk I think you can use Xuechen's modified privacy engine for this: https://github.com/lxuechen/private-transformers/blob/main/private_transformers/privacy_engine.py. I'm not sure, but it looks similar to my internal implementation of the augmentation multiplicity trick. It does require the somewhat significant API change of having the optimizer itself take in the 1-D loss tensor as input, and I'm not sure how to get around that one way or another.
Hi folks, thanks for following up. That codebase actually does not yet have the augmentation multiplicity trick. We're working on a project that involves this and will release the code when the project is done. Implementing the trick is actually not that hard: you could, for instance, duplicate indices in the sampler and perform gradient aggregation after the per-sample gradients are instantiated.
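For what it's worth, here is a minimal sketch of that sampler-duplication idea. It assumes Opacus exposes per-sample gradients as a single `p.grad_sample` tensor, that the K copies of each example end up adjacent within a batch (so the batch size is a multiple of K), and that random augmentations are applied in the dataset's transform; `RepeatedIndexSampler`, `average_grad_samples`, and `k_augmentations` are hypothetical names, not Opacus APIs.

```python
import torch
from torch.utils.data import Sampler


class RepeatedIndexSampler(Sampler):
    """Yields each dataset index k times in a row, so the k copies of an
    example receive k independent random augmentations from the dataset's
    transform pipeline (hypothetical helper, not part of Opacus)."""

    def __init__(self, data_source, k_augmentations):
        self.data_source = data_source
        self.k = k_augmentations

    def __iter__(self):
        for idx in torch.randperm(len(self.data_source)).tolist():
            for _ in range(self.k):
                yield idx

    def __len__(self):
        return len(self.data_source) * self.k


def average_grad_samples(model, k_augmentations):
    """Collapse the k per-augmentation gradient rows of each example into one
    averaged row, so each example still contributes a single gradient to
    clipping and noising.  Assumes per-sample gradients live in a single
    p.grad_sample tensor of shape (batch_size * k, *p.shape) and that the k
    copies of an example are adjacent in the batch."""
    for p in model.parameters():
        gs = getattr(p, "grad_sample", None)
        if gs is not None:
            p.grad_sample = gs.view(-1, k_augmentations, *gs.shape[1:]).mean(dim=1)
```

Under these assumptions you would call `average_grad_samples(model, k)` after `loss.backward()` and before `optimizer.step()`, so that the clipping step only ever sees one averaged gradient per example.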
Just wanted to point out that this is now possible with Functorch.
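For reference, a rough sketch of that route could look like the following, assuming each batch element arrives with K pre-computed augmented views and the model has no buffers (otherwise `make_functional_with_buffers` would be needed; in newer PyTorch releases similar functionality is exposed under `torch.func`). The model, shapes, and dummy data are only illustrative.

```python
import torch
import torch.nn.functional as F
from functorch import make_functional, vmap, grad

# Toy model and data for illustration; batch_x holds K augmented views per example.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
batch_x = torch.randn(8, 4, 3, 32, 32)        # (B=8, K=4, C, H, W)
batch_y = torch.randint(0, 10, (8,))          # (B,)

fmodel, params = make_functional(model)


def per_example_loss(params, aug_x, y):
    # aug_x: (K, C, H, W) -- the K augmented views of a single example.
    logits = fmodel(params, aug_x)
    # Averaging the cross-entropy over the K views means the gradient of this
    # loss is the augmentation-averaged gradient for this example.
    return F.cross_entropy(logits, y.expand(aug_x.shape[0]))


# vmap over the batch dimension yields one augmentation-averaged gradient per
# example, ready to be clipped and noised as in DP-SGD.
per_example_grads = vmap(grad(per_example_loss), in_dims=(None, 0, 0))
grads = per_example_grads(params, batch_x, batch_y)
```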
Sorry, I'm still confused. Is there anything else I need to change if I want to implement augmentation multiplicity? I'd be glad to get your reply.
🚀 Feature
Augmentation multiplicity as explained in Section 3.1 of https://arxiv.org/abs/2204.13650: before clipping, average each example's gradient over several random augmentations of that example.
Motivation
The paper shows improved results compared to training without augmentation.
If my understanding is correct, this could currently be implemented by averaging over different augmentations in the loss function, but I am not 100% sure that is correct (see the sketch below).
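For concreteness, a small sketch of what that loss-averaging idea could look like; the `augment` function, shapes, and function name are assumptions, not Opacus APIs.

```python
import torch
import torch.nn.functional as F


def augmentation_averaged_loss(model, x, y, augment, k_augmentations):
    """Average the per-example loss over k random augmentations before any
    reduction over the batch.  `augment` is a hypothetical function that
    applies a fresh random augmentation to a batch of images and returns a
    tensor of the same shape; x: (B, C, H, W), y: (B,)."""
    per_view_losses = []
    for _ in range(k_augmentations):
        logits = model(augment(x))
        # reduction='none' keeps one loss value per example, so the average
        # below is taken per example rather than over the whole batch.
        per_view_losses.append(F.cross_entropy(logits, y, reduction="none"))
    per_example_loss = torch.stack(per_view_losses).mean(dim=0)  # (B,)
    return per_example_loss.mean()
```

Because differentiation is linear, the gradient of this averaged loss for each example equals the average of its per-augmentation gradients; the open question in this thread is whether Opacus's per-sample gradient hooks see that averaged gradient per example, or instead clip each augmented view separately.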