Add opacus grad_sampler compatibility with torch.cat #448

Open
mvnelson422 opened this issue Jul 6, 2022 · 1 comment
Labels
enhancement New feature or request

Comments


mvnelson422 commented Jul 6, 2022

🚀 Feature

Please make the opacus grad_sampler compatible with torch.cat operations in activation functions.

Motivation

I've been trying to use the grad_sampler module with networks containing the CReLU activation function. However, the CReLU activation function concatenates the ReLU of a layer's output with the ReLU of its negation, thus doubling the effective output size of the layer. This can be very useful and space-saving in networks that tend to develop mirrored filters (see https://arxiv.org/pdf/1603.05201v2.pdf).
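For clarity, here is roughly what I mean by CReLU, written as a module (just a sketch; this is my own implementation, not something Opacus or PyTorch provides):

```python
import torch
import torch.nn as nn


class CReLU(nn.Module):
    """Concatenated ReLU: returns [relu(x), relu(-x)] along the feature
    dimension, doubling the effective output size of the preceding layer."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([torch.relu(x), torch.relu(-x)], dim=-1)
```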

Furthermore, using the CReLU activation function it is possible to initialize fully connected networks so that they appear linear at initialization (see the image in the Additional context section). This has been shown to be an extremely powerful initialization pattern, allowing fully connected networks with over 200 layers to be trained. That's incredible! Typical fully connected networks often struggle to learn appreciably at even 20 layers (see https://arxiv.org/pdf/1702.08591.pdf).
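Roughly, that "looks-linear" initialization mirrors the weight of the layer that consumes a CReLU output, so the network computes a plain linear map at initialization (a sketch; the helper name and the orthogonal base matrix are my own choices for illustration):

```python
import torch
import torch.nn as nn


def looks_linear_init_(linear: nn.Linear) -> None:
    """Initialize a linear layer whose input is CReLU output [relu(h), relu(-h)]
    so that at initialization it computes
    [W_base, -W_base] @ [relu(h), relu(-h)] = W_base @ (relu(h) - relu(-h)) = W_base @ h.
    Assumes linear.in_features is even.
    """
    half = linear.in_features // 2
    with torch.no_grad():
        w_base = torch.empty(linear.out_features, half)
        nn.init.orthogonal_(w_base)
        linear.weight.copy_(torch.cat([w_base, -w_base], dim=1))
        if linear.bias is not None:
            linear.bias.zero_()
```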

Because of the symmetric initialization pattern, the discontinuities in the CReLU activation function are dramatically smaller than in comparable networks with ReLU or other activation functions. I've been studying gradient conditioning and stability in a variety of architectures using opacus, but it's broken for activation functions that use torch.cat. In the case of CReLU, weight.grad_sample returns something that is half the size of the weight itself (ignoring the batch dimension).
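Something along these lines should reproduce the shape mismatch I'm describing (a sketch; the layer sizes are arbitrary, and it re-declares the CReLU module from above so it runs on its own):

```python
import torch
import torch.nn as nn
from opacus.grad_sample import GradSampleModule


class CReLU(nn.Module):
    def forward(self, x):
        return torch.cat([torch.relu(x), torch.relu(-x)], dim=-1)


net = nn.Sequential(nn.Linear(8, 4), CReLU(), nn.Linear(8, 2))
model = GradSampleModule(net)

x = torch.randn(16, 8)        # batch of 16 samples
model(x).sum().backward()

for name, p in net.named_parameters():
    # Per-sample gradients should have shape [batch, *p.shape];
    # what I observe is a grad_sample that is half the expected size.
    gs = getattr(p, "grad_sample", None)
    print(name, tuple(p.shape), None if gs is None else tuple(gs.shape))
```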

Pitch

Implementing (or fixing) opacus grad_sampler compatibility with torch.cat would allow it to be used with a wider variety of activation functions, including CReLU, which would be really cool (see the Motivation section above).

I didn't file this as a bug report because I'm not sure that torch.cat compatibility was ever intentionally implemented.

Alternatives

I can't think of any alternatives.

Additional context

[Image: symmetric weight initialization that makes a fully connected CReLU network appear linear at initialization]

@ashkan-software ashkan-software added the enhancement New feature or request label Jul 12, 2022
@ashkan-software (Contributor)

Hello,

Thank you for filing this issue and explaining it really well.

Can you please provide more details on the error you're getting? Specifically, can you provide a minimal reproducing example? We have Colab templates for a minimal example when you create an issue.

@ffuuugor ffuuugor self-assigned this Jul 13, 2022