More activation functions #74
Conversation
Thanks for picking it up! I just checked the torch activation functions and I think we could add nearly all of them. Here is a complete list (excluding multihead_attention and threshold); the ones with ✔️ are already in the PR:
If supporting all these methods is too much (or too opaque for users because they don't see the defaults), then I'd prefer to add only a small subset.
For now, I'm trying to avoid those with significant tuning parameters, meaning that we can include the ones whose defaults are pretty good. I could use some help making that determination; for example, would users really want to tune the upper and lower uniform bounds? The softmax functions are also questionable from a programmatic point of view: they don't take the same primary argument that the others do (but maybe that's not a big deal).
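To make the distinction concrete, here is a minimal sketch using torch's `nnf_*` functional API (argument names follow that API; `nnf_rrelu()` is assumed to mirror PyTorch's `rrelu`): a parameter-free activation needs only the input tensor, a parameterized one exposes tuning arguments, and softmax requires an extra `dim` argument, which is why its signature doesn't line up with the others.

```r
library(torch)

x <- torch_randn(2, 3)

# Parameter-free: nothing to tune beyond the input tensor.
nnf_relu(x)

# Parameterized: during training the negative slope is sampled from
# uniform(lower, upper), so users would have to tune (or trust) the bounds.
nnf_rrelu(x, lower = 1 / 8, upper = 1 / 3, training = TRUE)

# Softmax has a different primary signature: `dim` is required, so it
# cannot be called the same way as the activations above.
nnf_softmax(x, dim = 2)
```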
Frankly, I find it very hard to judge the defaults. After thinking about it again, I support restricting the activation functions to the ones that don't require any parameters (until parameters can eventually be passed). This strategy prevents brulee users from naively using parameterized activation functions and getting frustrated when they cannot change the parameters.
Agreed. I'll go with functions with non-learnable arguments.
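One way that restriction could look in code, as a hedged sketch: a hypothetical allow-list that maps activation names to torch's parameter-free module constructors (`get_activation()` and the exact set of names are illustrative, not brulee's actual implementation).

```r
library(torch)

# Hypothetical allow-list of activations whose constructors take no
# tunable or learnable arguments (illustrative set only).
allowed_activations <- list(
  relu    = nn_relu,
  tanh    = nn_tanh,
  sigmoid = nn_sigmoid,
  gelu    = nn_gelu
)

# Hypothetical helper: fail early on parameterized or unknown names.
get_activation <- function(name) {
  if (!name %in% names(allowed_activations)) {
    stop("Activation '", name, "' is not supported (parameter-free only).")
  }
  allowed_activations[[name]]()
}

act <- get_activation("tanh")
act(torch_randn(2, 3))
```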
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Closes #69
@christophscheuch and @dfalbel, add more to the list! For now, I'm avoiding any significantly parameterized functions.
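From the user side, the outcome would presumably surface through `brulee_mlp()`'s `activation` argument; a small usage sketch, assuming the formula interface and a parameter-free activation name (the data set and settings are arbitrary):

```r
library(brulee)

# Assumes a parameter-free activation can be requested by name.
fit <- brulee_mlp(mpg ~ ., data = mtcars,
                  hidden_units = 8, activation = "tanh", epochs = 50)
predict(fit, mtcars)
```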