
More activation functions #74


Merged: 11 commits, Nov 5, 2023

Conversation

@topepo (Member) commented Nov 2, 2023

Closes #69

@christophscheuch and @dfalbel, add more to the list! For now, I'm avoiding any significantly parameterized functions.

@topepo marked this pull request as ready for review November 2, 2023 11:06
@christophscheuch

Thanks for picking it up!

I just checked the torch activation functions, and I think we could add all of them except nn_multihead_attention and nn_threshold, since the rest have default values for their arguments.

Here is the complete list (excluding multihead_attention and threshold); the ones with ✔️ are already in the PR:

  • relu ✔️
  • rrelu
  • hardtanh ✔️
  • relu6
  • sigmoid ✔️
  • hardsigmoid
  • tanh ✔️
  • hardswish
  • elu ✔️
  • celu
  • selu ✔️
  • glu
  • gelu
  • hardshrink
  • leaky_relu ✔️
  • log_sigmoid
  • softplus ✔️
  • softshrink
  • prelu
  • softsign
  • tanhshrink
  • softmin
  • softmax
  • softmax2d
  • log_softmax
  • contrib_sparsemax
  • silu

If supporting all of these is too much (or too opaque for users, since they don't see the defaults), then I'd suggest adding only softmin and softmax, because in my experience they are the most frequently used.
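For a concrete sense of what "no required arguments" means here, a minimal torch-level sketch (not brulee code; the nn_* constructor names below come from the torch R package, which is what the list above is based on):

```r
library(torch)

x <- torch_randn(4)

# Candidates from the list above that need no arguments at all:
# construct with defaults, then apply to a tensor.
nn_gelu()(x)
nn_softsign()(x)
nn_silu()(x)
nn_log_sigmoid()(x)
```

Each of these runs with an empty constructor call, which is what makes them easy candidates.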

@topepo (Member, Author) commented Nov 2, 2023

For now, I'm trying to avoid those with significant tuning parameters, meaning that we can include the ones whose defaults are pretty good. I could use some help making that determination. For example, would users really want to tune the upper and lower uniform bounds in nn_rrelu()? I'll eventually have a way to pass parameters in, but not right now.

The softmax functions are questionable (from a programmatic point of view). They don't take the same primary argument that the others do (but maybe that's not a big deal).
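To make both points concrete, a minimal torch-level sketch (again not brulee code; the constructor names and the rrelu defaults shown are assumptions about the torch R package):

```r
library(torch)

x <- torch_randn(2, 3)

# A plain element-wise activation needs no arguments.
nn_relu()(x)

# rrelu carries lower/upper uniform bounds (the torch defaults, written out here),
# which look like tuning parameters.
nn_rrelu(lower = 1 / 8, upper = 1 / 3)(x)

# softmax requires a `dim` argument, unlike the element-wise activations.
nn_softmax(dim = 2)(x)
```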

@christophscheuch

> For now, I'm trying to avoid those with significant tuning parameters, meaning that we can include the ones whose defaults are pretty good. I could use some help making that determination. For example, would users really want to tune the upper and lower uniform bounds in nn_rrelu()? I'll eventually have a way to pass parameters in, but not right now.

Frankly, I find it very hard to judge the defaults. After thinking about it again, I support restricting the activation functions to those that don't require any parameters (until parameters can eventually be passed). This strategy keeps brulee users from naively choosing a parameterized activation function and getting frustrated when they cannot change its parameters.

@topepo (Member, Author) commented Nov 4, 2023

Agreed. I'll go with functions with non-learnable arguments.
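In practice, that decision means a user would only ever name the activation, with nothing else to configure. A hypothetical usage sketch (assuming brulee_mlp()'s activation argument simply takes one of the supported names once this PR is merged):

```r
library(brulee)

# Hypothetical call: the activation is chosen by name only; there are no
# activation-specific parameters to pass (assumes brulee_mlp()'s interface).
fit <- brulee_mlp(mpg ~ ., data = mtcars, activation = "elu", epochs = 50)
```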


This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions (bot) locked and limited conversation to collaborators Nov 20, 2023
Development

Successfully merging this pull request may close these issues.

Sigmoid and softmax activation functions for brulee_mlp()