Functional layer API, conventions #21

Closed · albertz opened this issue Aug 9, 2021 · 5 comments

albertz commented Aug 9, 2021

Definition: Functional means that the layer / op does not have trainable parameters.

Examples:

  • tanh, sigmoid, etc., i.e. all math ops. RETURNN: ActivationLayer
  • dot/matmul/einsum. RETURNN: DotLayer
  • split. RETURNN: SplitLayer

Instead of writing Activation(activation="tanh")(x), one should be able to write simply tanh(x).
Instead of Dot(...)(x, y), one should be able to write dot(x, y, ...) or similar. (Or maybe using a more einsum-like API.)
Instead of Split(...)(x), split(x, ...).

Similar to the PyTorch functional API.
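
A minimal sketch of how such functional wrappers could be defined on top of the existing module-style API (the names follow the examples above and are illustrative only, not the actual returnn_common code):

def tanh(x):
    # purely functional: no trainable parameters, so nothing to instantiate or keep around
    return Activation(activation="tanh")(x)

def sigmoid(x):
    return Activation(activation="sigmoid")(x)

y = tanh(x)  # instead of Activation(activation="tanh")(x)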

The naming convention would be to start with a lowercase letter, unlike modules, which start with an uppercase letter.

Also, modules are classes and need to be instantiated. The functional API would just consist of plain functions.

Also related are the element-wise ops on a LayerRef, such as +, ==, etc.
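
A minimal sketch of how those operators could be wired up, assuming a hypothetical functional combine helper (the names and the kind argument are illustrative only):

def combine(a, b, kind):
    # hypothetical functional op that would create the corresponding RETURNN layer;
    # stubbed out here only to make the sketch self-contained
    ...

class LayerRef:
    def __add__(self, other):
        return combine(self, other, kind="add")

    def __eq__(self, other):
        return combine(self, other, kind="equal")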


Some open questions:

  • Where to define them? In which namespace?
  • How much of this can be generated automatically?
    • E.g. we could extend the current layer generation code to automatically put layers without params into the functional namespace.
    • Still, we additionally want to define some functions manually/explicitly, e.g. einsum/dot. Also tanh etc. need to be explicit.
  • Should we always have both variants, e.g. Sigmoid as a module and sigmoid as a function?
    • PyTorch has this for some functions, but not always (see the example after this list).
    • Flax only has the functional variant when an op is purely functional.
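
For reference, this is how the duplication looks in PyTorch, with a module variant and a functional variant of the same op:

import torch

x = torch.randn(4)
m = torch.nn.Sigmoid()                          # module variant
assert torch.allclose(m(x), torch.sigmoid(x))   # functional variant gives the same result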

albertz commented Oct 20, 2021

We can also have both, i.e. a Tanh module and a tanh function. Similar to PyTorch.

This might be more natural for things like Sequential (#33).
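
A minimal sketch of that both-variants pattern, with the module as a thin wrapper around the function (the Module base class and the names are illustrative only, not the actual returnn_common code):

class Tanh(Module):
    # thin, parameter-free wrapper around the tanh function,
    # useful where a module instance is expected, e.g. in Sequential (#33)
    def __call__(self, x):
        return tanh(x)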

albertz commented Oct 27, 2021

This is basically done now. We have both where it makes sense (dropout and Dropout).
Activation functions need to be defined manually anyway (Tanh etc.), so that part is still work in progress.
But the conventions are clear.
So closing this now.

albertz commented Oct 28, 2021

Ok, now I start to question again whether it makes sense to have both dropout and Dropout...

The question is: do we want to have that for every function? When there is a function cross_entropy, should there also be a module CrossEntropy? (#38)

Isn't it a bit annoying that you would always need to write both?

Or do we want to have it mixed, i.e. both versions in some cases and only the function in others?
And what would the convention be for when we have both vs. only having the function? Or is it just arbitrary?

albertz reopened this Oct 28, 2021

albertz commented Oct 28, 2021

Btw, for Sequential, maybe it makes sense to also have a function sequential, for the case where you want to chain pure functions (modules without params).
(I wondered whether there is already such a sequential or chain function somewhere in the Python stdlib, but it doesn't seem so. Related question.)

Example:

y = x + sequential(x, layer_norm, self.lin1, tanh, self.lin2, dropout)

Or maybe not. Maybe it is just fine to do:

y = x + Sequential(layer_norm, self.lin1, tanh, self.lin2, dropout)(x)
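
A minimal sketch of what such a sequential function, as used in the first example above, could look like, assuming plain callables (illustrative, not an actual implementation):

import functools

def sequential(x, *funcs):
    # apply the given callables left to right, feeding each output into the next
    return functools.reduce(lambda value, func: func(value), funcs, x)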

albertz added a commit that referenced this issue Oct 28, 2021

albertz commented Oct 28, 2021

I think we can leave it as it is now, i.e. only the function. In case the function has further optional arguments (dropout and many others), you can use functools.partial, like:

Sequential(layer_norm, self.linear, functools.partial(dropout, dropout=0.1))
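
For clarity, functools.partial here just pre-binds the keyword argument, so the partial object behaves like dropout with dropout=0.1 already filled in (a plain-Python sketch; the stand-in function below only mirrors the calling convention from the line above):

import functools

def dropout(x, *, dropout=0.0):
    # stand-in with the same calling convention as above; the real function
    # would randomly zero elements of x with probability `dropout`
    return x

bound = functools.partial(dropout, dropout=0.1)
assert bound(2.0) == dropout(2.0, dropout=0.1)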

albertz closed this as completed Oct 28, 2021