This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Make inner transform activation configurable for LSTMCell #10957

Merged

merged 9 commits into apache:master on Jun 5, 2018

Conversation

mrkumar83
Contributor

Description

Some papers recommend using sigmoid for the inner activation gate for an LSTM.

Other frameworks such as TensorFlow allow this:
https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell
where they have an activation parameter.

Wanted to provide something similar in MXNet.

Checklist

Essentials

  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Added a parameter to the LSTMCell class for the inner transform activation (see the usage sketch below).
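
For readers skimming the thread, here is a minimal usage sketch of the resulting API, assuming the parameter names settled on later in this discussion (activation and recurrent_activation, defaulting to 'tanh' and 'sigmoid'):

```python
import mxnet as mx
from mxnet.gluon import rnn

# Sketch only: exercises the two activation parameters discussed in this PR.
cell = rnn.LSTMCell(hidden_size=100,
                    activation='tanh',               # input/cell transform
                    recurrent_activation='sigmoid')  # i/f/o gates
cell.initialize()

x = mx.nd.random.uniform(shape=(32, 50))             # (batch, input_size)
states = cell.begin_state(batch_size=32)
output, new_states = cell(x, states)
print(output.shape)                                  # (32, 100)
```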

@mrkumar83 requested a review from szha as a code owner on May 15, 2018, 21:45
@@ -441,6 +441,9 @@ class LSTMCell(HybridRecurrentCell):
params : Parameter or None
    Container for weight sharing between cells.
    Created if `None`.
in_transform_activation_type : str
Contributor

This name is too verbose. These are usually called activation and recurrent_activation; activation is applied to both the input transform and the next c.
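
For context, the standard LSTM equations make the reviewer's point concrete: the recurrent activation (σ below) is applied to the three gates, while the activation (tanh below) is applied to both the input transform and the next c:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```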

@mrkumar83
Contributor Author

Added the parameters recurrent_activation and activation.

@chinakook
Contributor

But F.Activation only supports 4 types of activation functions. Many other activation functions (some with parameters) cannot be passed as a string the way TensorFlow allows, such as elu/selu/prelu/leakyrelu/hard_sigmoid, etc.
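
For reference, F.Activation's act_type accepted only relu, sigmoid, tanh, and softrelu at the time; parameterized activations such as elu or prelu are exposed through the separate LeakyReLU operator instead, so a single act_type string cannot reach them. A small sketch of the distinction:

```python
import mxnet as mx

x = mx.nd.array([-2.0, 0.0, 2.0])

# Reachable through the act_type string interface:
y = mx.nd.Activation(x, act_type='tanh')

# elu (and leaky/rrelu/selu, etc.) live in a separate operator,
# so a plain act_type string on Activation cannot express them:
z = mx.nd.LeakyReLU(x, act_type='elu')
```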

@piiswrong
Contributor

@szha

@szha
Member

szha commented May 21, 2018

Will take a look shortly. Maybe it's worth having a utility function that wraps all activation types in the most efficient way (e.g. F.tanh instead of F.Activation(act_type='tanh')) so that it can be reused everywhere.
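
A sketch of the kind of wrapper szha is describing (illustrative only, not the merged code; the actual diff excerpts appear below): dispatch the common strings to their dedicated operators and fall back to the generic F.Activation for everything else:

```python
def _get_activation(self, F, inputs, activation, **kwargs):
    """Apply an activation named by string, preferring the dedicated
    operator (F.tanh, F.sigmoid, ...) over generic F.Activation."""
    if isinstance(activation, str):
        if activation == 'tanh':
            return F.tanh(inputs, **kwargs)
        elif activation == 'sigmoid':
            return F.sigmoid(inputs, **kwargs)
        elif activation == 'relu':
            return F.relu(inputs, **kwargs)
        # Anything else goes through the generic operator.
        return F.Activation(inputs, act_type=activation, **kwargs)
    # Also accept a callable (e.g. an Activation block) directly.
    return activation(inputs, **kwargs)
```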

@@ -473,6 +480,9 @@ def __init__(self, hidden_size,
self.h2h_bias = self.params.get('h2h_bias', shape=(4*hidden_size,),
                                init=h2h_bias_initializer,
                                allow_deferred_init=True)
self.activation = activation
self.recurrent_activation = recurrent_activation
Member

_activation, _recurrent_activation

forget_gate = F.Activation(slice_gates[1], act_type=self.recurrent_activation, name=prefix+'f')
in_transform = F.Activation(slice_gates[2], act_type=self.activation, name=prefix+'c')
out_gate = F.Activation(slice_gates[3], act_type=self.recurrent_activation, name=prefix+'o')
Member

use _get_activation
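
Concretely, the suggestion is to route each gate through the helper rather than calling F.Activation directly; roughly (a sketch of the requested change, using the renamed private attributes from the earlier review comment):

```python
in_gate = self._get_activation(
    F, slice_gates[0], self._recurrent_activation, name=prefix+'i')
forget_gate = self._get_activation(
    F, slice_gates[1], self._recurrent_activation, name=prefix+'f')
in_transform = self._get_activation(
    F, slice_gates[2], self._activation, name=prefix+'c')
out_gate = self._get_activation(
    F, slice_gates[3], self._recurrent_activation, name=prefix+'o')
```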

@@ -255,8 +256,7 @@ def _get_activation(self, F, inputs, activation, **kwargs):
return F.Activation(inputs, act_type=activation, **kwargs)
Member

For string types, map the string to the most efficient operator. For example, if the string is 'tanh', instead of doing F.Activation(act_type='tanh'), do F.tanh, which doesn't require parsing the string at each call.

@piiswrong
Contributor

@mrkumar83 @szha Any updates?

@szha, if the original author doesn't respond, could you take this over?

@mrkumar83
Contributor Author

@piiswrong
Should have an updated PR in the next 2 days.

elif activation == 'sigmoid':
    return F.sigmoid(inputs, **kwargs)
elif activation == 'relu':
    return F.relu(inputs, **kwargs)
Member

add softsign
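
The requested branch, as a sketch (the commit list below confirms "Adding softsign support" landed in the merged PR):

```python
elif activation == 'softsign':
    # softsign(x) = x / (1 + |x|); F.softsign is a dedicated operator.
    return F.softsign(inputs, **kwargs)
```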

@szha merged commit 776b239 into apache:master on Jun 5, 2018
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* Make inner activation gate configurable for LSTMCell

* Adding pr feedback

* Adding a recurrent_activation and activation similar to Keras

* Fixing all pylint issues in the file

* Adding initial pr feedback

* Adding cr feedback

* Adding softsign support
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018