
Recurrent Attention API: Support constants in RNN #7980

Merged

Conversation

@andhus (Contributor) commented Sep 24, 2017

This PR is the first step to support recurrent attention mechanisms as suggested in #7633.

In short, it adds the possibility to pass constants (besides inputs and initial_state) when calling the RNN layer. The RNN will pass these tensor(s) on to the call method of the RNN cell, which (if constants are passed) is expected to implement the signature:
call(inputs, states, constants)
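
For illustration, here is a minimal sketch of what a constants-aware cell could look like under this API. The cell itself is hypothetical (not part of this PR), and it assumes that, when constants are passed, the cell's build receives [step_input_shape, constants_shape]:

import keras.backend as K
from keras.layers import Layer

class ConstantsDemoCell(Layer):
    """Hypothetical cell that mixes each input step with a constant tensor."""
    def __init__(self, units, **kwargs):
        self.units = units
        self.state_size = units
        super(ConstantsDemoCell, self).__init__(**kwargs)

    def build(self, input_shape):
        # Assumption of this sketch: with constants passed, input_shape is
        # [step_input_shape, constants_shape].
        step_input_shape, constants_shape = input_shape
        self.kernel = self.add_weight(shape=(step_input_shape[-1], self.units),
                                      initializer='glorot_uniform',
                                      name='kernel')
        self.constants_kernel = self.add_weight(shape=(constants_shape[-1], self.units),
                                                initializer='glorot_uniform',
                                                name='constants_kernel')
        self.built = True

    def call(self, inputs, states, constants):
        prev_output = states[0]
        attended = constants[0]  # the same tensor at every time step
        output = K.tanh(K.dot(inputs, self.kernel)
                        + K.dot(attended, self.constants_kernel)
                        + prev_output)
        return output, [output]

# Usage sketch (shapes are arbitrary examples; a single 2D constant is assumed):
from keras.layers import Input, RNN
from keras.models import Model

x = Input((10, 8))    # the sequence that is iterated over
c = Input((5,))       # the "constant": presented in whole at every time step
y = RNN(ConstantsDemoCell(32))(x, constants=c)
model = Model([x, c], y)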

@roya0045 commented:

Any idea why it seems to fail with CNTK (according to Travis)?

@fchollet (Member) commented:

Can you explain why this would be necessary? I believe RNN cells that need specific constants should be managing their own constants, e.g. what we do for dropout (the parent RNN is agnostic to dropout constants).

I am pretty sure Attention wrappers would be implementable with the current API.

@andhus (Contributor, Author) commented Oct 1, 2017

The key here is that the constants (the "attended", when used for attention) really are inputs to the recurrent layer (I refer to them as "external" constants in the docs, but maybe we need a better name). I think it would be bad and confusing to break the principle that all inputs to a layer should be passed to that layer in __call__, i.e. when it is applied to its inputs in the functional API.

The constant(s) needed for attention are fundamentally different from dropout, as the latter is only a regularisation method and we don't pass the dropout values in inference mode (they are only generated in training), whereas the attended, of course, has to be passed as input.

When it comes to implementation/API, it is certainly possible to implement attention in Keras using only a custom wrapper - I've done it several times, and that was the solution in my original API suggestion, which I think is worse ;).

But let's look at some options...

This is the "look and feel" of my current suggestion, from the user's perspective:

input_sequence = Input((timesteps, features))
attended_sequence = Input((attended_timesteps, attended_features))

attention_lstm = RNN(
    cell=SequenceAttention(LSTMCell(32)),
    return_sequences=True  # could of course be False in another use-case
)
lstm_output_sequence = attention_lstm(
    input_sequence,
    constants=attended_sequence
)

output_layer = TimeDistributed(Dense(n_labels, activation='softmax'))
output = output_layer(lstm_output_sequence)

model = Model(inputs=[input_sequence, attended_sequence], outputs=output)
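
At training and inference time the attended is then fed like any other input. A sketch, continuing the snippet above with made-up array shapes:

import numpy as np

x_data = np.random.random((64, timesteps, features))
attended_data = np.random.random((64, attended_timesteps, attended_features))
y_data = np.random.random((64, timesteps, n_labels))

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit([x_data, attended_data], y_data)

# the attended must be provided at inference as well:
predictions = model.predict([x_data, attended_data])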

I guess what your suggestion means is that you would only let the cell know about the attended, something like:

cell = SequenceAttention(LSTMCell(32), attended=attended_sequence)
attention_lstm = RNN(cell=cell, return_sequences=True)
lstm_output_sequence = attention_lstm(input_sequence)

Or maybe:

cell = SequenceAttention(LSTMCell(32))
cell.attended = attended_sequence
attention_lstm = RNN(cell=cell, return_sequences=True)
lstm_output_sequence = attention_lstm(input_sequence)

You can probably make this work but I'd like to avoid it for a number of reasons:

  1. I don't think there is any other situation in Keras where one would inject an input tensor in the __init__ of a layer - it feels intuitively wrong to have these implicit inputs (for which we, again, have to pass data in both training and inference).

  2. If not all inputs are passed in call, it breaks the possibility to easily reuse the layer in several places of the computational graph.

  3. Typically you need to build the attention mechanism (initialise weights, etc.), and I see no reason why this shouldn't be done in the build method (triggered by __call__), where you get the shapes of all inputs.

I think a good comparison is why we can (optionally) pass initial_state to the RNN. If we do, it should really be considered an input (it needs to be passed in inference) - otherwise it's an "internal thing" like dropout. Just like the initial state, the attended can (or will, in most cases) be the output of a previous layer.

Another option would be to treat the attended as part of the initial_state, but that's a hack I think and leads to several challenges.

Yet another (better) option is to add a new class AttentionRNN(RNN) which adds the functionality suggested in this PR without touching the existing RNN - but then you need to both wrap the cell with an attention mechanism and use another parent RNN layer to use it. This would be fine, I think, but since the addition to RNN just adds an optional argument and is backward compatible, I thought it would be better this way.

It is also, of course, possible that I'm missing some other option. Please question the above!

@andhus (Contributor, Author) commented Oct 1, 2017

The fact that the current RNN implementation "temporarily" modifies the input_spec when initial_state is passed (https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L453) is very related to this discussion. I've just extended the same behaviour for constants, but I don't really like it. As @farizrahman4u points out in a comment on the second PR (#7981), it would be better to be explicit in the input_spec about all tensors that really are inputs.

@fchollet (Member) commented Oct 1, 2017

Thanks for the detailed explanation! It's a reasonable choice. An alternative would be the following:

  • AttentionRNN subclass of RNN provides support for attention
  • attended_sequence keyword argument in the call of AttentionRNN
  • which gets added to the input list (just like we do for the initial_state argument)
y = AttentionRNN(cell)(x, attended_sequence=z)

What do you think?

In terms of PR process, I suggest doing a single PR for the whole thing. That will make the purpose of the various pieces more apparent.

@andhus (Contributor, Author) commented Oct 2, 2017

Sounds great! Basically, the additions to RNN in this PR are kept separate in a new subclass AttentionRNN. I suspect this will introduce some code duplication, but I agree it is motivated by avoiding any additional complexity in RNN.

A few details:

  • The new input keyword argument should not contain "sequence", as it could be just about anything, e.g. an image in an image-captioning problem. What really defines this argument is that it is presented "in whole" at each cell transformation. How about just attended? Its required shape is only defined by the cell implementation; for attention cells that implement attended_spec, the AttentionRNN can expose this (including the number of dims, already before application).

  • A reminder (relevant for the naming discussion): there is no reason to limit ourselves to a single attended input (say you want to train a model to summarise a news article and attend to both the text and an image, for example). I think the "plural" is covered by attended, but we could also go for attended_inputs.

  • Just to be clear, we'll need to add the keyword arg to AttentionRNN.__call__ as well, as we need to override the parent RNN.__call__ to add the attended inputs to input_spec before build.

So in summary:

y = AttentionRNN(attention_cell)(x, attended=z)

# alt. for many attended:

y = AttentionRNN(attention_cell)(x, attended=[z1, z2])

where the attention_cell implements call(inputs, states, attended).

Totally agree about the PRs. Let's close the second one to keep the discussion in one place!

@fchollet (Member) commented Oct 3, 2017

Cool, let's go with this then!

@andhus (Contributor, Author) commented Oct 3, 2017

Great, I'll update this PR and the API doc this weekend.

@andhus andhus changed the title Recurrent Attention (Step 1): Support for passing external constants to RNN Recurrent Attention: AttentionRNN and FunctionalRNNCell Oct 8, 2017
@@ -194,6 +196,130 @@ def get_losses_for(self, inputs=None):
return losses


class FunctionalRNNCell(Wrapper):
@andhus (Contributor, Author) commented:

Is RNNCellModel a better name?

@andhus (Contributor, Author) commented Oct 9, 2017:

No, that's not a good idea. It's not a Model in the sense that it should not have compile, fit, predict, etc.; it should just "package" the cell based on the functional definition.

@andhus (Contributor, Author) commented:

We should probably override __call__ as well.

@fchollet (Member) commented Oct 9, 2017

This is a big PR. If any users of Attention or people familiar with Keras RNNs could help with reviewing, that would be greatly appreciated.

@andhus (Contributor, Author) commented Oct 9, 2017

Yes, input from more contributors/users would be great. Here is a guide to the additions:

New class FunctionalRNNCell
Adds support for composing RNN cells using the Keras functional API.

  • Compared to the general Model, it automatically implements the property state_size as well as the correct signature of call (required for RNN cells), based on the arguments passed.
  • It also offers a better API for keeping track of what is input/output and what are input states/output states.
  • Supports an additional input attended for implementing attention mechanisms.

New class AttentionRNN(RNN)
There is quite a lot of code here but it really just does a few things:

  • Asserts that the passed cell supports attention (cell.call takes kwarg attended)
  • Takes the kwarg attended (tensor/list of tensors) in both __call__ and call and passes it on to the cell.
  • Temporarily modifies input_spec in __call__ so that the attended is properly added to the computational graph.
  • If the wrapped cell is a keras Layer it passes the shape of the attended as part of input_shape to cell.build so that the cell can build the attention mechanism properly.

In the AttentionRNN there is quite some code duplication from RNN (as expected). This can be alleviated by breaking apart the __call__ and call methods of RNN, but based on our previous discussions I intentionally left RNN untouched for now.

There are also many more tests required, but I want some feedback/review before continuing.

@fchollet (Member) commented Oct 9, 2017

New class AttentionRNN(RNN)

Could we move attention support to the base class RNN altogether? Conditional on whether the cell's call takes the attended argument.

Also: what would be some alternative names for attended? attended_sequence? I'm afraid attended wouldn't be sufficiently explicit.

@andhus (Contributor, Author) commented Oct 10, 2017

Could we move attention support to the base class RNN altogether?

This is exactly what I implemented in a backward compatible way (and tested) in the original version of this PR (I then used the keyword argument constants): 28795f1

Also: what would be some alternative names for attended? attended_sequence?

As discussed above, "sequence" should not be in the name. Again, the attended can be any tensor(s), including multiple. A standard application is image captioning, where the attended is an image (or, typically, CNN feature maps of an image).

I do still think it would be better to support passing what I originally referred to in the docs as external constants in the base RNN, which would forward them to cell.call, as well as their shapes to cell.build if the cell is a Keras layer. And yes, if such constants are passed in RNN.__call__, it is required that cell.call accepts them.

As per the current state of this PR, RNN is purely a special case of AttentionRNN (with attended=None), which is a bit strange. I agreed to your suggestion to put this logic in a new separate class with (as I wrote) the main motivation being "avoiding adding any additional complexity to the RNN", and perhaps to move forward a bit faster ;). There is value in minimising the complexity of the base RNN class, but the question is whether this motivates a whole new class which is just a generalisation of the first.

Regarding naming: if we were to stick with an additional separate AttentionRNN, I think attended is the best name. As suggested previously, attended_inputs is also an alternative. However, if we move support of "external constants" back to RNN, I think attended is a confusing and unnecessarily limiting name. I can't come up with a better name than constants for this case; we can make clear in the documentation that "these are tensors that will be passed in whole to cell.call for each time step", as was done in the original PR, see: 28795f1#diff-3118e4e28157032506f771f279a551c3R281

Generally, I prefer to consider RNN the component at the Keras Layer-abstraction level that corresponds to the Keras backend's K.rnn or e.g. theano.scan. There is a reason why both of these take an argument for "external constants": K.rnn uses the keyword constants and theano.scan uses non_sequences. In Theano there are optimisation reasons for passing these (not utilised in the Keras backend!?), and it is not required to pass such constants to a "control flow op" if it supports "partialising" the step function with preset tensors. We discussed this "partialisation" in our initial discussion, and it is not an option at the Keras Layer-abstraction level, as we need (want!) to be explicit about all inputs and outputs of layers.

The Theano name non_sequences is not very user-friendly, as these "external constants" could be sequences that are just not iterated over by the RNN cell. The reason I don't want to go for attended as the keyword in RNN is that such constants could be used for other things (you could, e.g., write cell wrappers that implement recurrent dropout for any cell based on these constants...).

...I apologise for these "essay" comments. This is an intricate matter. I've really tried to do a thorough requirements analysis with respect to both how attention is used in research and the current state of Keras (which e.g. led to my conclusion that the RNNCell should really be broken out before this addition). Perhaps it was too early to move to PRs, but on the other hand it is the only way to really show the implications and feasibility of the suggestion. It is quite slow to work with the sparse, and occasionally inconsistent ;), feedback. I'd be happy to set up a Hangouts session to discuss details and options directly!

@andhus (Contributor, Author) commented Oct 10, 2017

Summary of the very long comment above:

  • I think the best option is to add support for (non-iterated-over) constants in RNN, as per the original solution in this PR: 28795f1. This way we keep down the number of classes, and for implementing attention we just need to wrap/implement the RNN cell.
  • In this case, I can't come up with a better name than constants. It can be made clear from the documentation that these are tensors passed "in whole" to cell.call, i.e. they are "not iterated over". I'd prefer to use such a general name because it could be used for other things than "attention" - however, if we think it will only be used for attention, we could name the argument attended or attended_inputs.
  • It would be great to get feedback from more contributors. If we want to speed up the process, I'd be happy to set up a Hangouts session for discussion. I've already explored the pros and cons of many alternatives, but it's challenging to present the full analysis in comments here.

@fchollet (Member) commented:

I think the best option is to add support for (non-iterated-over) constants in RNN, as per the original solution in this PR: 28795f1. This way we keep down the number of classes, and for implementing attention we just need to wrap/implement the RNN cell.

I've been looking at these PRs for some time and I'm still not sure. It would definitely be helpful to have more people chime in with their perspective.

I think you're probably right; it would be good to add constants to the RNN API. One thing to consider, though, is that keras.backend.rnn already has support for constants. It would be good not to reimplement an independent, layer-level mechanism, and instead reuse what we have...

If we do this, the API changes would be:

  • constants=None in RNN's call. Ideally this would forward the constants to K.rnn (they are then appended to the first argument passed to the step function).
  • SequenceAttention wrapper class for RNN cells? (please confirm)

In this case, I can't come up with a better name than constants

constants is fine.
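
In user-facing terms, the changes listed above would look roughly like this (a sketch only; SequenceAttention is the hypothetical cell wrapper named above):

y = RNN(SequenceAttention(LSTMCell(32)))(x, constants=attended)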

@fchollet (Member) commented:

It would be good not to reimplement an independent, layer-level mechanism, and instead reuse what we have...

Besides just the semantic problem of having multiple similar but independent mechanisms with the same name, one thing to consider is performance. Different backends might want to optimize the way they manage constants. Theano does it to some extent, for instance.

@fchollet (Member) commented Oct 13, 2017

Let's see... how bad would it be to have something like:

class RNN(...):

    def __call__(self, inputs, initial_state=None, constants=None, **kwargs):
        # handle connectivity of `constants`, like we do for `initial_state`
        ...

    def call(self, inputs, mask=None, training=None, initial_state=None, constants=None):
        ...
        if constants:
            if not has_arg(self.cell.call, 'constants'):
                raise ValueError(...)

            def step(inputs, states):
                constants = inputs[1:]
                if has_arg(self.cell.call, 'training'):
                    return self.cell.call(inputs[0], constants=constants, training=training)
                else:
                    return self.cell.call(inputs[0], constants=constants)

        last_output, outputs, states = K.rnn(step,
                                             inputs,
                                             initial_state,
                                             go_backwards=self.go_backwards,
                                             mask=mask,
                                             constants=constants,
                                             unroll=self.unroll,
                                             input_length=timesteps)

@andhus (Contributor, Author) commented Oct 14, 2017

Good, at this point it seems obvious that we should add constants support to the RNN layer. It will be backward compatible, and "non-advanced" users don't even have to know that it's there.

It would be good not to reimplement an independent, layer-level mechanism, and instead reuse what we have...

I'm not sure exactly what you mean by this, but yes, it's important that the Layer-level API is consistent (in terminology and functionality) with the backend level, to the extent possible/meaningful. That is a good motivation to stick to the same name and behaviour for constants.

Besides just the semantic problem of having multiple similar but independent mechanisms with the same name [...]

I don't think there is anything strange about having a layer RNN that to a great extent "duplicates" functionality of the backend rnn. I think it is clear what the layer level "brings to the table" (initialisation, building, holding weights...). The backend rnn is the purer subset of RNN functionality that is backend-dependent.

[...] one thing to consider is performance. Different backends might want to optimize the way they manage constants. Theano does it to some extent, for instance.

Yes, this is important and quite complex. I wrote about it in my previous comments and put a note on it in the PR (https://github.com/fchollet/keras/pull/7980/files#diff-3118e4e28157032506f771f279a551c3R2340). I agree that it's better to pass constants to the backend rnn and let it explicitly pass them to the step_function (rather than partialising the step_function with them), but regarding optimisation I'm not sure that is "enough" or makes a big difference anyway. For Theano we should actually pass all tensors involved in the step transformation, including the weights, as non_sequences to theano.scan. This is not done/supported with the current API. I haven't profiled how big a difference it makes for typical applications.

Let's see... how bad would it be to have something like: [...]

This is not too bad - it's one way of making sure that the constants are passed via the backend rnn. There is a bug, though: you assume constants are appended to inputs inside rnn, but in fact they are appended to states (see e.g.: https://github.com/fchollet/keras/blob/master/keras/backend/tensorflow_backend.py#L2479). Note that this appending of constants is not documented in any of the backends' rnn. The CNTK backend has no docs, and the Theano and TensorFlow backends don't mention what happens with constants and are in fact wrong when stating:

new_states: list of tensors, same length and shapes as 'states'

where 'states' refers to the argument of the step_function. This is not true if constants are passed to rnn, since then len(new_states) < len(states).

I think this mistake is symptomatic of one of the issues with the Keras recurrent code base - that arguments are not always what they "claim to be" and will be different things at different times: "states can be either states - or states and constants" and "inputs can be either inputs - or inputs and initial_state" (in RNN.__call__). This causes mental overhead, boilerplate for splitting lists of inputs, and decreased readability.

I'd recommend working towards removing this pattern. We should try to make the step_function in the backend rnn reflect the RNNCell in RNN, and I think there should be a separate argument for constants (not appending them to states as is done now, or to inputs as would be required in your code example) - this goes for the step_function as well as cell.call. It is clearer and easier to extend this way. This is easy to change, but would not be backward compatible in terms of the backend rnn's expectations on the step_function (although, as mentioned above, it's not part of the API documentation how constants are passed to the step_function). What are your thoughts on this - could we change the signature to step_function(inputs, states, constants=None) and make it mirror the RNNCell?

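For concreteness, a sketch of the two variants (based on the backend behaviour described above; surrounding names such as num_constants are assumptions of the sketch, not code from this PR):

# Current behaviour: backend rnn appends the constants to `states`,
# so a layer-level step wrapper has to split them back out:
def make_step_current(cell, num_constants):
    def step(inputs, states):
        constants = states[-num_constants:]
        cell_states = states[:-num_constants]
        return cell.call(inputs, cell_states, constants=constants)
    return step

# Proposed alternative: an explicit argument, mirroring the cell signature:
def make_step_proposed(cell):
    def step(inputs, states, constants=None):
        return cell.call(inputs, states, constants=constants)
    return step
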
The other example - that inputs to RNN can be either just the input tensor or a list of the input and initial_state - gets more complicated if we add yet another keyword argument constants to RNN.__call__ that could also be part of inputs. If inputs is a list of, say, two tensors, how can we know whether the second one is initial_state or constants (especially if state_spec and constants_spec happen to be the same)? Is this really required - is it not possible to allow passing initial_state (and later constants) only by keyword arg? I get that it has to do with composing (and serialisation?) of models; I haven't looked into the details, but there should be a better solution...

@andhus (Contributor, Author) commented Oct 15, 2017

I think it is still better to accept this inconsistency, rather than enforcing the "unmotivated appending of constants to states" in RNNCell.call.

Please trust me on this one; it will destroy the readability of attention mechanisms and cause a lot of implementation overhead when trying to make them modular if we start mixing states and constants in the same list.

@fchollet (Member) commented:

Yes, passing all input tensors as a flat list is part of the internal API. On the user-facing side of things, we can always avoid doing that (and instead pass the additional arguments as keyword arguments), but it is unavoidable at model deserialization time.

I agree that it's difficult to figure out a reasonable way to parse the list of inputs. Since the problem only occurs at deserialization time, it should be possible to store in the layer config whether or not initial states and constants are "keras tensors" (i.e. whether they are part of the model topology), which would then provide all the information needed for deserialization.

Keep in mind that this is an implementation detail. The public API is still to use keyword arguments.

@fchollet (Member) commented:

Please trust me on this one; it will destroy readability of Attention Mechanisms

It is purely an implementation detail and will not modify the signatures of the call methods of either batch-level layers or cells. Hence no readability problem. We should do it, both for performance reasons and conceptual simplicity. See the earlier code snippet for an implementation suggestion.

@andhus (Contributor, Author) commented Oct 17, 2017

The public API is still to use keyword arguments

Ok, great - (1.) is not a problem then. As you suggest, RNN can store what's expected (in e.g. constants_spec), and then the splitting of the internal flat-list format will be quite trivial. And yes, this has to be represented in the config (in a backward-compatible way).

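As a sketch of that splitting (names like num_constants are assumptions, not necessarily what ends up in the code):

def split_flat_inputs(inputs, num_constants):
    # Assumed deserialization format: [main input] + initial_state tensors
    # + constants tensors, flattened into a single list.
    if num_constants:
        constants = inputs[-num_constants:]
        initial_state = inputs[1:-num_constants]
    else:
        constants = []
        initial_state = inputs[1:]
    return inputs[0], initial_state, constants
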
I'm not sure I understood the answer to (2.): is my understanding correct that we should not change the expected signature of the step_function (the backend rnn argument), but instead just wrap cell.call inside RNN?

@andhus (Contributor, Author) commented Oct 17, 2017

It is purely an implementation detail [...] hence no readability problem.

I totally appreciate Keras' extreme focus on the public API. But for the library (and API) to stay relevant, it has to continue to develop. The structure and readability of its internals are crucial for easy extension and for people to continue to contribute. Many improvements could be made that are in no conflict with the public API. ...The quote above sounds a bit scary ;)

@fchollet (Member) commented:

I'm not sure I understood the answer to (2.): is my understanding correct that we should not change the expected signature of the step_function (the backend rnn argument), but instead just wrap cell.call inside RNN?

Yes, indeed, we can simply wrap the calls to the step function, as in the code snippet.

Your other comment: my quote on readability means to say that since method signatures don't change, the implementation of call is strictly the same. Hence it is not less readable: it is identical.

@andhus (Contributor, Author) commented Oct 21, 2017

Ok, so I've updated this PR back to "support for constants in RNN", according to our discussion.

For an overview, there is a functioning implementation (based on the additions in this PR) of the next steps:

  • a base class for attention cell wrappers
  • mixture-of-Gaussians 1D/sequence attention (including an example)

here: andhus#3. I didn't want to pull it into this PR until we've settled on the changes in RNN.

There is also an explanation of the example in the linked PR (and a benchmark MoGAttention vs. regular seq-to-seq) here: https://github.com/andhus/extkeras/tree/master/examples/attention

for dim in self.cell.state_size]
else:
self.state_spec = InputSpec(shape=(None, self.cell.state_size))
self.state_spec = None
@andhus (Contributor, Author) commented:

For more advanced cells (typically attention cells, such as this one: https://github.com/andhus/keras/pull/3/files#diff-9c4188ddc4dd173d80f64feed5b89412R258) cell.state_size is not defined until the cell is built. But that's not a problem: state_spec is just set a bit later, in build (or in __call__ if initial_state is passed to RNN).

@andhus andhus changed the title Recurrent Attention: AttentionRNN and FunctionalRNNCell Recurrent Attention API Additions Oct 22, 2017
initial_state = inputs[1:]
inputs = inputs[0]
def __call__(self, inputs, initial_state=None, constants=None, **kwargs):
inputs, initial_state, constants = self._normalize_args(
@fchollet (Member) commented:

Prefer "standardize" (also used elsewhere) since "normalize" has a specific meaning elsewhere.

@fchollet (Member) left a review comment:

Looks good to me! Comments are mostly nitpicking points.

check_list += constants
self.constants_spec = [InputSpec(shape=K.int_shape(constant))
for constant in constants]
self._n_constants = len(constants)
@fchollet (Member) commented:

In line with naming conventions in this API, this should be _num_constants.


is_keras_tensor = hasattr(initial_state[0], '_keras_history')
for tensor in initial_state:
check_list = []
@fchollet (Member) commented:

Unclear what is meant by check_list. Maybe input_list would be better?

raise ValueError('The initial state of an RNN layer cannot be'
' specified with a mix of Keras tensors and'
' non-Keras tensors')
raise ValueError('The initial state and constants of an RNN'
@fchollet (Member) commented:

and -> or

@andhus (Contributor, Author) commented:

Sure - I think I wanted to convey that they "jointly" must contain only one kind of tensor (that's what is checked for now)... I don't have any opinion on the error message, but just to verify/understand: shouldn't all inputs be Keras tensors or not - and why not then include inputs as well in the check (not only constants and initial_state)?

@@ -560,6 +614,48 @@ def call(self, inputs, mask=None, training=None, initial_state=None):
else:
return output

def _normalize_args(self, inputs, initial_state, constants):
"""When running a model loaded from file, the input tensors
@fchollet (Member) commented:

Nit: the first line of the docstring (one-line summary) should fit on one line and end with a period.

@@ -560,6 +614,48 @@ def call(self, inputs, mask=None, training=None, initial_state=None):
else:
return output

def _normalize_args(self, inputs, initial_state, constants):
"""When running a model loaded from file, the input tensors
`initial_state` and `constants` can be passed to RNN.__call__ as part
@fchollet (Member) commented:

` around code keywords

constants.

This method makes sure initial_states and constants are separated from
inputs and that the are lists of tensors (or None).
@fchollet (Member) commented:

the -> they

initial_state = inputs[1:]
inputs = inputs[0]

def to_list_or_none(x): # TODO break out?
@fchollet (Member) commented:

Unless you want to use it elsewhere, it is fine as it is; you can remove this comment.

@@ -618,6 +714,9 @@ def get_config(self):
'go_backwards': self.go_backwards,
'stateful': self.stateful,
'unroll': self.unroll}
if self._n_constants is not None:
config['_n_constants'] = self._n_constants
@fchollet (Member) commented:

Please call it num_constants in the config.

@@ -292,6 +293,14 @@ class RNN(Layer):
`states` should be a numpy array or list of numpy arrays representing
the initial state of the RNN layer.

# Note on passing external constants to RNNs
You can pass "external" constants to the cell using the `constants`
keyword argument of RNN.__call__ (as well as RNN.call) method. This
@fchollet (Member) commented:

` around code keywords

@@ -415,19 +422,46 @@ def compute_mask(self, inputs, mask):
return output_mask

def build(self, input_shape):
# Note input_shape will be list of shapes of initial states and
@andhus (Contributor, Author) commented Oct 24, 2017:

Note input_shape will be a list of the shapes of the input followed by the shapes of initial states and constants if the latter are passed in __call__.

@andhus (Contributor, Author) commented Oct 24, 2017

I fixed the comments and added a test case for when both initial_state and constants are passed, and the tests now also cover "flat list" inputs after serialisation.

What do you think: should we aim to merge this first, separately (a pretty well-defined feature addition), or continue to pull in the next steps (actual attention mechanisms) before merging anything?

@fchollet (Member) commented:

What do you think: should we aim to merge this first separately (a pretty well defined feature addition)

We'll merge it first.

@fchollet (Member) left a review comment:

LGTM, thanks!

@fchollet fchollet merged commit 3f148e4 into keras-team:master Oct 25, 2017
@andhus (Contributor, Author) commented Oct 25, 2017

@fchollet I saw that the build is failing on master: https://travis-ci.org/fchollet/keras/jobs/292414692 - test_plot_model -> "ImportError: Failed to import pydot..."

@dunnock commented Oct 25, 2017

@andhus it seems that after this was merged into master, CuDNNLSTM broke. After pulling I get this error:

  File "C:\ProgramData\Anaconda3\lib\site-packages\keras-2.0.8-py3.6.egg\keras\layers\recurrent.py", line 482, in __call__
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras-2.0.8-py3.6.egg\keras\engine\topology.py", line 575, in __call__
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras-2.0.8-py3.6.egg\keras\layers\cudnn_recurrent.py", line 400, in build
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras-2.0.8-py3.6.egg\keras\layers\recurrent.py", line 427, in build
AttributeError: 'CuDNNLSTM' object has no attribute '_num_constants'

I have the latest nightly TensorFlow.

@fchollet (Member) commented:

Unfortunately the cuDNN RNN tests cannot be run on Travis since they require a GPU. We need to remember to run them manually every time we make changes to recurrent.py.

I will look at the issue.

@andhus (Contributor, Author) commented Oct 27, 2017

Yeah, thanks for taking care of this - I did not think about that addition!

@andhus andhus deleted the recurrent_attention_api_constants_support branch October 28, 2017 14:14
@andhus andhus changed the title Recurrent Attention API Additions Recurrent Attention API: Support constants in RNN Oct 29, 2017
@andhus (Contributor, Author) commented Oct 29, 2017

Continuation of Recurrent Attention API here: #8296
