
RNN.call should get initial state from full input spec #10845

Merged · 2 commits · Aug 5, 2018

Conversation

@yanboliang (Contributor) commented Aug 4, 2018

Summary

This PR fixes a critical bug in RNN reported in #9449 and #10830.

In RNN.call, if initial_state is a tensor returned by a Keras layer, we should get initial_state from the full input spec (training data, states, and constants) that was generated in RNN.__call__, because that full input may be copied to multiple GPUs. Otherwise, the layer would use the original initial_state, which is not sliced according to the number of GPUs.
I have also checked CuDNNRNN; it already handles this correctly, so it doesn't need to be modified.
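
The core of the change, roughly (a sketch of the idea rather than the exact diff; self._num_constants is the counter RNN keeps for constants appended to the input list):

# In RNN.call: when inputs arrives as the full list built by RNN.__call__
# (training data + initial states + constants), slice the states back out
# of that list instead of using the initial_state argument, because only
# the list has been split per replica by multi_gpu_model.
if isinstance(inputs, list):
    if self._num_constants is None:
        initial_state = inputs[1:]
    else:
        initial_state = inputs[1:-self._num_constants]
        constants = inputs[-self._num_constants:]
    inputs = inputs[0]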

I ran the following test code on a machine with 2 GPUs; it works well after this fix.

import numpy as np
from keras import layers as L
from keras.models import Model
from keras.utils.multi_gpu_utils import multi_gpu_model

# An RNN whose initial state comes from a second Input tensor.
x = L.Input((4, 3))
init_state = L.Input((3,))
y = L.SimpleRNN(3, return_sequences=True)(x, initial_state=init_state)

# A batch of 2 samples: [training data, initial states] and targets.
_x = [np.random.randn(2, 4, 3), np.random.randn(2, 3)]
_y = np.random.randn(2, 4, 3)

m = Model([x, init_state], y)
m2 = multi_gpu_model(m, 2)  # replicate across 2 GPUs; batches are split in half
m2.compile(loss='mean_squared_error', optimizer='adam')
m2.train_on_batch(_x, _y)

Related Issues

#9449
#10830

PR Overview

  • This PR requires new unit tests [y/n] (make sure tests are included)
  • This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
  • This PR is backwards compatible [y/n]
  • This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@fchollet (Collaborator) left a comment

LGTM, thanks

@mikejhuang commented Nov 27, 2018

This bugfix doesn't work with ConvLSTMs with initialized states. I think it's because ConvLSTM initializes states differently than other LSTMs: instead of taking two inputs, (input, initial_state=init_state), ConvLSTM takes a single list containing the input and the initial states ([input, init_state_h, init_state_c]).

@JunHyungYu commented

Hi @mikejhuang, I have the same issue with ConvLSTM. How did you solve your problem? If you know, please tell me; I can't train my seq2seq ConvLSTM model with initial_state. Somebody help me~~

@mikejhuang commented

@JunHyungYu I used this bugfix:
pascalxia@6750e1e

To get it to work, you pass the layer a single list containing the input and the initial states ([input, init_state_h, init_state_c]); a sketch follows.
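
For illustration, a minimal sketch of that call pattern (the shapes are made up; with the patched code above, the resulting model can then be wrapped with multi_gpu_model as in the PR description):

from keras import layers as L
from keras.models import Model

# Illustrative shapes: 4 timesteps of 8x8 single-channel frames; with 3
# filters and padding='same', each ConvLSTM2D state has shape (8, 8, 3).
frames = L.Input((4, 8, 8, 1))
init_h = L.Input((8, 8, 3))
init_c = L.Input((8, 8, 3))

# Pass one list [input, init_state_h, init_state_c] instead of the
# initial_state keyword argument.
y = L.ConvLSTM2D(3, (3, 3), padding='same',
                 return_sequences=True)([frames, init_h, init_c])
m = Model([frames, init_h, init_c], y)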

@JunHyungYu commented Apr 30, 2019

@mikejhuang If I use convlstm()([input, init_state_h, init_state_c]), I can build the multi-GPU model, but when I train it I get the same error:
InvalidArgumentError: Incompatible shapes: [8,32,224,224] vs. [16,32,224,224]
[[Node: replica_0_6/model_13/conv_lst_m2d_22/while/add_7 = Add[T=DT_FLOAT, _class=["loc:@train.../Reshape_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0_6/model_13/conv_lst_m2d_22/while/BiasAdd_3, replica_0_6/model_13/conv_lst_m2d_22/while/convolution_7)]]
[[Node: training_7/Adam/gradients/replica_0_6/model_13/conv_lst_m2d_22/while/add_7_grad/Shape_1/_2523 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2234_...ad/Shape_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Can you train your ConvLSTM with initial_state using a multi-GPU model?

@mikejhuang commented May 1, 2019 via email

@JunHyungYu commented

@mikejhuang Oh my god... thank you!

@dzhv commented Jun 4, 2019

@mikejhuang, @JunHyungYu have any of you managed to find a workaround for the issue? I'm struggling with it atm :(

@maym2104 commented

Same issue with ConvLSTM2D with the latest Keras (2.2.4) and TensorFlow 1.13.

@maym2104 commented Aug 17, 2019

It only happens when I use this combination:

  • ConvLSTM
  • with the dropout and/or recurrent_dropout argument set
  • in a multi-GPU model.

It works when I remove the dropout on this layer, but that results in less efficient learning :/

EDIT:
I looked at the code because it was not clear to me where the dropout was applied. It seems to be applied before any multiplication with the parameters, so a Dropout layer before the ConvLSTM (or LSTM, or RNN) would be equivalent to the dropout argument (but not to the recurrent one, since that is applied to the state). In fact I already had such a layer, so it was redundant. The only difference in the case of LSTM and ConvLSTM is that a different mask is applied for each gate; see the sketch below.
Replacing recurrent dropout is less trivial, though, and it makes a huge difference.
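
A minimal sketch of the input-dropout workaround described above (shapes and rate are illustrative; as noted, the built-in dropout argument applies a different mask per gate, so this is only approximately equivalent, and recurrent_dropout has no such substitute):

from keras import layers as L
from keras.models import Model

frames = L.Input((4, 8, 8, 1))  # illustrative: 4 timesteps of 8x8x1 frames

# Apply dropout to the inputs with an explicit layer instead of
# ConvLSTM2D's dropout argument, which triggers the multi-GPU error.
dropped = L.Dropout(0.2)(frames)
y = L.ConvLSTM2D(3, (3, 3), padding='same',
                 return_sequences=True)(dropped)
m = Model(frames, y)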
