
[OP] broadcast_mask for rnn #3016

Merged: 1 commit merged into apache:master on Aug 15, 2016

Conversation

@antinucleon (Contributor)

No description provided.

@antinucleon antinucleon changed the title broadcast_mask for rnn [OP] broadcast_mask for rnn Aug 13, 2016
@antinucleon (Contributor Author)

@pluskid @freddycct

New LSTM unit example:

from collections import namedtuple
import mxnet as mx

# the state namedtuple used by the mxnet LSTM examples
LSTMState = namedtuple("LSTMState", ["c", "h"])

def lstm(num_hidden, indata, mask, prev_state, param, seqidx, layeridx, dropout=0.):
    """LSTM unit symbol with per-sample masking of the output states"""
    i2h = mx.sym.FullyConnected(data=indata,
                                weight=param.i2h_weight,
                                bias=param.i2h_bias,
                                num_hidden=num_hidden * 4,
                                name="t%d_l%d_i2h" % (seqidx, layeridx))
    h2h = mx.sym.FullyConnected(data=prev_state.h,
                                weight=param.h2h_weight,
                                bias=param.h2h_bias,
                                num_hidden=num_hidden * 4,
                                name="t%d_l%d_h2h" % (seqidx, layeridx))
    gates = i2h + h2h
    slice_gates = mx.sym.SliceChannel(gates, num_outputs=4,
                                      name="t%d_l%d_slice" % (seqidx, layeridx))
    in_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid")
    in_transform = mx.sym.Activation(slice_gates[1], act_type="tanh")
    forget_gate = mx.sym.Activation(slice_gates[2], act_type="sigmoid")
    out_gate = mx.sym.Activation(slice_gates[3], act_type="sigmoid")
    next_c = (forget_gate * prev_state.c) + (in_gate * in_transform)
    next_h = out_gate * mx.sym.Activation(next_c, act_type="tanh")
    # dropout on the hidden state h (skip the op entirely when p == 0)
    if dropout > 0.:
        next_h = mx.sym.Dropout(next_h, p=dropout)
    # zero out the states for padded samples (mask is 0 at padding, 1 otherwise)
    next_c = mx.sym.element_mask(next_c, mask)
    next_h = mx.sym.element_mask(next_h, mask)
    return LSTMState(c=next_c, h=next_h)

Do you think this is enough?
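
For context on what element_mask does here: each sample's state is multiplied by that sample's scalar mask value, so padded samples carry zeroed states (and zeroed gradients) through the step. A minimal NumPy sketch of the intended semantics; the shapes are assumptions based on how mask is sliced per timestep in the unroll code further down:

import numpy as np

# Assumed shapes for one timestep:
#   state: (batch, num_hidden)  -- next_c or next_h
#   mask:  (batch,)             -- 1.0 for a real token, 0.0 for padding
state = np.random.randn(4, 8)
mask = np.array([1., 1., 0., 1.])  # the third sample is padding at this step

# Broadcast the per-sample mask across the hidden dimension; multiplying by 0
# zeroes both the state and the gradient that flows back through it
masked = state * mask[:, None]

assert (masked[2] == 0).all()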

@freddycct (Contributor)

@antinucleon cool, you fixed it fast. I just pulled these changes; I will check them soon and let you know...
Thanks!

@freddycct (Contributor)

@antinucleon I am trying it out today, and possibly tomorrow too. How do I pass in the mask variables? Do I feed them from my DataIter?

@antinucleon (Contributor Author)

Yes

@freddycct (Contributor)

@antinucleon I am not sure how to feed it from the DataIter. Is it through provide_data or provide_label? The errors I am getting are not very informative...

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-41-47e7eca4e178> in <module>()
      3     eval_metric = mx.metric.np(perplexity, use_ignore=True, ignore_label=num_labels),
      4     batch_end_callback = [ mx.callback.Speedometer(batch_size, frequent=10) ],
----> 5     epoch_end_callback = [ mx.callback.do_checkpoint( '%s/%s' % (params_dir, expt_name) ) ]
      6 )

C:\Users\chuaf\AppData\Local\Continuum\Miniconda3\lib\site-packages\mxnet-0.7.0-py3.5.egg\mxnet\model.py in fit(self, X, y, eval_data, eval_metric, epoch_end_callback, batch_end_callback, kvstore, logger, work_load_list, monitor, eval_batch_end_callback)
    744 
    745         arg_names, param_names, aux_names = \
--> 746                 self._init_params(dict(data.provide_data+data.provide_label))
    747 
    748         # setup metric

C:\Users\chuaf\AppData\Local\Continuum\Miniconda3\lib\site-packages\mxnet-0.7.0-py3.5.egg\mxnet\model.py in _init_params(self, input_shapes, overwrite)
    485         """Initialize weight parameters and auxiliary states"""
    486         arg_shapes, _, aux_shapes = self.symbol.infer_shape(**input_shapes)
--> 487         assert(arg_shapes is not None)
    488 
    489         arg_names = self.symbol.list_arguments()

AssertionError: 
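
The assertion itself is uninformative, but it fires when infer_shape cannot resolve every input: model.fit only knows the shapes listed in provide_data and provide_label, so if 'mask' appears in neither, shape inference fails. A hypothetical minimal check; the shapes here are illustrative, not taken from the code above:

import mxnet as mx

data = mx.sym.Variable('data')
mask = mx.sym.Variable('mask')
out = mx.sym.element_mask(data, mask)

# With a shape entry for every free input, inference succeeds. Dropping the
# mask=... entry (which is what happens when 'mask' is absent from
# provide_data) leaves the argument shapes unresolved and trips the
# assertion in model._init_params above.
arg_shapes, out_shapes, aux_shapes = out.infer_shape(data=(32, 256), mask=(32,))
print(arg_shapes)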

@freddycct (Contributor)

This is how I use your lstm in my lstm_unroll:

    data  = mx.sym.Variable('data')
    label = mx.sym.Variable('label')
    mask  = mx.sym.Variable('mask')

    # (batch, time, vec) so axis 1 is the time step

    embed = mx.sym.Embedding(
        data=data, input_dim=num_labels,
        weight=embed_weight, output_dim=num_hidden, name='embed'
    )
    wordvec = mx.sym.SliceChannel(data=embed, num_outputs=enc_len + dec_len, squeeze_axis=1)
    masks   = mx.sym.SliceChannel(data=mask,  num_outputs=enc_len + dec_len, squeeze_axis=1)

    hidden_all = []
    for seqidx in range(enc_len + dec_len):
        hidden = wordvec[seqidx]
        mask_in = masks[seqidx]

        # stack LSTM
        for i in range(num_lstm_layer):
            dp = 0.0 if i == 0 else dropout

            # encoder cells for the first enc_len steps, decoder cells after
            next_state = lstm(
                num_hidden,
                indata     = hidden,
                mask       = mask_in,
                prev_state = last_states[i],
                param      = enc_param_cells[i] if seqidx < enc_len else dec_param_cells[i],
                seqidx     = seqidx,
                layeridx   = i,
                dropout    = dp
            )
            # carry the new state into the next layer and the next timestep
            hidden = next_state.h
            last_states[i] = next_state

        hidden_all.append(hidden)

@antinucleon (Contributor Author)

I am writing a data iter sample now, please wait a moment...

@antinucleon (Contributor Author)

@freddycct

Here is a rough working example that runs without raising an error, but it needs more work to polish and to verify correctness.

https://gist.github.com/antinucleon/c4ff26032f3a97f6aaf89680dfabe291

@freddycct (Contributor)

@antinucleon Thanks, I will take a look tomorrow...

@antinucleon antinucleon merged commit d08d87f into apache:master Aug 15, 2016
@freddycct (Contributor)

Basically, mask is a vector with the same length and shape as data, but with 0 at padded positions and 1 at non-padded inputs, and it is passed through the DataIter using provide_data.

I am currently modifying my DataIter for this, hang on...
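
One way to do that with a stock iterator is sketched below; the array names, shapes, and toy data are illustrative, not taken from the gist linked above:

import numpy as np
import mxnet as mx

batch_size, seq_len, num_samples = 32, 20, 1000

# Toy padded batch: token id 0 is the PAD symbol
lengths = np.random.randint(5, seq_len + 1, size=num_samples)
data = np.zeros((num_samples, seq_len), dtype='float32')
for i, n in enumerate(lengths):
    data[i, :n] = np.random.randint(1, 100, size=n)
mask = (data != 0).astype('float32')  # 1 for real tokens, 0 for padding
label = np.roll(data, -1, axis=1)     # toy next-token labels

# Passing dicts puts both 'data' and 'mask' into provide_data, so model.fit
# can infer the mask's shape and bind it by name to mx.sym.Variable('mask')
# in the unrolled symbol
train_iter = mx.io.NDArrayIter(data={'data': data, 'mask': mask},
                               label={'label': label},
                               batch_size=batch_size)

print(train_iter.provide_data)  # includes ('data', ...) and ('mask', ...)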

@pluskid (Contributor) commented Aug 16, 2016

Great! This is fast!

@antinucleon (Contributor Author)

@freddycct I found the bug in the rough prototype iterator. Please let me know your experiment results.

@freddycct (Contributor) commented Aug 16, 2016

@antinucleon OK, the masking layer works fine for me after calling model.fit. I also noticed that the Embedding vectors representing the PAD symbol are not changing, which means the gradient is not backpropagated to the embedding layer for padded inputs, and that is great.

Next step, I need to test the RNN inference part, so hang on, I am continuing to test... but I hope this gives you some confidence that your code is working.

@freddycct (Contributor)

@antinucleon It works. My sequence-to-sequence model is working, thanks!
