Initial implementation of DropConnect #1332

Open
wants to merge 1 commit into
base: master

Conversation

drnemor commented Mar 29, 2018

I'm interested in reimplementing the paper "Regularizing and Optimizing LSTM Language Models", but I failed to implement DropConnect regularization for LSTMs with the current Python API. So I wrote an extension to VanillaLSTMBuilder that supports a DropConnect regularizer on the recurrent matrix. I used the dropout implementation as inspiration, so I am not fully confident that my implementation is efficient and truly correct, but testing with the lstmlm-auto script indicates that the network is able to learn.
P.S. Sorry for my English.
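
For readers unfamiliar with the technique, here is a minimal sketch of DropConnect on a recurrent matrix. It is not the code from this PR and does not use the DyNet API: it is plain Python on nested lists, with a tanh RNN standing in for the LSTM. The helper names (`drop_connect`, `matvec`, `rnn_step`, `run_sequence`) are made up for illustration. Two things are assumptions here: surviving weights are rescaled by 1 / (1 - p), and the mask is sampled once per sequence and reused at every timestep.

```python
import math
import random


def drop_connect(weights, p, rng):
    """Return a copy of `weights` with each entry zeroed with probability p.

    Surviving entries are divided by (1 - p) so that the expected value of
    each weight is unchanged (inverted scaling; an assumption of this sketch).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [[w / keep if rng.random() < keep else 0.0 for w in row]
            for row in weights]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(x_t, h_tm1, W_ih, W_hh):
    """One step of a simple tanh RNN (a stand-in for the LSTM cell)."""
    a = matvec(W_ih, x_t)
    b = matvec(W_hh, h_tm1)
    return [math.tanh(u + v) for u, v in zip(a, b)]


def run_sequence(xs, W_ih, W_hh, p=0.0, rng=None, train=True):
    """Run the RNN over `xs`, applying DropConnect to W_hh when training."""
    if rng is None:
        rng = random.Random(0)
    # Sample the mask once, before the loop, so the same recurrent weights
    # stay dropped for the whole sequence. At test time W_hh is used as-is.
    W_hh_used = drop_connect(W_hh, p, rng) if train and p > 0.0 else W_hh
    h = [0.0] * len(W_hh)
    for x_t in xs:
        h = rnn_step(x_t, h, W_ih, W_hh_used)
    return h
```

Unlike ordinary dropout, which masks activations, the mask here is applied to the weights themselves, and only to the hidden-to-hidden matrix; the input matrix `W_ih` is left untouched.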

Contributor

neubig commented Apr 20, 2018

Thanks a lot for this! And sorry about the late reply. I haven't merged this yet for two reasons:

  1. I'd like to check it closely.
  2. I'm thinking about how we should handle the "Builders" in DyNet. There are a million variants of LSTMs, and it's not clear to me what the best way to handle them in the current framework is, since it's not realistic to implement all of them within the builders themselves. Ideally, users would have more flexibility to do this themselves. Of course, you can always re-implement LSTMs in Python (it wouldn't be much slower), but it'd be nice to provide more flexible support on the library side.

Author

drnemor commented Jun 18, 2018

@neubig Sorry for the late reply. Maybe builders could be turned into higher-order functions like "fold", with a specific interface through which the user is given all the necessary values? Something like rnn = RnnBuilder(lambda x_t, h_tm1, W_ih, W_hh: tanh(W_ih * x_t + W_hh * h_tm1)). This would let a user implement custom logic for a single timestep while the iteration logic stays hidden.
We could go even further and let the user describe the set of parameters, which would then be passed to the callback as a structure, like rnn = RnnBuilder(params=[('W_hh', (dim1, dim2)), ('W_ih', (dim1, dim2))], step=lambda x, params, state: params.W_ih * x + params.W_hh * state)
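
A rough sketch of how such a fold-style builder might look, in plain Python rather than DyNet. Everything here is hypothetical: the `RnnBuilder` class, its `transduce` method, the `seed` argument, and the uniform initialization are illustration only and are not part of the DyNet API. Parameters are declared as (name, shape) pairs and handed to the step callback as a namespace, so the callback only describes one timestep.

```python
import math
import random
from types import SimpleNamespace


class RnnBuilder:
    """Hypothetical fold-style builder.

    It owns the parameters and the loop over timesteps; the per-timestep
    computation is delegated to the user-supplied `step` callback.
    """

    def __init__(self, params, step, seed=0):
        rng = random.Random(seed)
        # Each declared (name, (rows, cols)) pair becomes a small random
        # matrix, exposed to the callback as an attribute, e.g. params.W_hh.
        self.params = SimpleNamespace(**{
            name: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(rows)]
            for name, (rows, cols) in params
        })
        self.step = step

    def transduce(self, xs, initial_state):
        """Fold `step` over the inputs and return the state after each one."""
        state = initial_state
        outputs = []
        for x_t in xs:
            state = self.step(x_t, self.params, state)
            outputs.append(state)
        return outputs


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def tanh_step(x_t, params, h_tm1):
    """User-defined timestep: h_t = tanh(W_ih x_t + W_hh h_tm1)."""
    a = matvec(params.W_ih, x_t)
    b = matvec(params.W_hh, h_tm1)
    return [math.tanh(u + v) for u, v in zip(a, b)]


rnn = RnnBuilder(params=[("W_ih", (3, 2)), ("W_hh", (3, 3))], step=tanh_step)
outputs = rnn.transduce([[1.0, 0.0], [0.0, 1.0]],
                        initial_state=[0.0, 0.0, 0.0])
```

With an interface of this shape, a regularizer such as DropConnect would presumably live in the user's step function (or in a wrapper that masks `params.W_hh` before the loop) instead of requiring a new builder variant.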
