which attention architecture is used in NER? #6

Open
omerarshad opened this issue Oct 2, 2018 · 7 comments

@omerarshad

I want to understand how you used attention in the NER task. Is there any paper or article that explains this? Thanks.

@qq547276542

According to the README, the attention mechanism is not well suited to the NER task:
The variant modules include Stack Bidirectional RNN (multi-layer), Multi-RNN Cells (multi-layer), Luong/Bahdanau Attention Mechanism, Self-Attention Mechanism, Residual Connection, Layer Normalization, and so on. However, these modifications did not improve performance significantly (no F1 improvement of >= 1.5, and sometimes even worse than the baseline model). It is easy to apply these variants and train by modifying the config settings in train_conll_ner_blstm_cnn_crf.py.
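For anyone wondering what those variant modules look like in practice, here is a minimal sketch of a self-attention block with a residual connection and layer normalization applied on top of BiLSTM outputs. This is written in PyTorch purely for illustration; it is not this repository's code, and the class and parameter names (SelfAttentionBlock, hidden_dim, etc.) are hypothetical.

```python
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionBlock(nn.Module):
    """Scaled dot-product self-attention + residual connection + layer norm.

    Hypothetical illustration of the 'variant modules' named in the README;
    not the repository's actual implementation.
    """
    def __init__(self, hidden_dim):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, x, pad_mask=None):
        # x: (batch, seq_len, hidden_dim) -- e.g. the BiLSTM output states
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)  # (batch, seq, seq)
        if pad_mask is not None:
            # pad_mask: (batch, seq_len), True at padding positions
            scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))
        attn = F.softmax(scores, dim=-1)
        context = attn @ v
        # residual connection followed by layer normalization
        return self.norm(x + context)
```

The output has the same shape as the BiLSTM states, so a block like this can be dropped between the encoder and the CRF layer without changing anything else in the model.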

@qq547276542

In my own experiments, the attention mechanism really did not work, but Layer Normalization did improve the robustness of the model.

@omerarshad
Author

Well, in my experiments attention-only models achieve results comparable to the LSTM, and even beat the LSTM with much less training time.

@qq547276542

Well, in my experiments attention-only models achieve results comparable to the LSTM, and even beat the LSTM with much less training time.

Maybe there are some problems with my experiment. I have tried BLSTM+SelfAttention+CRF, and the effect is not as good as BLSTM+CRF. So the structure of your model is SelfAttention+CRF, with no LSTM? I want to give it a try.

@omerarshad
Author

Yes, the structure of my model is attention+CRF only.
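The code for this model is not posted in the thread, but an attention+CRF tagger with no recurrent layers would look roughly like the sketch below. This is a guess at the general shape, not omerarshad's actual model: it is written in PyTorch with the third-party pytorch-crf package for the CRF layer, and the embedding layer, head count, and dimensions are all assumptions.

```python
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class AttentionCRFTagger(nn.Module):
    """Hypothetical attention-only tagger: embedding -> self-attention -> CRF.

    No recurrent layers at all, which is why training can be much faster
    than a BiLSTM-based tagger.
    """
    def __init__(self, vocab_size, num_tags, dim=128, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, num_tags)        # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, tokens, mask):
        x = self.embed(tokens)                       # (batch, seq_len, dim)
        # key_padding_mask expects True at padding positions
        x, _ = self.attn(x, x, x, key_padding_mask=~mask)
        return self.proj(x)                          # (batch, seq_len, num_tags)

    def loss(self, tokens, tags, mask):
        # the CRF forward pass returns the log-likelihood; negate it for a loss
        return -self.crf(self._emissions(tokens, mask), tags, mask=mask)

    def predict(self, tokens, mask):
        # Viterbi decoding; returns one list of tag indices per sentence
        return self.crf.decode(self._emissions(tokens, mask), mask=mask)
```

Note that this sketch omits positional encodings; without them, pure self-attention has no notion of word order, which a real attention-only NER model would normally need.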

@qq547276542

Yes, the structure of my model is attention+CRF only.

Do you have the relevant code? Can I refer to it?

@VioletJKI

I do not use CRF, and I get the best result.

@MingLunHan Excuse me, what is your model's architecture? BiLSTM + attention only?
