Word Language Model
-------------------

This script can be used to train language models with the given specification.

Merity, S., et al. "`Regularizing and optimizing LSTM language models <https://openreview.net/pdf?id=SyyGPP0TZ>`_". ICLR 2018
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All the language models are trained with :download:`this example script <language_model/word_language_model.py>`.
The key hyperparameters used to reproduce the results for the pre-trained models are listed in the following table.

.. editing URL for the following table: https://bit.ly/2HnC2cn

The dataset used for training the models is wikitext-2.
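
The ``Val PPL`` and ``Test PPL`` rows in the table below report word-level perplexity on the validation and test splits, i.e. the exponential of the average per-word cross-entropy loss. A minimal sketch of the conversion (plain Python, independent of the training script):

.. code-block:: python

    import math

    def perplexity(total_nll, num_words):
        """Word-level perplexity from a summed cross-entropy loss in nats."""
        return math.exp(total_nll / num_words)

    # An average loss of ~4.245 nats/word corresponds to PPL ~ 69.8,
    # roughly the test perplexity of awd_lstm_lm_1150_wikitext-2.
    print(perplexity(4245.0, 1000))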

+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Model       | awd_lstm_lm_1150_wikitext-2 | awd_lstm_lm_600_wikitext-2 | standard_lstm_lm_1500_wikitext-2 | standard_lstm_lm_650_wikitext-2 | standard_lstm_lm_200_wikitext-2 |
+=============+=============================+============================+==================================+=================================+=================================+
| Mode        | LSTM                        | LSTM                       | LSTM                             | LSTM                            | LSTM                            |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Num_layers  | 3                           | 3                          | 2                                | 2                               | 2                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Embed size  | 400                         | 200                        | 1500                             | 650                             | 200                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Hidden size | 1150                        | 600                        | 1500                             | 650                             | 200                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout     | 0.4                         | 0.2                        | 0.65                             | 0.5                             | 0.2                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout_h   | 0.2                         | 0.1                        | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout_i   | 0.65                        | 0.3                        | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout_e   | 0.1                         | 0.05                       | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Weight_drop | 0.5                         | 0.2                        | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Tied        | True                        | True                       | True                             | True                            | True                            |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Val PPL     | 73.32                       | 84.61                      | 98.29                            | 98.96                           | 108.25                          |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Test PPL    | 69.74                       | 80.96                      | 92.83                            | 93.90                           | 102.26                          |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Command     | [1]                         | [2]                        | [3]                              | [4]                             | [5]                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
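
The four dropout rows follow the regularization scheme of Merity et al.: ``dropout`` is applied to LSTM outputs, ``dropout_h`` to the hidden states between LSTM layers, ``dropout_i`` to the embedding output, ``dropout_e`` to entire rows of the embedding matrix, and ``weight_drop`` applies DropConnect to the hidden-to-hidden weight matrices. A minimal sketch of the weight-drop idea (illustrative only, not the gluon-nlp implementation):

.. code-block:: python

    import numpy as np

    def weight_drop(weight, rate, rng=np.random):
        """Drop entries of a recurrent weight matrix (DropConnect).

        Unlike ordinary dropout, the mask is applied to the weights
        themselves rather than to activations, so the same dropped
        connections are reused across all time steps of a sequence.
        """
        mask = (rng.uniform(size=weight.shape) >= rate).astype(weight.dtype)
        return weight * mask / (1.0 - rate)  # inverted-dropout rescaling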

[1] awd_lstm_lm_1150_wikitext-2 (Val PPL 73.32 Test PPL 69.74)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --tied --save awd_lstm_lm_1150_wikitext-2

[2] awd_lstm_lm_600_wikitext-2 (Val PPL 84.61 Test PPL 80.96)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 200 --nhid 600 --dropout 0.2 --dropout_h 0.1 --dropout_i 0.3 --dropout_e 0.05 --weight_drop 0.2 --tied --save awd_lstm_lm_600_wikitext-2

[3] standard_lstm_lm_1500_wikitext-2 (Val PPL 98.29 Test PPL 92.83)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 1500 --nhid 1500 --nlayers 2 --lr 20 --epochs 750 --batch_size 20 --bptt 35 --dropout 0.65 --dropout_h 0 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --wd 0 --alpha 0 --beta 0 --save standard_lstm_lm_1500_wikitext-2

[4] standard_lstm_lm_650_wikitext-2 (Val PPL 98.96 Test PPL 93.90)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 650 --nhid 650 --nlayers 2 --lr 20 --epochs 750 --batch_size 20 --bptt 35 --dropout 0.5 --dropout_h 0 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --wd 0 --alpha 0 --beta 0 --save standard_lstm_lm_650_wikitext-2

[5] standard_lstm_lm_200_wikitext-2 (Val PPL 108.25 Test PPL 102.26)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 200 --nhid 200 --nlayers 2 --lr 20 --epochs 750 --batch_size 20 --bptt 35 --dropout 0.2 --dropout_h 0 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --wd 0 --alpha 0 --beta 0 --save standard_lstm_lm_200_wikitext-2
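
The resulting pre-trained weights are published in the gluon-nlp model zoo under the model names used above. A minimal sketch of loading one of them, assuming the names are registered with ``gluonnlp.model.get_model`` (check the model zoo API of your gluon-nlp version):

.. code-block:: python

    import gluonnlp as nlp

    # Download the AWD-LSTM weights trained on wikitext-2, together with
    # the vocabulary they were trained against.
    model, vocab = nlp.model.get_model('awd_lstm_lm_1150',
                                       dataset_name='wikitext-2',
                                       pretrained=True)
    print(model)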