Word Language Model
-------------------

This script can be used to train language models with the given specification.

Merity, S., et al. "`Regularizing and optimizing LSTM language models <https://openreview.net/pdf?id=SyyGPP0TZ>`_". ICLR 2018
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All the language models are trained with :download:`this example script <language_model/word_language_model.py>`.
The key hyperparameters used to reproduce the results for the pre-trained models are listed in the following table.

.. editing URL for the following table: https://bit.ly/2HnC2cn

The dataset used for training the models is wikitext-2.
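
The ``Val PPL`` and ``Test PPL`` rows in the table below report word-level perplexity on the validation and test splits, i.e. the exponential of the average per-word cross-entropy loss. A minimal sketch of the conversion (plain Python, independent of the training script):

.. code-block:: python

    import math

    def perplexity(total_nll, num_words):
        """Word-level perplexity from a summed cross-entropy loss in nats."""
        return math.exp(total_nll / num_words)

    # An average loss of ~4.245 nats/word corresponds to PPL ~ 69.8,
    # roughly the test perplexity of awd_lstm_lm_1150_wikitext-2.
    print(perplexity(4245.0, 1000))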

+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Model       | awd_lstm_lm_1150_wikitext-2 | awd_lstm_lm_600_wikitext-2 | standard_lstm_lm_1500_wikitext-2 | standard_lstm_lm_650_wikitext-2 | standard_lstm_lm_200_wikitext-2 |
+=============+=============================+============================+==================================+=================================+=================================+
| Mode        | LSTM                        | LSTM                       | LSTM                             | LSTM                            | LSTM                            |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Num_layers  | 3                           | 3                          | 2                                | 2                               | 2                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Embed size  | 400                         | 200                        | 1500                             | 650                             | 200                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Hidden size | 1150                        | 600                        | 1500                             | 650                             | 200                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout     | 0.4                         | 0.2                        | 0.65                             | 0.5                             | 0.2                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout_h   | 0.2                         | 0.1                        | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout_i   | 0.65                        | 0.3                        | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Dropout_e   | 0.1                         | 0.05                       | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Weight_drop | 0.5                         | 0.2                        | 0                                | 0                               | 0                               |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Tied        | True                        | True                       | True                             | True                            | True                            |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Val PPL     | 73.32                       | 84.61                      | 98.29                            | 98.96                           | 108.25                          |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Test PPL    | 69.74                       | 80.96                      | 92.83                            | 93.90                           | 102.26                          |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
| Command     | [1]                         | [2]                        | [3]                              | [4]                             | [5]                             |
+-------------+-----------------------------+----------------------------+----------------------------------+---------------------------------+---------------------------------+
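
The four dropout rows follow the regularization scheme of Merity et al.: ``dropout`` is applied to LSTM outputs, ``dropout_h`` to the hidden states between LSTM layers, ``dropout_i`` to the embedding output, ``dropout_e`` to entire rows of the embedding matrix, and ``weight_drop`` applies DropConnect to the hidden-to-hidden weight matrices. A minimal sketch of the weight-drop idea (illustrative only, not the gluon-nlp implementation):

.. code-block:: python

    import numpy as np

    def weight_drop(weight, rate, rng=np.random):
        """Drop entries of a recurrent weight matrix (DropConnect).

        Unlike ordinary dropout, the mask is applied to the weights
        themselves rather than to activations, so the same dropped
        connections are reused across all time steps of a sequence.
        """
        mask = (rng.uniform(size=weight.shape) >= rate).astype(weight.dtype)
        return weight * mask / (1.0 - rate)  # inverted-dropout rescaling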

[1] awd_lstm_lm_1150_wikitext-2 (Val PPL 73.32 Test PPL 69.74)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --tied --save awd_lstm_lm_1150_wikitext-2

[2] awd_lstm_lm_600_wikitext-2 (Val PPL 84.61 Test PPL 80.96)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 200 --nhid 600 --dropout 0.2 --dropout_h 0.1 --dropout_i 0.3 --dropout_e 0.05 --weight_drop 0.2 --tied --save awd_lstm_lm_600_wikitext-2

[3] standard_lstm_lm_1500_wikitext-2 (Val PPL 98.29 Test PPL 92.83)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 1500 --nhid 1500 --nlayers 2 --lr 20 --epochs 750 --batch_size 20 --bptt 35 --dropout 0.65 --dropout_h 0 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --wd 0 --alpha 0 --beta 0 --save standard_lstm_lm_1500_wikitext-2

[4] standard_lstm_lm_650_wikitext-2 (Val PPL 98.96 Test PPL 93.90)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 650 --nhid 650 --nlayers 2 --lr 20 --epochs 750 --batch_size 20 --bptt 35 --dropout 0.5 --dropout_h 0 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --wd 0 --alpha 0 --beta 0 --save standard_lstm_lm_650_wikitext-2

[5] standard_lstm_lm_200_wikitext-2 (Val PPL 108.25 Test PPL 102.26)

.. code-block:: bash

    $ python word_language_model.py --gpus 0 --emsize 200 --nhid 200 --nlayers 2 --lr 20 --epochs 750 --batch_size 20 --bptt 35 --dropout 0.2 --dropout_h 0 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --wd 0 --alpha 0 --beta 0 --save standard_lstm_lm_200_wikitext-2
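
The resulting pre-trained weights are published in the gluon-nlp model zoo under the model names used above. A minimal sketch of loading one of them, assuming the names are registered with ``gluonnlp.model.get_model`` (check the model zoo API of your gluon-nlp version):

.. code-block:: python

    import gluonnlp as nlp

    # Download the AWD-LSTM weights trained on wikitext-2, together with
    # the vocabulary they were trained against.
    model, vocab = nlp.model.get_model('awd_lstm_lm_1150',
                                       dataset_name='wikitext-2',
                                       pretrained=True)
    print(model)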