* Fix speech demo.
* Use a random seed for cuDNN dropout. Previously, the fixed seed generated the same dropout mask for every iteration in imperative mode.
* The PTB LM example now reaches far better PPL, via 1) forget_bias=0, 2) a clipping range, 3) lr annealing, and 4) initialization.
* (1) Remove the mean from the loss function (good for multi-GPU). (2) Make clip and lr sample based. (3) Change hyperparameters; we now get slightly better results than PyTorch.
* Remove the lstmbias init in model.py since it has already been set to 0.
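
The "sample based" clip/lr change, together with removing the loss mean, amounts to summing the per-token loss and scaling both the clipping threshold and the optimizer step by the number of samples. Below is a minimal sketch of that idea using the Gluon API; it is not the code in this commit, and `train_step`, the model call signature, and the argument names are illustrative.

```python
from mxnet import autograd, gluon

# Per-token cross-entropy that we SUM instead of averaging, so gradient
# magnitudes scale with the number of samples; a single clipping threshold
# and learning rate then behave consistently across batch sizes and GPUs.
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

def train_step(model, trainer, data, target, clip, bptt, batch_size, ctx):
    """One illustrative update with sample-based clipping and lr scaling."""
    with autograd.record():
        output = model(data)                      # hypothetical model signature
        loss = loss_fn(output, target).sum()      # sum, not mean
    loss.backward()
    grads = [p.grad(ctx) for p in model.collect_params().values()]
    # Scale the clipping threshold by the number of samples in the batch.
    gluon.utils.clip_global_norm(grads, clip * bptt * batch_size)
    trainer.step(batch_size * bptt)               # rescale gradients per sample
    return loss.asscalar() / (batch_size * bptt)  # report mean loss per token
```

With a summed loss, the lr annealing mentioned above can be as simple as `trainer.set_learning_rate(trainer.learning_rate / 4)` whenever validation perplexity stops improving (the divisor 4 is a common choice, not taken from this commit).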

Showing 5 changed files with 84 additions and 26 deletions.
# Word-level language modeling RNN

This example trains a multi-layer RNN (Elman, GRU, or LSTM) on the Penn Treebank (PTB) language modeling benchmark.

The model obtains a state-of-the-art result on PTB using an LSTM, reaching a test perplexity of ~72.

The following techniques have been adopted to reach SOTA results:
- [LSTM for LM](https://arxiv.org/pdf/1409.2329.pdf)
- [Weight tying](https://arxiv.org/abs/1608.05859) between the word vectors and the softmax output embeddings (see the sketch below)
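
Weight tying (second bullet above) makes the softmax output layer reuse the embedding's weight matrix, so the model learns a single set of word representations. A minimal sketch using MXNet's Gluon API (1.x-style parameter sharing), with illustrative sizes and not necessarily how this example's model.py wires it up:

```python
from mxnet.gluon import nn

vocab_size, emsize = 10000, 650   # illustrative sizes; tying requires nhid == emsize

# The embedding owns a (vocab_size, emsize) weight matrix ...
encoder = nn.Embedding(vocab_size, emsize)
# ... and the output Dense layer is built on those same parameters, so the
# softmax projection and the input embedding share one weight matrix.
decoder = nn.Dense(vocab_size, in_units=emsize, params=encoder.params)
```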

## Data

The PTB data is the processed version from [(Mikolov et al, 2010)](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf):

```bash
python data.py
```

## Usage

Example runs and their results:

```
python train.py --cuda --tied --nhid 650 --emsize 650 --dropout 0.5    # Test ppl of 75.3
python train.py --cuda --tied --nhid 1500 --emsize 1500 --dropout 0.65 # Test ppl of 72.0
```

<br>

`python train.py --help` gives the following arguments:
```
Optional arguments:
  -h, --help         show this help message and exit
  --data DATA        location of the data corpus
  --model MODEL      type of recurrent net (rnn_tanh, rnn_relu, lstm, gru)
  --emsize EMSIZE    size of word embeddings
  --nhid NHID        number of hidden units per layer
  --nlayers NLAYERS  number of layers
  --lr LR            initial learning rate
  --clip CLIP        gradient clipping
  --epochs EPOCHS    upper epoch limit
  --batch_size N     batch size
  --bptt BPTT        sequence length
  --dropout DROPOUT  dropout applied to layers (0 = no dropout)
  --tied             tie the word embedding and softmax weights
  --cuda             whether to use the GPU
  --log-interval N   report interval
  --save SAVE        path to save the final model
```