Pedagogical example of a sequence-to-sequence recurrent neural network with TensorFlow and TFLearn.
(Also see: Pedagogical example of wide and deep learning)
This code provides a complete, pedagogical, working example of a seq2seq RNN, implemented using TFLearn, which transforms input sequences of integers into output sequences of integers.
The desired transformation is defined by a pattern-generating Python function in the file pattern.py. Examples are provided showing simple patterns, including sorting and reversing the input sequence. The lengths of the input and output sequences can be set, and other patterns can be added.
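Conceptually, a pattern is just a function from an input sequence to the desired output sequence. Here is a hypothetical sketch of the "reversed" and "sorted" transformations (the actual function names and signatures live in pattern.py):

    def reversed_pattern(seq):
        """Output sequence is the input sequence reversed (hypothetical sketch)."""
        return list(reversed(seq))

    def sorted_pattern(seq):
        """Output sequence is the input sequence sorted (hypothetical sketch)."""
        return sorted(seq)

    assert reversed_pattern([0, 1, 2, 3]) == [3, 2, 1, 0]
    assert sorted_pattern([3, 1, 2, 0]) == [0, 1, 2, 3]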
TensorFlow's seq2seq.py library is used for the RNN. "embedding_rnn" is used by default, but "embedding_attention" also works well (often better). See the seq2seq tutorial provided by TensorFlow for an overview. Overall, the structure is like this (image from https://github.com/farizrahman4u/seq2seq):
The basic idea is that the integer sequence values are converted, using an embedding map, to vectors in a high-dimensional space. RNN cells then take vectors in this space as input. After the input sequence ends, the first output cell is fed a special "GO" symbol, which triggers the start of the output sequence. When making predictions, each output cell takes as input the output of the previous cell. When training, each decoder input is instead the previous expected symbol (embedded in the output's high-dimensional embedding space).
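In TensorFlow 0.x terms, the core graph construction amounts to a call to tf.nn.seq2seq.embedding_rnn_seq2seq (or embedding_attention_seq2seq). A minimal sketch, with sequence lengths, symbol counts, and sizes chosen for illustration rather than copied from tflearn_seq2seq.py:

    import tensorflow as tf

    in_len, out_len = 10, 10    # input/output sequence lengths (the defaults above)
    num_symbols = 12            # 10 digits plus GO and PAD symbols (illustrative)
    cell = tf.nn.rnn_cell.BasicLSTMCell(32)

    # One int32 placeholder of shape [batch_size] per timestep.
    encoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(in_len)]
    decoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(out_len)]

    # feed_previous=False gives training behavior (decoder is fed the expected
    # previous symbol); feed_previous=True gives prediction behavior (decoder
    # is fed its own previous output).
    outputs, state = tf.nn.seq2seq.embedding_rnn_seq2seq(
        encoder_inputs, decoder_inputs, cell,
        num_encoder_symbols=num_symbols,
        num_decoder_symbols=num_symbols,
        embedding_size=20,
        feed_previous=False)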
usage: tflearn_seq2seq.py [-h] [-v] [-m MODEL] [-r LEARNING_RATE] [-e EPOCHS]
                          [-i INPUT_WEIGHTS] [-o OUTPUT_WEIGHTS]
                          [-p PATTERN_NAME] [-n NAME] [--in-len IN_LEN]
                          [--out-len OUT_LEN] [--from-file FROM_FILE]
                          [--iter-num ITER_NUM] [--data-dir DATA_DIR]
                          [-L NUM_LAYERS] [--cell-size CELL_SIZE]
                          [--cell-type CELL_TYPE]
                          [--embedding-size EMBEDDING_SIZE]
                          [--tensorboard-verbose TENSORBOARD_VERBOSE]
                          cmd [cmd_input [cmd_input ...]]
Commands:
  train   - give size of training set to use, as argument
  predict - give input sequence as argument (or specify inputs via --from-file <filename>)
positional arguments:
  cmd                   command
  cmd_input             input to command

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increase output verbosity (add more -v to increase verbosity)
  -m MODEL, --model MODEL
                        seq2seq model name: either embedding_rnn (default) or embedding_attention
  -r LEARNING_RATE, --learning-rate LEARNING_RATE
                        learning rate (default 0.0001)
  -e EPOCHS, --epochs EPOCHS
                        number of training epochs
  -i INPUT_WEIGHTS, --input-weights INPUT_WEIGHTS
                        tflearn file with network weights to load
  -o OUTPUT_WEIGHTS, --output-weights OUTPUT_WEIGHTS
                        new tflearn file where network weights are to be saved
  -p PATTERN_NAME, --pattern-name PATTERN_NAME
                        name of pattern to use for sequence
  -n NAME, --name NAME  name of model, used when generating default weights filenames
  --in-len IN_LEN       input sequence length (default 10)
  --out-len OUT_LEN     output sequence length (default 10)
  --from-file FROM_FILE
                        name of file to take input data sequences from (json format)
  --iter-num ITER_NUM   training iteration number; specify instead of input- or output-weights to use generated filenames
  --data-dir DATA_DIR   directory to use for storing checkpoints (also used when generating default weights filenames)
  -L NUM_LAYERS, --num-layers NUM_LAYERS
                        number of RNN layers to use in the model (default 1)
  --cell-size CELL_SIZE
                        size of RNN cell to use (default 32)
  --cell-type CELL_TYPE
                        type of RNN cell to use (default BasicLSTMCell)
  --embedding-size EMBEDDING_SIZE
                        size of embedding to use (default 20)
  --tensorboard-verbose TENSORBOARD_VERBOSE
                        tensorboard verbosity level (default 0)
This command trains an embedding_attention seq2seq RNN on 100,000 input sequences, using the "reversed" pattern (for which the output sequence is the reverse of the input sequence), for 10 epochs:
python tflearn_seq2seq.py -v -v -o weights.tfl -p reversed -m embedding_attention -e 10 train 100000
Note that 10% of the training dataset is set aside for validation. The output should be something like this:
[TFLearnSeq2Seq] Training on 100000 point dataset (pattern 'reversed'), with 10 epochs
model parameters: {
"cell_size": 32,
"cell_type": "BasicLSTMCell",
"embedding_size": 20,
"num_layers": 1,
"tensorboard_verbose": 0,
"learning_rate": 0.0001
}
---------------------------------
Run id: TFLearnSeq2Seq
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 90000
Validation samples: 10000
--
Training Step: 5000 | total loss: 0.01601
| Adam | epoch: 007 | loss: 0.01601 - acc: 0.9997 | val_loss: 0.01598 - val_acc: 0.9998 -- iter: 09216/90000
Done!
Saved weights.tfl
In this example, the final weights are saved to the file weights.tfl (and weights.tfl.meta).
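These two files are what TFLearn's model save and load methods produce. A minimal sketch of the save/load round trip, using a stand-in network rather than the actual seq2seq graph built by tflearn_seq2seq.py:

    import tflearn

    # Stand-in network; the real model wraps the seq2seq graph instead.
    net = tflearn.input_data(shape=[None, 10])
    net = tflearn.fully_connected(net, 10)
    net = tflearn.regression(net)

    model = tflearn.DNN(net)
    model.save("weights.tfl")   # writes weights.tfl and weights.tfl.meta
    model.load("weights.tfl")   # restores the weights, e.g. for prediction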
This command generates the sequence predicted by the RNN for the given input 0 1 2 3 4 5 6 7 8 9, using weights from the file weights.tfl:
python tflearn_seq2seq.py -i weights.tfl -v -p reversed -m embedding_attention predict 0 1 2 3 4 5 6 7 8 9
The output should be something like this:
==> For input [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], prediction=[8 8 5 3 6 5 4 2 3 1] (expected=[9 8 7 6 5 4 3 2 1 0])
Note how the prediction ([8 8 5 3 6 5 4 2 3 1]) is close to, but not exactly, what was ideally expected ([9 8 7 6 5 4 3 2 1 0]).
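One quick way to quantify "close" is per-symbol accuracy over the sequence (a check done by hand here, not something the repo prints):

    import numpy as np

    pred     = np.array([8, 8, 5, 3, 6, 5, 4, 2, 3, 1])
    expected = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
    print(np.mean(pred == expected))   # 0.2: only 2 of 10 positions match exactly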
Here's another example, using a set of pre-trained weights for the "sorted" sequence pattern:
python tflearn_seq2seq.py -i TRAINED_WEIGHTS/ts2s__attention__sorted_1.tfl \
-p sorted --in-len=20 --out-len=20 -m embedding_attention \
predict 9 8 7 6 5 4 3 2 1 2 5 1 8 7 7 3 9 1 4 6
The output gives:
[TFLearnSeq2Seq] model weights loaded from TRAINED_WEIGHTS/ts2s__attention__sorted_1.tfl
==> For input [9, 8, 7, 6, 5, 4, 3, 2, 1, 2, 5, 1, 8, 7, 7, 3, 9, 1, 4, 6], prediction=[1 1 1 2 2 3 3 4 4 5 5 6 6 7 7 7 7 8 8 8] (expected=[1 1 1 2 2 3 3 4 4 5 5 6 6 7 7 7 8 8 9 9])
which, again, is close (but not exact).
Better results could probably be obtained with a more complex model, e.g. larger LSTM cells, more layers, or more training.
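All of those knobs are exposed by the flags documented above; an untested configuration along those lines:

    python tflearn_seq2seq.py -v -o weights_big.tfl -p sorted --in-len=20 --out-len=20 \
        -m embedding_attention --cell-size 64 -L 2 -e 20 train 200000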
Unit tests are provided, implemented using pytest. Run these using:
py.test tflearn_seq2seq.py
- Requires TF 0.10 or later (does not work with TF 0.9)
- Requires TFLearn installed from GitHub (the PyPI repository has version 0.2.1, which does not work with TF 0.10); see the install command below
- Run "pip install h5py" to avoid the runtime warning "hdf5 not supported (please install/reinstall h5py)"
This example does not demonstrate machine translation per se, but the basic structure could be extended to do so. No buckets or other such sophistications are used; this makes it easier to understand the basics.