This repository aims to give beginners a basic understanding of time-series sequence-to-sequence (Seq2Seq) models. It implements the following Seq2Seq models:
- LSTM encoder-decoder
- LSTM encoder-decoder with attention by Bahdanau et al. (2014). See `architecture.rnn`, `architecture.attention` and `architecture.seq2seq.AttentionLSTMSeq2Seq`.
- Vanilla Transformer by Vaswani et al. (2017). See `architecture.Transformer` and `architecture.seq2seq.TransformerSeq2Seq`.
I only implemented these three types of Seq2Seq model. You may combine 1D CNN, LSTM and Transformer layers for a further-customized version (a minimal sketch of such a combination follows this paragraph). See `Tutorial.ipynb` for testing.
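For example, a hybrid model could run the input sequence through a 1D convolution before an LSTM encoder. The sketch below is illustrative only and does not use the repository's `architecture.cnn` or `architecture.rnn` modules; the `CNNLSTMEncoder` name and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMEncoder(nn.Module):
    """Hypothetical hybrid encoder: 1D CNN feature extractor followed by an LSTM."""

    def __init__(self, input_size: int, hidden_size: int, kernel_size: int = 3):
        super().__init__()
        # Conv1d expects (batch, channels, length), so the forward pass transposes around it.
        self.conv = nn.Conv1d(input_size, hidden_size, kernel_size, padding=kernel_size // 2)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor):
        # x: (batch, length, input_size)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, length, hidden_size)
        return self.lstm(h)  # outputs, (h_n, c_n)

encoder = CNNLSTMEncoder(input_size=27, hidden_size=128)
outputs, (h_n, c_n) = encoder(torch.randn(32, 40, 27))
print(outputs.shape)  # torch.Size([32, 40, 128])
```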
The models share the following arguments:

- `hidden_size`: hidden state size of the LSTM encoder. Equivalent to `d_model` in `TransformerSeq2Seq`.
- `num_layers`: number of LSTM and Transformer encoder/decoder layers.
- `bidirectional`: whether to use a bidirectional LSTM encoder.
- `dropout`: dropout rate. Applies to:
  - the residual drop path in the 1D CNN in `architecture.cnn`
  - hidden state dropout in the LSTM encoder/decoder at every time step. Unlike `torch.nn.LSTM`, dropout is applied from the first LSTM layer (see the sketch after this list)
  - the same dropout as in the vanilla Transformer from Vaswani et al.
- `layernorm`: layer normalization in the LSTM encoder and decoder.
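To illustrate what per-time-step hidden state dropout looks like (as opposed to `torch.nn.LSTM`, which only applies dropout between stacked layers), here is a minimal sketch built on `nn.LSTMCell`. It is not the repository's implementation in `architecture.rnn`; the class name and structure are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutLSTMLayer(nn.Module):
    """Illustrative single LSTM layer with hidden state dropout at every time step."""

    def __init__(self, input_size: int, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.dropout = dropout

    def forward(self, x: torch.Tensor):
        # x: (batch, length, input_size)
        batch_size, length, _ = x.size()
        h = x.new_zeros(batch_size, self.cell.hidden_size)
        c = x.new_zeros(batch_size, self.cell.hidden_size)
        outputs = []
        for t in range(length):
            h, c = self.cell(x[:, t], (h, c))
            # Dropout on the hidden state at every step, even in the first layer.
            h = F.dropout(h, p=self.dropout, training=self.training)
            outputs.append(h)
        return torch.stack(outputs, dim=1)  # (batch, length, hidden_size)

layer = DropoutLSTMLayer(input_size=27, hidden_size=128, dropout=0.1)
print(layer(torch.randn(32, 40, 27)).shape)  # torch.Size([32, 40, 128])
```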
All Seq2Seq models inherit the `architecture.seq2seq.Seq2Seq` class:
```python
import random
from typing import Optional

import torch


class Seq2Seq(Skeleton):
    def __init__(self):
        super().__init__()
        self.initialize_skeleton(locals())

    def forward_auto(self, x: torch.Tensor, trg_len: int):
        # implements autoregressive decoding
        raise NotImplementedError

    def forward_labeled(self, x: torch.Tensor, y: torch.Tensor):
        # implements teacher-forced decoding
        raise NotImplementedError

    def forward(
        self,
        x: torch.Tensor,
        trg_len: int,
        y: Optional[torch.Tensor] = None,
        teacher_forcing: float = -1,
    ):
        batch_size, length_x, input_size = x.size()
        p = random.uniform(0, 1)
        if p > teacher_forcing and y is not None:
            return self.forward_labeled(x, y)
        else:
            return self.forward_auto(x, trg_len)
```
- `forward_auto` implements autoregressive decoding: the decoder uses the model's output from the previous time step as the input at the current time step. Generally used at inference time, when label data is not available.
- `forward_labeled` implements teacher-forced decoding: the decoder uses the label from the previous time step as the input at the current time step. Generally used during training, when label data is available.
- `forward` samples `p` uniformly from (0, 1) and runs `forward_labeled` when `y` is given and `p > teacher_forcing`; otherwise it runs `forward_auto`.
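As a quick illustration of this dispatch, the sketch below calls the model directly, which assumes the `Skeleton` base class behaves like a standard callable `nn.Module`. The constructor arguments mirror the example further down; the `teacher_forcing=0.5` value is arbitrary.

```python
import torch
from architecture.seq2seq import LSTMSeq2Seq

model = LSTMSeq2Seq(
    input_size=27, output_size=6, hidden_size=128,
    num_layers=3, bidirectional=True, dropout=0.1,
)
x = torch.randn(32, 40, 27)   # (batch, length_x, input_size)
y = torch.randn(32, 60, 6)    # (batch, length_y, output_size)

# With labels: forward_labeled is chosen whenever the sampled p exceeds teacher_forcing.
out_train = model(x, trg_len=60, y=y, teacher_forcing=0.5)

# Without labels: forward_auto is always used, decoding trg_len steps autoregressively.
out_infer = model(x, trg_len=60)

print(out_train.shape, out_infer.shape)  # torch.Size([32, 60, 6]) torch.Size([32, 60, 6])
```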
```python
import torch
import torch.nn as nn
from architecture.seq2seq import LSTMSeq2Seq, AttentionLSTMSeq2Seq, TransformerSeq2Seq

batch_size = 32
length_x = 40      # input sequence length
length_y = 60      # output label sequence length
input_size = 27    # input feature size
output_size = 6    # output feature size
hidden_size = 128  # hidden state size of LSTM encoder; equivalent to d_model in TransformerSeq2Seq
dropout = 0.1      # dropout rate
num_layers = 3     # number of LSTM and Transformer encoder/decoder layers
bidirectional = True  # whether to use a bidirectional LSTM encoder
layernorm = True      # layer normalization in LSTM encoder and decoder

x = torch.randn(batch_size, length_x, input_size)
y = torch.randn(batch_size, length_y, output_size)

model = LSTMSeq2Seq(
    input_size=input_size,
    output_size=output_size,
    hidden_size=hidden_size,
    num_layers=num_layers,
    bidirectional=bidirectional,
    dropout=dropout,
)
out_1 = model.forward_auto(x, 100)
out_2 = model.forward_labeled(x, y)
print(out_1.shape, out_2.shape)
```
```
torch.Size([32, 100, 6]) torch.Size([32, 60, 6])
```
```python
model = AttentionLSTMSeq2Seq(
    input_size=input_size,
    output_size=output_size,
    hidden_size=hidden_size,
    num_layers=num_layers,
    bidirectional=bidirectional,
    dropout=dropout,
)
out_1 = model.forward_auto(x, 100)
out_2 = model.forward_labeled(x, y)
print(out_1.shape, out_2.shape)
```
```
torch.Size([32, 100, 6]) torch.Size([32, 60, 6])
```
```python
model = TransformerSeq2Seq(
    input_size=input_size,
    output_size=output_size,
    num_layers=num_layers,
    d_model=hidden_size,
    n_heads=4,
    dropout=dropout,
    d_ff=hidden_size * 4,
)
out_1 = model.forward_auto(x, 100)
out_2 = model.forward_labeled(x, y)
print(out_1.shape, out_2.shape)
```
```
torch.Size([32, 100, 6]) torch.Size([32, 60, 6])
```
- Parameters can be counted with `model.count_params()`.
- Model properties are accessed through the `model.model_info` attribute.
- Another model instance can be created with `ModelClass(**model.model_init_args)`.

These features are provided by the `architectures.skeleton.Skeleton` class.
```python
model.count_params()
model_info = model.model_info
model_init_args = model.model_init_args
print(model_info)
another_model_instance = TransformerSeq2Seq(**model_init_args)
```
```
Number of trainable parameters: 1,393,798
{'bidirectional': True, 'd_ff': 512, 'd_model': 128, 'dropout': 0.1, 'hidden_size': 256, 'input_size': 27, 'layernorm': False, 'n_heads': 4, 'num_hl': 0, 'num_layers': 3, 'output_size': 6}
```
All network parameters are random-initialized, except biases (initialized with `torch.zeros`) and normalization layer weights (initialized with `torch.ones`). See `architectures.init`.
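As a rough illustration of such an initialization scheme (not the repository's `architectures.init` code; the helper name and the choice of `nn.init.xavier_uniform_`, `nn.init.zeros_` and `nn.init.ones_` are assumptions), one could write:

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Illustrative init: random weights, zero biases, ones for normalization weights."""
    if isinstance(module, (nn.Linear, nn.Conv1d)):
        nn.init.xavier_uniform_(module.weight)  # random weight initialization
        if module.bias is not None:
            nn.init.zeros_(module.bias)         # biases start at zero
    elif isinstance(module, nn.LayerNorm):
        nn.init.ones_(module.weight)            # normalization scale starts at one
        nn.init.zeros_(module.bias)

# Usage: model.apply(init_weights) applies this recursively to every submodule.
```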