Description
- implement the layer normalization operator and its Python wrapper.
  - CPU implementation.
  - GPU implementation.
  - Python wrapper.
- enhance the matmul operator to support 4-D tensors as its inputs (issue: Does it need to enhance matmul_op to support 4-D inputs #7319; fixed by PR: Enhance matmul_op to support 4-D inputs #7656).
- prepare the dataset (fixed by PR: Add WMT16 into dataset. #7661).
- wrap the masked positional embedding.
- enhance the lookup_table operator to support the special token: padding index (issue: Support padding_idx in the lookup_table_op. #7309).
- wrap the multi-head dot product attention. This differs from ConvS2S.
- wrap the position-wise feed-forward network.
- wrap the basic computation block.
- build the entire model.
- enhance the documentation of operators used in Transformer.
- add beam search for Transformer.
- clean up the code and merge the entire project into the models repo (merge the work part by part).
- Learning Rate Scheduler
- Residual Dropout
- Label Smoothing
  - label smooth operator.
  - Python wrapper.
- Scaled Dot Product Attention
- Weight sharing between embedding and pre-softmax linear transformation layers
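
The layer normalization item above normalizes activations over the feature dimension, then applies a learnable scale and shift. A minimal NumPy sketch (function and parameter names are illustrative, not the operator's actual API):

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    # Normalize each row over the last dimension to zero mean and
    # unit variance, then apply the learnable scale (gain) and shift (bias).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gain * (x - mean) / np.sqrt(var + eps) + bias

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.5, 0.5, 0.5, 1.5]])
y = layer_norm(x, gain=np.ones(4), bias=np.zeros(4))
```

With unit gain and zero bias, each output row has mean ~0 and standard deviation ~1.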
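
The 4-D matmul enhancement follows standard batched-matmul semantics: the leading dimensions are treated as batch dimensions and matrix multiplication is applied to the trailing two. NumPy's `matmul` already behaves this way, which illustrates the intended shape rule:

```python
import numpy as np

# In multi-head attention the inputs are typically
# [batch, num_heads, seq_len, feature], so matmul must batch over
# the first two dimensions and multiply the trailing 2-D matrices.
a = np.random.randn(2, 8, 5, 3)
b = np.random.randn(2, 8, 3, 7)
c = np.matmul(a, b)  # shape (2, 8, 5, 7)
```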
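
For the positional embedding item, the Transformer's fixed sinusoidal encoding can be sketched as follows (a NumPy illustration assuming an even model width; the function name is hypothetical):

```python
import numpy as np

def position_encoding(max_len, d_model):
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(max_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = position_encoding(50, 8)
```

The "masked" part of the item refers to zeroing the entries that correspond to padding positions, which is what the padding-index support in lookup_table enables.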
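
The padding-index behavior requested for lookup_table means that rows gathered for the padding token come back as all zeros, so padding contributes nothing downstream. A NumPy sketch of the intended semantics (not the operator's actual implementation):

```python
import numpy as np

def lookup_table(table, ids, padding_idx=0):
    # Gather embedding rows by id, then zero out every position whose
    # id equals padding_idx so padding tokens have no effect.
    out = table[ids]
    out[ids == padding_idx] = 0.0
    return out

table = np.random.randn(6, 4)            # vocab_size x embedding_dim
ids = np.array([[0, 2, 5],
                [3, 0, 1]])              # 0 is the padding token here
emb = lookup_table(table, ids, padding_idx=0)
```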
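
The scaled dot product attention item is the core of the multi-head attention wrapper: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, where the √d_k scaling keeps the logits in a range where softmax gradients stay usable. A NumPy sketch (unmasked, single head per batch entry):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # scores: similarity of each query to each key, scaled by 1/sqrt(d_k).
    d_k = q.shape[-1]
    scores = np.matmul(q, k.swapaxes(-1, -2)) / np.sqrt(d_k)
    return np.matmul(softmax(scores), v)

q = np.random.randn(2, 4, 8)   # [batch, query_len, d_k]
k = np.random.randn(2, 6, 8)   # [batch, key_len, d_k]
v = np.random.randn(2, 6, 8)
out = scaled_dot_product_attention(q, k, v)
```

The multi-head wrapper runs this in parallel over projected sub-spaces, which is where the 4-D matmul support above comes in.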
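
The position-wise feed-forward network applies the same two-layer MLP with a ReLU in between, FFN(x) = max(0, xW₁ + b₁)W₂ + b₂, independently at every position. A NumPy sketch with illustrative parameter names:

```python
import numpy as np

def position_wise_ffn(x, w1, b1, w2, b2):
    # Same weights applied at every sequence position.
    hidden = np.maximum(0.0, x @ w1 + b1)   # ReLU(x W1 + b1)
    return hidden @ w2 + b2

d_model, d_ff = 8, 32
x = np.random.randn(2, 5, d_model)          # [batch, seq_len, d_model]
w1 = np.random.randn(d_model, d_ff); b1 = np.zeros(d_ff)
w2 = np.random.randn(d_ff, d_model); b2 = np.zeros(d_model)
y = position_wise_ffn(x, w1, b1, w2, b2)
```

Because the weights are shared across positions, applying the FFN to one position alone gives the same result as slicing the full output.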
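
Label smoothing replaces the hard one-hot target with a mixture of the one-hot distribution and a uniform distribution over the vocabulary, which discourages over-confident predictions. A NumPy sketch of the transformation the label smooth operator performs:

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    # Keep (1 - epsilon) of the mass on the true class and spread
    # epsilon uniformly over all k classes.
    k = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / k

one_hot = np.eye(5)[np.array([1, 3])]   # two targets over 5 classes
targets = smooth_labels(one_hot, epsilon=0.1)
```

Each smoothed row still sums to 1; the true class gets 1 - ε + ε/k and every other class gets ε/k.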
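
The learning rate scheduler item refers to the Transformer's warmup-then-decay schedule: lrate = d_model⁻⁰·⁵ · min(step⁻⁰·⁵, step · warmup_steps⁻¹·⁵), which increases linearly for the first warmup_steps steps and then decays as the inverse square root of the step number. A plain-Python sketch:

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    # Linear warmup for the first warmup_steps steps,
    # then inverse-square-root decay.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The peak is reached exactly at step == warmup_steps, where the two branches of the min coincide.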
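
The weight sharing item ties the input embedding matrix to the pre-softmax output projection: the same [vocab, d_model] table embeds token ids on the way in and, transposed, produces vocabulary logits on the way out, cutting the parameter count. A NumPy sketch of the idea (names are illustrative):

```python
import numpy as np

vocab, d_model = 1000, 32
embedding = np.random.randn(vocab, d_model)   # single shared table

def embed(ids):
    # Input side: gather embedding rows by token id.
    return embedding[ids]

def output_logits(h):
    # Output side: reuse the same table, transposed, as the
    # pre-softmax linear transformation.
    return h @ embedding.T

h = np.random.randn(2, d_model)               # decoder hidden states
logits = output_logits(h)
```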