
TODO list for Transformer. #7355

Closed
@lcy-seso

Description

  • implement the layer normalization operator and its Python wrapper.
    • CPU implementation.
    • GPU implementation.
    • Python wrapper.
  • enhance the matmul operator to support 4-D tensors as inputs. See: Does it need to enhance matmul_op to support 4-D inputs #7319. Fixed by PR: Enhance matmul_op to support 4-D inputs #7656.
  • prepare the dataset.
    Fixed by PR: Add WMT16 into dataset. #7661.
  • wrap the masked positional embedding.
  • enhance the lookup_table operator to support a special token: the padding index. See: Support padding_idx in the lookup_table_op. #7309.
  • wrap the multi-head dot-product attention. This differs from ConvS2S.
  • wrap the position-wise feed-forward network.
  • wrap the basic computation block.
  • build the entire model.
  • enhance the documentation of operators used in Transformer.
  • add beam search for Transformer.
  • clean up the code and merge the entire project into the models repo (merge the work part by part).
  • Learning Rate Scheduler
  • Residual Dropout
  • Label Smoothing
    • label smoothing operator.
    • Python wrapper.
  • Scaled Dot Product Attention
  • Weight sharing between embedding and pre-softmax linear transformation layers
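A note on the masked positional embedding item above: since positional vectors are meaningless at padded positions, one way to realize it (analogous to padding_idx support in lookup_table) is to zero out the sinusoidal encoding wherever the token is the pad index. This is only a NumPy sketch of that idea, not the Paddle operator; the pad index 0 and the helper names are assumptions for illustration.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal encoding from the Transformer paper:
    PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...).
    Assumes d_model is even."""
    pos = np.arange(max_len)[:, None]                  # [max_len, 1]
    i = np.arange(0, d_model, 2)[None, :]              # [1, d_model/2]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def masked_positional_embedding(token_ids, pe, pad_idx=0):
    """Look up positional vectors by position, then zero the rows
    that correspond to padding tokens (hypothetical helper)."""
    batch, seq_len = token_ids.shape
    out = pe[np.arange(seq_len)][None, :, :].repeat(batch, axis=0)
    out[token_ids == pad_idx] = 0.0                    # mask padded positions
    return out

pe = positional_encoding(16, 8)
ids = np.array([[5, 7, 0, 0]])                         # 0 assumed to be the pad index
emb = masked_positional_embedding(ids, pe, pad_idx=0)
```

The same boolean-mask trick applies to the word-embedding lookup itself, which is what padding_idx in lookup_table_op provides.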
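The scaled dot-product attention item reduces to one formula, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, which multi-head attention applies once per head. A minimal NumPy sketch of that formula (not the Paddle operator; the mask convention, True = keep, is an assumption):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """q: [batch, len_q, d_k], k/v: [batch, len_k, d_k/d_v]."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)   # [batch, len_q, len_k]
    if mask is not None:
        scores = np.where(mask, scores, -1e9)          # hide padded / future keys
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                  # [batch, len_q, d_v]

q = np.random.rand(2, 4, 8)
k = np.random.rand(2, 5, 8)
v = np.random.rand(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)
```

The 1/sqrt(d_k) scaling is the difference from plain dot-product attention: it keeps the logits from growing with d_k and saturating the softmax. This is also where the 4-D matmul enhancement (#7319/#7656) comes in, since the batched multiply gains an extra head dimension.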
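The learning rate scheduler item refers to the warmup-then-decay schedule from the Transformer paper: the rate grows linearly for warmup_steps steps and then decays as the inverse square root of the step. A sketch of that formula (defaults d_model=512, warmup_steps=4000 are the paper's base configuration):

```python
def noam_lr(step, d_model=512, warmup_steps=4000):
    """lrate = d_model^{-0.5} * min(step^{-0.5}, step * warmup_steps^{-1.5})."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# rate rises until warmup_steps, then decays
lrs = [noam_lr(s) for s in (1, 1000, 4000, 16000)]
```

The peak occurs exactly at step = warmup_steps, where the two branches of the min coincide.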
