The current implementation of the reshape operator changes the shape of an input Tensor into a specified shape, and that shape is set by an operator attribute. This means the target shape must be known before the operator runs, which is not the case for some NLP tasks.
For example:
- In Transformer, a 3-D tensor with shape [batch size, max sequence length, hidden dimension] needs to be reshaped into a 2-D tensor with shape [batch size, max sequence length $\times$ hidden dimension].
- Neither the batch size nor the max sequence length can be known before the operator runs.
- The current implementation of the reshape operator sets the shape of its output at compile time, not at run time.
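To make the mismatch concrete, here is a minimal Python sketch that works on plain shape tuples rather than any real framework API. The helpers `check_reshape` and `transformer_target_shape`, and the sizes in `COMPILE_TIME_SHAPE` and `runtime_input_shape`, are hypothetical and chosen only for illustration. It shows that a target shape frozen when the program is built breaks on a batch whose dimensions differ, while a target shape computed from the input's actual dimensions at run time works.

```python
from math import prod


def check_reshape(in_shape, out_shape):
    """Return out_shape if reshaping in_shape into it preserves the element count."""
    if prod(in_shape) != prod(out_shape):
        raise ValueError(f"cannot reshape {in_shape} into {out_shape}")
    return tuple(out_shape)


def transformer_target_shape(in_shape):
    """Compute the target from the actual input shape:
    [batch, seq_len, hidden] -> [batch, seq_len * hidden]."""
    batch, seq_len, hidden = in_shape
    return (batch, seq_len * hidden)


# Hypothetical shape fixed as an operator attribute when the program is built,
# assuming batch size 32 and max sequence length 50.
COMPILE_TIME_SHAPE = (32, 50 * 512)

# Hypothetical shape seen at run time: a smaller batch with shorter sequences.
runtime_input_shape = (7, 23, 512)

# The compile-time attribute no longer matches the data.
try:
    check_reshape(runtime_input_shape, COMPILE_TIME_SHAPE)
except ValueError as err:
    print(err)  # cannot reshape (7, 23, 512) into (32, 25600)

# Deriving the target from the runtime input shape succeeds.
print(check_reshape(runtime_input_shape, transformer_target_shape(runtime_input_shape)))
# (7, 11776)
```

The hidden dimension is the only size that is reliably fixed ahead of time here, so any target that depends on the batch size or sequence length has to be resolved once the input tensor is available.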