
Seq2Seq Transformer Tutorial #1225

Closed
@mmwebster

Description

I'm having difficulty understanding a few aspects of the Seq2Seq transformer tutorial (https://pytorch.org/tutorials/beginner/transformer_tutorial.html):

  1. The tutorial says that it implements the architecture from Attention Is All You Need, but I don't see a TransformerDecoder used anywhere; it looks like only a TransformerEncoder is used (see the model sketch after this list). How does this example work without the decoder?
  2. The tutorial says that it uses a softmax to output probabilities over the dictionary, but I only see a linear output layer (also in the sketch below). Where is the softmax applied?
  3. Is this model learning to predict one word ahead (e.g. [hi how are you] -> [how are you doing])? I can't find the actual task described anywhere, only the inputs and targets in terms of an alphabet (see the batching sketch below).
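
For reference, here is roughly the model I see in the tutorial, paraphrased from my reading (details may be off, and I've omitted the positional encoding for brevity). It's an encoder stack plus a linear head, with no `TransformerDecoder` and no explicit softmax:

```python
import math
import torch.nn as nn

# Paraphrase of the tutorial's model as I understand it (not a verbatim copy)
class TransformerModel(nn.Module):
    def __init__(self, ntoken, d_model, nhead, d_hid, nlayers, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(ntoken, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, d_hid, dropout)
        self.encoder = nn.TransformerEncoder(encoder_layer, nlayers)  # encoder only; no decoder anywhere
        self.linear = nn.Linear(d_model, ntoken)  # raw logits over the vocab; no softmax here
        self.d_model = d_model

    def forward(self, src, src_mask):
        x = self.embedding(src) * math.sqrt(self.d_model)
        x = self.encoder(x, src_mask)  # a square attention mask is passed in
        return self.linear(x)          # where do these logits become probabilities?
```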
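
And for question 3, my reading of the tutorial's `get_batch` is that the target is just the input shifted one token ahead, which is what makes me think this is one-step-ahead (next-word) prediction:

```python
bptt = 35  # sequence length used in the tutorial

def get_batch(source, i):
    # The target appears to be the input span shifted one position to the right,
    # i.e. [a b c d] -> [b c d e]
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len].reshape(-1)
    return data, target
```

If that's right, the model predicts the next token at every position simultaneously, but I'd like to confirm, since the tutorial never states the task explicitly.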

Appreciate any help.

cc @pytorch/team-text-core @Nayef211
