I'm having difficulty understanding a few aspects of the Seq2Seq transformer tutorial (https://pytorch.org/tutorials/beginner/transformer_tutorial.html):
- The tutorial says it implements the architecture from "Attention Is All You Need", but I don't see a TransformerDecoder used anywhere; only a TransformerEncoder seems to be used (see the first sketch below). How does this example work without the decoder?
- The tutorial says it uses a softmax to output probabilities over the dictionary, but I only see a linear output layer, with no softmax after it. Where is the softmax applied?
- Is this model learning to predict one word ahead (e.g. [hi how are you] -> [how are you doing])? I can't find the actual task described anywhere; the tutorial only specifies the inputs and targets in terms of an alphabet (see the second sketch below).
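For reference, here is a minimal sketch of the model as I read it from the tutorial's code. The class name `EncoderOnlyLM` and the argument names are my own paraphrase, not the tutorial's exact code, and I've left out the `PositionalEncoding` module for brevity:

```python
import math
import torch.nn as nn

class EncoderOnlyLM(nn.Module):
    """My reading of the tutorial's model: an encoder stack plus a linear
    head, with no nn.TransformerDecoder and no explicit softmax."""
    def __init__(self, ntoken, d_model, nhead, d_hid, nlayers, dropout=0.5):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(ntoken, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, d_hid, dropout)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, nlayers)
        self.linear = nn.Linear(d_model, ntoken)

    def forward(self, src, src_mask):
        # (the tutorial also adds positional encodings here; omitted for brevity)
        x = self.embedding(src) * math.sqrt(self.d_model)
        x = self.transformer_encoder(x, src_mask)
        return self.linear(x)  # raw logits over the vocabulary, no softmax in sight
```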
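And for the third question, this is how I read the tutorial's `get_batch`: the target appears to be the input shifted one token to the right, which would make the task next-token prediction. The comments show a hypothetical word-level example; the tutorial itself works on tensors of token ids:

```python
def get_batch(source, i, bptt=35):
    # source: a long tensor of token ids (shape [seq_len] or [seq_len, batch])
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i : i + seq_len]            # e.g. [hi, how, are, you]
    target = source[i + 1 : i + 1 + seq_len]  # e.g. [how, are, you, doing]
    return data, target.reshape(-1)
```

Is that the right way to understand the task?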
Appreciate any help.
cc @pytorch/team-text-core @Nayef211