
Clarification on T5 Model Pre-training Objective and Denoising Process #1121

AliHaiderAhmad001 opened this issue Apr 1, 2024 · 0 comments
I am currently developing a T5 model (encoder-decoder architecture) from scratch for educational purposes. While working on this project, I've encountered some confusion regarding the pre-training objective, specifically the denoising objective. I would like to clarify my understanding and have some questions about the process.

Given the sentence:

Thank you for inviting me to your party last week.

Based on my understanding, during the pre-training phase with a denoising objective, the model works as follows:

  • Encoder input: Thank you <X> me to your party <Y> week
  • Decoder input: <X> for inviting <Y> last
  • Decoder labels (true labels): for inviting <Y> last <Z>
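To make my understanding concrete, here is a rough sketch of how I am currently constructing such an example (the span indices are hard-coded just to reproduce the sentence above; in real pre-training they would be sampled randomly, and the sentinels <X>, <Y>, <Z> would be actual vocabulary entries such as <extra_id_0>, <extra_id_1>, ... in the Hugging Face T5 tokenizer):

```python
# Rough sketch of my current understanding of example construction.
# <X>, <Y>, <Z> stand for unique sentinel tokens.

tokens = "Thank you for inviting me to your party last week".split()
spans = [(2, 4), (8, 9)]            # "for inviting" and "last" are corrupted
sentinels = ["<X>", "<Y>", "<Z>"]

encoder_input, target = [], []
prev_end = 0
for sentinel, (start, end) in zip(sentinels, spans):
    encoder_input += tokens[prev_end:start] + [sentinel]   # keep text, drop the span
    target += [sentinel] + tokens[start:end]                # collect the dropped span
    prev_end = end
encoder_input += tokens[prev_end:]
target += [sentinels[len(spans)]]                           # trailing sentinel, <Z> here

decoder_input = target[:-1]   # <X> for inviting <Y> last
labels = target[1:]           # for inviting <Y> last <Z>

print(" ".join(encoder_input))   # Thank you <X> me to your party <Y> week
print(" ".join(decoder_input))
print(" ".join(labels))
```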

Here are my questions:

  1. Is my interpretation of how the encoder input, decoder input, and decoder labels are constructed correct?
  2. In this setup, the model is expected to predict sentinel tokens (e.g., <X>, <Y>). Could this potentially confuse the model? For example, could it pick up the idea that the word "last" can come after the token <Y> in natural text? Or does the model naturally learn to interpret these situations correctly?

According to the paper:

[Figure from the T5 paper illustrating the span-corruption objective on this example sentence]

we process the sentence "Thank you for inviting me to your party last week." The words "for", "inviting", and "last" are randomly chosen for corruption. Each consecutive span of corrupted tokens is replaced by a sentinel token (shown as <X> and <Y>) that is unique over the example. Since "for" and "inviting" occur consecutively, they are replaced by a single sentinel <X>. The output sequence then consists of the dropped-out spans, delimited by the sentinel tokens used to replace them in the input, plus a final sentinel token <Z>.
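For completeness, here is one way I imagine the random span selection could work, based on the description above (the 15% corruption rate is the paper's default; the sampling scheme itself is my own simplification, not the official T5 code):

```python
import random

def pick_spans(tokens, corruption_rate=0.15, seed=0):
    """Choose token indices at random and merge consecutive picks into spans."""
    rng = random.Random(seed)
    chosen = [i for i in range(len(tokens)) if rng.random() < corruption_rate]
    spans, start, end = [], None, None
    for i in chosen:
        if start is None:
            start, end = i, i + 1
        elif i == end:                 # consecutive token: extend the current span
            end = i + 1
        else:                          # gap: close the span and start a new one
            spans.append((start, end))
            start, end = i, i + 1
    if start is not None:
        spans.append((start, end))
    return spans
```

Each returned (start, end) pair could then be replaced by a unique sentinel as in my sketch above.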
