I am currently developing a T5 model (encoder-decoder architecture) from scratch for educational purposes. While working on this project, I've encountered some confusion regarding the pre-training objective, specifically the denoising objective. I would like to clarify my understanding and have some questions about the process.
Given the sentence:
Thank you for inviting me to your party last week.
Based on my understanding, during the pre-training phase with a denoising objective, the model works as follows:
Encoder input: Thank you <X> me to your party <Y> week
Decoder input: <X> for inviting <Y> last
Decoder labels (true labels): for inviting <Y> last <Z>
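Put differently, the decoder input and labels above form a shift-by-one pair over the target sequence (teacher forcing): at step t the decoder sees the target tokens before position t and must predict the token at position t. A minimal sketch of that relation, using whitespace-split words purely for illustration:

```python
# Target sequence: the dropped-out spans delimited by their sentinels,
# closed by a final sentinel (as described in the T5 paper).
target = "<X> for inviting <Y> last <Z>".split()

# Teacher forcing as a shift-by-one pair:
# decoder input = target without its final token,
# labels        = target without its first token,
# so position t of the decoder input lines up with the label it must predict.
decoder_input = target[:-1]   # <X> for inviting <Y> last
labels        = target[1:]    # for inviting <Y> last <Z>

for inp, lab in zip(decoder_input, labels):
    print(f"{inp:>10}  ->  {lab}")
```

(Some implementations instead prepend a dedicated decoder start token and keep the full target as the labels, but the shift-by-one idea is the same.)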
Here are my questions:
Is my interpretation of how the encoder input, decoder input, and decoder labels are constructed correct?
In this setup, the model is expected to predict sentinel tokens (e.g., <X>, <Y>). Could this introduce confusion for the model? For example, could it get the idea that the word "last" can legitimately follow the token <Y>? Or does the model naturally learn to interpret these situations correctly?
According to the paper:
we process the sentence Thank you for inviting me to your party last week. The words for, inviting and last are randomly chosen for corruption. Each consecutive span of corrupted tokens is replaced by a sentinel token (shown as <X> and <Y>) that is unique over the example. Since for and inviting occur consecutively, they are replaced by a single sentinel <X>. The output sequence then consists of the dropped-out spans, delimited by the sentinel tokens used to replace them in the input plus a final sentinel token <Z>.
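Here is a minimal sketch of that corruption procedure, operating on whitespace-split words rather than SentencePiece pieces, with the corrupted positions fixed to match the paper's example. The function name and the explicit set of corrupted indices are illustrative assumptions, not the paper's actual preprocessing code:

```python
words = "Thank you for inviting me to your party last week .".split()
corrupted = {2, 3, 8}  # indices of "for", "inviting", "last"

def span_corrupt(tokens, corrupted_idx):
    """Replace each consecutive run of corrupted tokens with a unique sentinel;
    return (encoder_input, target) as token lists."""
    sentinels = iter(["<X>", "<Y>", "<Z>", "<W>"])
    enc, tgt = [], []
    i = 0
    while i < len(tokens):
        if i in corrupted_idx:
            # A new span of corrupted tokens: one sentinel in the encoder
            # input, the same sentinel followed by the dropped words in the target.
            sentinel = next(sentinels)
            enc.append(sentinel)
            tgt.append(sentinel)
            while i < len(tokens) and i in corrupted_idx:
                tgt.append(tokens[i])
                i += 1
        else:
            enc.append(tokens[i])
            i += 1
    tgt.append(next(sentinels))  # final sentinel closes the target
    return enc, tgt

enc, tgt = span_corrupt(words, corrupted)
print(" ".join(enc))  # Thank you <X> me to your party <Y> week .
print(" ".join(tgt))  # <X> for inviting <Y> last <Z>
```

This reproduces the encoder input from the example above and a target of the form the paper describes (dropped spans delimited by their sentinels, ending in <Z>).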