Roberta's Positional Embedding Offset #5285
That's certainly possible. As you can see from my comment and PR #5188, I don't fully understand the motivation for the offset. It is very tricky.
I figured out why. See facebookresearch/fairseq#1177. I think we can simply use masked_fill() to make the positional embedding 0 at padding positions, so the code is easier to understand (no need for the offset).
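For illustration, here is a rough sketch of that masked_fill() idea, not the actual transformers code; the function name, pos_emb, and pad_token_id below are placeholders. The idea is to use plain 0-based position ids and zero out the positional embeddings at padding positions, so no offset is needed.

```python
import torch
import torch.nn as nn

def positional_embeddings_no_offset(input_ids, pos_emb: nn.Embedding, pad_token_id: int):
    # Plain 0..seq_len-1 position ids, no padding_idx + 1 offset.
    seq_len = input_ids.size(1)
    position_ids = torch.arange(seq_len, device=input_ids.device).expand_as(input_ids)
    embeddings = pos_emb(position_ids)                   # (batch, seq, dim)
    # Zero out positional embeddings wherever the input is padding.
    pad_mask = input_ids.eq(pad_token_id).unsqueeze(-1)  # (batch, seq, 1)
    return embeddings.masked_fill(pad_mask, 0.0)
```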
Exactly! So I think we must settle for documenting what is going on better in
transformers/src/transformers/modeling_bart.py, line 754 (commit d4c2cb4)
transformers/src/transformers/modeling_bart.py, line 763 (commit d4c2cb4)

So this offset is added because the function create_position_ids_from_input_ids shifts the position ids by padding_idx + 1. However, I wonder if other models should also include this?

transformers/src/transformers/modeling_roberta.py, line 54 (commit d4c2cb4)
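For reference, a minimal sketch of what create_position_ids_from_input_ids does (a reimplementation for illustration, not the exact library code at that commit): non-padding positions get cumulative indices shifted by padding_idx, and padding positions get padding_idx itself, so real tokens start at padding_idx + 1.

```python
import torch

def create_position_ids_from_input_ids(input_ids, padding_idx):
    # Positions count only non-padding tokens; padding positions get padding_idx.
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = torch.cumsum(mask, dim=1) * mask
    return incremental_indices.long() + padding_idx

# With RoBERTa's padding_idx = 1, the first real token gets position id 2,
# which is why the embedding table needs padding_idx + 1 extra rows
# (e.g. 512 usable positions -> max_position_embeddings = 514).
input_ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])  # 1 is the pad token id
print(create_position_ids_from_input_ids(input_ids, padding_idx=1))
# tensor([[2, 3, 4, 5, 1, 1]])
```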
For instance, when I am using Longformer, it looks like the offset is not added as it is in Roberta, so I need to add such an offset to config.max_position_embeddings myself.
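As a hypothetical illustration of that sizing concern, assuming the same padding_idx + 1 convention applies: if you want N usable positions, the position-embedding table needs N + padding_idx + 1 rows, which is where RoBERTa's 514 (= 512 + 2) comes from. The numbers below are examples, not values from this issue.

```python
from transformers import RobertaConfig

desired_positions = 4096   # e.g. a Longformer-style sequence length
padding_idx = 1            # RoBERTa's pad token id

# Account for the padding_idx + 1 shift applied by create_position_ids_from_input_ids.
config = RobertaConfig(
    max_position_embeddings=desired_positions + padding_idx + 1,  # 4098, not 4096
)
print(config.max_position_embeddings)
```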