base: master
Implement left padding #2242
Conversation
Thanks! A couple of comments.
default_value=self.pad_value,
shape=(batch_size, sequence_length),
)
if self.padding_side == "left":
Maybe throw this into tensor_utils; we will need it for the multi-segment packer as well, right?
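As a rough illustration of what such a shared helper might look like (the function name, signature, and placement in tensor_utils are assumptions for the sketch, not existing API):

```python
import tensorflow as tf


def pad_to_dense(x, sequence_length, pad_value, padding_side="right"):
    """Pad/truncate a 2D RaggedTensor to a dense (batch, sequence_length) tensor.

    Hypothetical helper, sketched only to show how the left-padding logic
    could be shared between StartEndPacker and MultiSegmentPacker.
    """
    shape = [None, sequence_length]
    if padding_side == "left":
        # Reverse each row, pad/truncate on the right, then reverse back
        # so the pad values end up on the left.
        reversed_x = tf.reverse(x, axis=[-1])
        dense = reversed_x.to_tensor(default_value=pad_value, shape=shape)
        return tf.reverse(dense, axis=[-1])
    return x.to_tensor(default_value=pad_value, shape=shape)
```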
@@ -172,10 +174,17 @@ def call(
x = tf.concat([x, end_token_id_tensor], axis=-1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need an extra case here, where we still truncate on the right even if `end_value` is not set. Otherwise I think we might switch from right truncation when `end_value=2, padding_side="left"` to left truncation when `end_value=None, padding_side="left"`, which would be confusing. We could also consider opening up `truncation_side` as a parameter.
Let's also add unit tests for this case.
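A hypothetical unit test sketch for the behavior being asked for here. The `padding_side` argument is the one this PR proposes, the import path is assumed, and the expected outputs encode the reviewer's intent of right truncation in both cases, not necessarily the current implementation:

```python
from keras_hub.layers import StartEndPacker  # import path assumed


def test_left_padding_always_truncates_right():
    inputs = [[1, 2, 3, 4, 5, 6]]  # longer than sequence_length
    with_end = StartEndPacker(
        sequence_length=4, end_value=2, padding_side="left"
    )
    without_end = StartEndPacker(sequence_length=4, padding_side="left")
    # With an end token: keep the first tokens, then append the end token.
    assert with_end(inputs).numpy().tolist() == [[1, 2, 3, 2]]
    # Without an end token: still keep the first tokens (right truncation),
    # rather than silently switching to left truncation.
    assert without_end(inputs).numpy().tolist() == [[1, 2, 3, 4]]
```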
I'm confused about why this special case is necessary, because in common usage, left and right refer to the direction of padding. Could you give a specific example?
Here's a colab showing the issue.

In the colab, we are never padding and always truncating, so padding side should not matter. But with `padding_side="left"` we will get weird behavior if `end_value=None`, as we will truncate in the `to_tensor()` call after the reverse (so truncating on a different side).

We should definitely add a unit test for this. Two possible fixes:
- Always truncate on the right-hand side. This means we will need to truncate before this padding code if `end_value=None`, something like `x = x[..., :sequence_length]` before reversing the tokens.
- Add a `truncation_side` argument. This allows for maximum configurability, but would need tests for all possible combinations of `padding_side` and `truncation_side`.
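A minimal sketch of the problem at the `tf.RaggedTensor` level (not the layer's actual code): truncating inside `to_tensor()` after the reverse drops the front of the original sequence, while slicing to `sequence_length` before the reverse keeps right truncation.

```python
import tensorflow as tf

sequence_length = 4
x = tf.ragged.constant([[1, 2, 3, 4, 5, 6]])  # longer than sequence_length

# Left padding via reverse -> to_tensor -> reverse. Truncation happens
# inside to_tensor(), i.e. after the reverse, so the *first* tokens of
# the original sequence are dropped (left truncation).
reversed_x = tf.reverse(x, axis=[-1])
left_truncated = tf.reverse(
    reversed_x.to_tensor(default_value=0, shape=(1, sequence_length)),
    axis=[-1],
)
print(left_truncated)  # [[3 4 5 6]]

# Fix sketch: truncate on the right first, then apply the reverse trick.
truncated = x[:, :sequence_length]
right_truncated = tf.reverse(
    tf.reverse(truncated, axis=[-1]).to_tensor(
        default_value=0, shape=(1, sequence_length)
    ),
    axis=[-1],
)
print(right_truncated)  # [[1 2 3 4]]
```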
I'm sorry, I didn't understand what you meant. Can you take care of this part?
I will not have time sadly, but this implementation is buggy. I've commented on the colab with the specific output we would expect but do not get. You could make that a test case and maybe use it to start thinking about the problem. You could also take a look at `truncation_side` in huggingface transformers; that might help with understanding the issue conceptually.
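For the conceptual reference, a small illustrative snippet of how huggingface transformers keeps the two settings independent (requires downloading a tokenizer; shown only to clarify the idea, not as code for this PR):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
tok.padding_side = "left"      # short inputs get pad tokens on the left
tok.truncation_side = "right"  # long inputs still lose tokens from the right
short = tok("hi there", padding="max_length", max_length=8)
long = tok("a much longer sentence that will get cut off", truncation=True, max_length=8)
print(short["input_ids"])  # pad ids appear at the front
print(long["input_ids"])   # tokens dropped from the end
```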
I tried to fix this error. Please check it.
@mattdangerw
from #2237