Modify ProcessorTesterMixin for better generalization #32637
Conversation
…size for image_processor kwargs
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for working on this! I think we can do one more thing to clean up the tests; left a comment. Up to you!
inputs = processor(
    text=input_str, images=image_input, return_tensors="pt", max_length=112, padding="max_length"
)
nit: it's a bit redundant to indicate max_length both when initializing the tokenizer and when calling it. IMO we can remove it from the init, because we're testing kwargs at call time.
Or we could even explicitly init with padding="longest" and test whether the call kwargs override the defaults, as the test name suggests.
+1 on removing from the init.
Checking the kwargs overrides is also a good idea - we should add a new test to isolate and check this behaviour
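A minimal sketch of what that isolated override test could look like (the `get_component` and `prepare_image_inputs` helpers and the exact assertion are assumptions about the mixin, not the final implementation):

```python
def test_kwargs_override_tokenizer_defaults(self):
    # Deliberately init the tokenizer with a "longest" padding default...
    image_processor = self.get_component("image_processor")
    tokenizer = self.get_component("tokenizer", padding="longest")
    processor = self.processor_class(tokenizer=tokenizer, image_processor=image_processor)

    input_str = "lower newer"
    image_input = self.prepare_image_inputs()

    # ...then check that the call-time kwargs win over that default.
    inputs = processor(
        text=input_str, images=image_input, return_tensors="pt", max_length=112, padding="max_length"
    )
    self.assertEqual(inputs["input_ids"].shape[-1], 112)
```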
tests/test_processing_common.py
Outdated
if not tokenizer.pad_token:
    tokenizer.pad_token = "[TEST_PAD]"
I see this recurring in all tests; maybe we should add a pad token when saving the dummy tokenizer, so it's loaded with a pad id.
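For instance, something along these lines in the mixin's setUp (a sketch; the `get_image_processor`/`get_tokenizer` helper names and the exact setup are assumptions):

```python
import tempfile

def setUp(self):
    self.tmpdirname = tempfile.mkdtemp()
    image_processor = self.get_image_processor()  # assumed helper
    tokenizer = self.get_tokenizer()  # assumed helper
    if tokenizer.pad_token is None:
        # Save the dummy tokenizer with a pad token once, so individual
        # tests no longer need to set "[TEST_PAD]" themselves.
        tokenizer.pad_token = "[TEST_PAD]"
    processor = self.processor_class(image_processor=image_processor, tokenizer=tokenizer)
    processor.save_pretrained(self.tmpdirname)
```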
+1
LGTM as well 👍
Thanks for updating!
+1 on all of @zucchini-nlp's comments - overall LGTM :)
…tokenizer kwargs, remove unnecessary test overwrites in grounding dino
Thanks for the reviews! I have made the requested modifications and removed the now-unnecessary test overrides for grounding-dino. If you think these modifications look OK, I'll merge!
What does this PR do?
Follows this #32544 (comment). It will help avoid rewriting many tests for ProcessorTest classes that inherit from ProcessorTesterMixin.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
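To illustrate the intent of the change: with the generalized mixin, a model-specific processor test should mostly just declare its classes and inherit the shared kwargs tests. A rough sketch (the relative import path is illustrative; the mixin lives in tests/test_processing_common.py):

```python
import unittest

from transformers import GroundingDinoProcessor

from .test_processing_common import ProcessorTesterMixin


class GroundingDinoProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    processor_class = GroundingDinoProcessor

    # The generic kwargs tests (padding, max_length, images_kwargs such as
    # size, etc.) are inherited from the mixin instead of being overridden
    # per model, which is what this PR makes possible.
```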
Who can review?
@molbap @zucchini-nlp @amyeroberts