Add support for args to ProcessorMixin for backward compatibility #33479
Conversation
        videos: VideoInput = None,
-       **kwargs,
+       **kwargs: Unpack[LlavaOnevisionProcessorKwargs],
@zucchini-nlp It seems that the Unpack was in the init instead of the call function of the processor. Do you think it's worth a separate PR or can we include it here?
oops, yes, can be here, thanks! Regarding adding audio=None before videos while audio won't be used for this model: that seems counterintuitive to me.
@molbap welcome back! We had questions about changing the order of the main input args in your absence, and this is also distantly related to that. Should we be adding unused args in this order?
hey! glad to be back, github notifications are hefty 😁
keeping audio=None is a bit strange, but it's the price for always having the same constant-size signature. I'd be pro keeping the same order everywhere as well: in terms of integrating transformers into other tools, it makes things (a bit) easier and would give processors more future impact. But if you think of something smart along the lines of informing a user of which modalities/types are accepted by the model, I'm open to it!
edit: pressed enter too soon
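For what it's worth, a minimal sketch of the constant-size signature being discussed; the class name, the kwargs TypedDict, and the exact argument order are illustrative assumptions here, not the actual transformers definitions:

from typing_extensions import TypedDict, Unpack


class MyProcessorKwargs(TypedDict, total=False):
    # hypothetical per-modality kwargs, standing in for a ProcessingKwargs subclass
    padding: bool
    do_rescale: bool


class MyMultimodalProcessor:
    # Fixed modality order everywhere: images, text, audio, videos.
    def __call__(
        self,
        images=None,
        text=None,
        audio=None,  # unused by this hypothetical model, kept only so the signature stays uniform
        videos=None,
        **kwargs: Unpack[MyProcessorKwargs],
    ):
        # downstream tools can rely on the same positional order for every processor
        return {"images": images, "text": text, "videos": videos, **kwargs}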
@@ -82,7 +82,7 @@ class LlavaOnevisionProcessor(ProcessorMixin):
        "image_token",
        "video_token",
    ]
-    image_processor_class = "AutoImageProcessor"
+    image_processor_class = "LlavaOnevisionImageProcessor"
@zucchini-nlp This is because right now, using AutoImageProcessor.from_pretrained after saving the models in the kwargs tests will actually load the last preprocessor saved, which is a LlavaOnevisionVideoProcessor here. This looks like a larger problem, so I'll open a separate issue.
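A minimal sketch of the scenario described above (the checkpoint id is only an example of a LLaVA-OneVision repo); the point is that the auto class ends up resolving to whichever preprocessor config was saved last:

import tempfile

from transformers import AutoImageProcessor, LlavaOnevisionProcessor

# The processor bundles an image processor and a video processor.
processor = LlavaOnevisionProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-0.5b-ov-hf")

with tempfile.TemporaryDirectory() as tmpdir:
    processor.save_pretrained(tmpdir)
    # Expected: LlavaOnevisionImageProcessor. Per the comment above, the
    # video processor saved last is what gets loaded instead.
    reloaded = AutoImageProcessor.from_pretrained(tmpdir)
    print(type(reloaded).__name__)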
Yes, sorry for not noting that when merging the model. Are there any errors caused apart from cases where the image and video processors have different config values?
I think we can remove these changes now that this has been resolved for the onevision processor.
        processor_components["image_processor"] = self.get_component(
            "image_processor", do_rescale=True, rescale_factor=-1
        )
I changed the testing of image kwargs from size and crop_size to do_rescale and rescale_factor, as the fact that not all image processors accept size or crop_size was making it hard to generalize to most models.
edit: hadn't finished the last sentence 😅
That's a nice workaround if do_rescale is indeed present for all models with image processors. So what's tested now is
self.assertLessEqual(inputs[self.images_data_arg_name].mean(), 0)
With a rescale_factor of -1 as you set it, will that work in all cases? Or rather, is it enough to say that the image has indeed been rescaled?
Since we know the images will be those from the prepare_inputs function and that they are randomly set in 0...255, this seems to be enough for me. I agree it's a hacky workaround though. But I really struggle to find something that would work with most image processors and that's easy to test...
Yes, fair enough - my point was that prepare_image_inputs is necessary and thus inseparable from this test. I think do_rescale is used in 62 models and not used in 3 (out of 65 models with image_processing capabilities), namely:
- pix2struct
- imagegpt
- layoutlmv2
Two of these see less usage now and we can handle their processing on the side. In conclusion, OK for me, even though it's strange to see it; maybe just add a small docstring to explain why this test exists.
Thanks for pointing out which models do not support do_rescale; I will make sure to use other tests for those.
And you are right, I will add a small docstring to explain what is happening.
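To make the intent concrete, roughly what the generalized kwargs check looks like, pieced together from the snippets in this thread; the test name and helpers such as prepare_components, prepare_image_inputs, and images_input_name are taken from or assumed based on this PR, so treat this as a sketch rather than the exact test body:

    def test_image_processor_kwargs_are_respected(self):
        """
        do_rescale=True with rescale_factor=-1 is only a probe: prepare_image_inputs()
        produces pixel values in 0...255, so after rescaling by -1 every value is <= 0.
        A non-positive mean therefore shows the image-processor kwargs were honored,
        without relying on size/crop_size, which not all image processors accept.
        """
        processor_components = self.prepare_components()
        processor_components["image_processor"] = self.get_component(
            "image_processor", do_rescale=True, rescale_factor=-1
        )
        processor = self.processor_class(**processor_components)

        image_input = self.prepare_image_inputs()
        inputs = processor(text="lower newer", images=image_input, return_tensors="pt")
        self.assertLessEqual(inputs[self.images_input_name].mean(), 0)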
Looks like the test errors were unrelated to this PR, so it should be good to review now :). Also glad to see you back @molbap!
Tricky tricky stuff. Nice, looking forward to having this merged, will reduce complexity a fair bit
tests/test_processing_common.py (Outdated)
    text_data_arg_name = "input_ids"
    images_data_arg_name = "pixel_values"
    videos_data_arg_name = "pixel_values_videos"
I'm not sure if it adds clarity to have these as class attributes rather than keeping the names around (I had to scroll up to figure out what it was). Nit though, it's fine
It comes in handy when a model uses different names, such as "decoder_input_ids" for text_data_arg_name. Then we can just change those attributes in the processor tester class instead of rewriting all the tests.
I agree that looking through the tests it's a bit confusing (I also had to scroll up to find it), but I like that it makes it more easily configurable.
Perhaps we could think of a different name? "data_arg_name" doesn't make it obvious what it is. Maybe e.g. model_input_text or text_input_name?
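A quick sketch of the configurability argument, using the renamed attributes adopted later in this PR; the test class, processor placeholder, and import path are illustrative assumptions:

import unittest

# import path within the transformers test suite; adjust to the model's test folder
from tests.test_processing_common import ProcessorTesterMixin


class HypotheticalSeq2SeqProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    # would point at the model's processor class in a real test file
    processor_class = None  # placeholder for this sketch

    # a model whose tokenizer output is called "decoder_input_ids" only needs to
    # override these attributes; the shared kwargs tests keep working unchanged
    text_input_name = "decoder_input_ids"
    images_input_name = "pixel_values"
    videos_input_name = "pixel_values_videos"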
Very nice - thanks for adding and for improving the tests!
@@ -958,6 +964,64 @@ def validate_init_kwargs(processor_config, valid_kwargs):
        unused_kwargs = {k: processor_config[k] for k in unused_keys}
        return unused_kwargs

+    def prepare_and_validate_optional_call_args(self, *args):
+        """
Very nice docstring :)
That's from @leloykun :)
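For readers following along, a rough sketch of what this helper does, inferred from the tests added further down; the actual implementation in the PR may differ in details such as deprecation warnings:

    def prepare_and_validate_optional_call_args(self, *args):
        # Sketch: map leftover positional call args onto the names declared in
        # `optional_call_args`, in order, and refuse calls with too many positionals.
        if len(args) > len(self.optional_call_args):
            raise ValueError(
                f"Expected at most {len(self.optional_call_args)} optional positional arguments "
                f"({', '.join(self.optional_call_args)}), got {len(args)}."
            )
        return dict(zip(self.optional_call_args, args))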
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey @molbap ! I made a few new modifications, mostly to fit Pixtral, but it still works for other processors. I also removed unnecessary tests for newly merged uniformized processors.
Hey @yonigozlan ! Added a comment or two about a couple of last things, but LGTM!
tests/test_processing_common.py (Outdated)
class MyProcessor(ProcessorMixin):
    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "CLIPImageProcessor"
    tokenizer_class = ("CLIPTokenizer", "CLIPTokenizerFast")
    optional_call_args = ["optional_1", "optional_2"]

    def __init__(self, image_processor=None, tokenizer=None, processor_attr_1=1, processor_attr_2=True):
        super().__init__(image_processor, tokenizer)

        self.processor_attr_1 = processor_attr_1
        self.processor_attr_2 = processor_attr_2


@require_tokenizers
@require_vision
class ProcessorTest(unittest.TestCase):
    processor_class = MyProcessor

    def prepare_processor_dict(self):
        return {"processor_attr_1": 1, "processor_attr_2": False}

    def get_processor(self):
        image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
        tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-large-patch14")
        processor = MyProcessor(image_processor, tokenizer, **self.prepare_processor_dict())

        return processor

    def test_processor_to_json_string(self):
        processor = self.get_processor()
        obj = json.loads(processor.to_json_string())
        for key, value in self.prepare_processor_dict().items():
            self.assertEqual(obj[key], value)
            self.assertEqual(getattr(processor, key, None), value)

    def test_processor_from_and_save_pretrained(self):
        processor_first = self.get_processor()

        with tempfile.TemporaryDirectory() as tmpdirname:
            saved_file = processor_first.save_pretrained(tmpdirname)[0]
            check_json_file_has_correct_format(saved_file)
            processor_second = self.processor_class.from_pretrained(tmpdirname)

        self.assertEqual(processor_second.to_dict(), processor_first.to_dict())

    def test_prepare_and_validate_optional_call_args(self):
        processor = self.get_processor()
        # test all optional call args are given
        optional_call_args = processor.prepare_and_validate_optional_call_args("optional_1", "optional_2")
        self.assertEqual(optional_call_args, {"optional_1": "optional_1", "optional_2": "optional_2"})
        # test only one optional call arg is given
        optional_call_args = processor.prepare_and_validate_optional_call_args("optional_1")
        self.assertEqual(optional_call_args, {"optional_1": "optional_1"})
        # test no optional call arg is given
        optional_call_args = processor.prepare_and_validate_optional_call_args()
        self.assertEqual(optional_call_args, {})
        # test too many optional call args are given
        with self.assertRaises(ValueError):
            processor.prepare_and_validate_optional_call_args("optional_1", "optional_2", "optional_3")
I realize this was removed in a previous PR, because it only applied to one model I suppose? Can we take the tests here and move them to the tester class above instead, wdyt?
You are right, it probably existed to test the tester mixin before any processor using the mixin was merged, but I guess it is not useful anymore.
I moved test_prepare_and_validate_optional_call_args to the tester class above (the other tests were already there).
    text_input_name = "input_ids"
    images_input_name = "pixel_values"
    videos_input_name = "pixel_values_videos"
much better!
thanks for cleaning these up!
…ggingface#33479)

* add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin
* change size and crop_size in processor kwargs tests to do_rescale and rescale_factor
* remove unnecessary llava processor kwargs test overwrite
* nit
* change data_arg_name to input_name
* Remove unnecessary test override
* Remove unnecessary tests Paligemma
* Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring
What does this PR do?
Split from #32841 by @leloykun.
Adds support for positional args in processor calls for backward compatibility and improves generalization in ProcessorTesterMixin.
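As a hypothetical illustration of the backward-compatibility mechanism (the processor and argument names below are made up; the general pattern is to keep legacy positional call sites working while new code uses keywords):

from transformers import ProcessorMixin


class LegacyFriendlyProcessor(ProcessorMixin):
    # hypothetical processor whose old __call__ accepted `segmentation_maps` positionally
    optional_call_args = ["segmentation_maps"]

    def __call__(self, images=None, text=None, *args, audio=None, videos=None, **kwargs):
        # legacy positional arguments land in *args and are mapped back onto keywords
        kwargs.update(self.prepare_and_validate_optional_call_args(*args))
        # ... the rest of the processing would read kwargs["segmentation_maps"] if provided


# old-style call sites keep working:       processor(image, "a prompt", seg_map)
# and are equivalent to the keyword form:  processor(image, "a prompt", segmentation_maps=seg_map)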
Before submitting
Who can review?