
Conversation

@yonigozlan (Member) commented on Nov 3, 2025

What does this PR do?

Fixes #41955.
Fixes an issue raised in #41954. Instead of setting attributes from kwargs after instantiating the image processor in from_pretrained, we now update the image processor dict with the kwargs before instantiating the object. This allows custom logic in the init to take the kwargs passed to from_pretrained into account.

In the linked PR, the issue was that max_pixels is supposed to overwrite size["longest_edge"] when passed to the init, but in from_pretrained, max_pixels was never passed to the init and was only set as an attribute after the image processor had been instantiated.
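For illustration, a minimal before/after sketch of the change (names simplified; the actual logic lives in the library's image processing base classes):

    # Before (sketch): kwargs were applied only after instantiation, so the
    # init never saw them, and init-time logic such as max_pixels overriding
    # size["longest_edge"] was skipped.
    image_processor = cls(**image_processor_dict)
    for key, value in kwargs.items():
        setattr(image_processor, key, value)

    # After (sketch): kwargs are merged into the config dict first, so the
    # init receives them and can apply its custom logic.
    image_processor_dict.update(kwargs)
    image_processor = cls(**image_processor_dict)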

@ReinforcedKnowledge commented on Nov 3, 2025

I have a quick question about this fix:

    image_processor_dict.update(kwargs)
    image_processor = cls(**image_processor_dict)

You are adding user-given kwargs to image_processor_dict, which is built from the pretrained processor config, and maybe the init of the class doesn't support those kwargs? (Not trying to criticize, just trying to learn more about transformers.)

I think the only way to check whether the kwargs are accepted by the init is to inspect its signature, but I don't think that's clean, and it doesn't solve the underlying core pattern: processors can do whatever they want in their init, even use env vars for all they care, while from_pretrained can't solve those issues for them.
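For what it's worth, the signature-inspection approach mentioned above would look roughly like this (a hypothetical sketch, not part of the PR):

    import inspect

    def filter_kwargs_by_signature(cls, kwargs):
        # Keep only kwargs that appear as named parameters of cls.__init__.
        # This breaks down when the init simply accepts **kwargs, which is
        # one reason the approach isn't clean.
        params = inspect.signature(cls.__init__).parameters
        return {k: v for k, v in kwargs.items() if k in params}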


@yonigozlan (Member, Author) commented

> I have a quick question about this fix:
>
>     image_processor_dict.update(kwargs)
>     image_processor = cls(**image_processor_dict)
>
> You are adding user-given kwargs to image_processor_dict, which is built from the pretrained processor config, and maybe the init of the class doesn't support those kwargs? (Not trying to criticize, just trying to learn more about transformers.)

No problem, feel free to ask questions! The idea is that all the init logic (setting attributes and checking kwargs) should live in the class init, to avoid getting different results when loading models in different ways (from_pretrained, from_dict, direct instantiation, etc.).
Fast image processors are becoming the default in the library, and they all check kwargs in their init so that only the kwargs they should accept are used. So this fix should work without issues: all fast image processors accept **kwargs in their init, then filter them to set as attributes only the kwargs that are in "Model"ImageProcessorKwargs.
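To make the pattern concrete, here is a rough sketch (hypothetical class and a plain-list valid_kwargs; real fast image processors declare these differently and do more validation):

    class MyFastImageProcessor:
        # Only names declared in valid_kwargs become instance attributes,
        # whether they come from the saved config or from user kwargs.
        valid_kwargs = ["size", "max_pixels", "image_seq_length"]

        def __init__(self, **kwargs):
            for key in self.valid_kwargs:
                if key in kwargs:
                    setattr(self, key, kwargs[key])
            # Custom init logic now sees kwargs passed to from_pretrained,
            # e.g. max_pixels overriding size["longest_edge"].
            if getattr(self, "max_pixels", None) is not None:
                self.size = {"longest_edge": self.max_pixels}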

@ReinforcedKnowledge commented

I agree: if you can enforce some kind of contract where __init__ works with any **kwargs the user passes, then it's good. But I don't think that's easily doable. Also, the user might pass kwargs that are not used in __init__ but are used downstream in some other utility function of the processor, one that checks whether an attribute exists and runs specific logic depending on whether it does, while __init__ might not care about it at all. I don't know if my point is clear 🤔

And thank you for your openness!

@yonigozlan (Member, Author) commented

Added logic to more clearly use cls.valid_kwargs to update the image processor dict ;).

> Also, the user might pass kwargs that are not used in __init__ but are used downstream in some other utility function of the processor, one that checks whether an attribute exists and runs specific logic depending on whether it does, while __init__ might not care about it at all. I don't know if my point is clear 🤔

In that case, if the kwarg is in the valid_kwargs attribute of the processor class, it will still be set as an instance attribute (see the logic in the base fast image processor class).
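In spirit, the merged version does something like the following (a simplified sketch reusing the plain-list valid_kwargs from the sketch above; the actual classmethod also keeps track of unused kwargs):

    @classmethod
    def from_dict(cls, image_processor_dict, **kwargs):
        # Merge only the kwargs the class declares as valid into the config
        # dict before instantiation, so the init sees them; the rest are
        # left in kwargs as before.
        image_processor_dict = image_processor_dict.copy()
        for key in list(kwargs):
            if key in cls.valid_kwargs:
                image_processor_dict[key] = kwargs.pop(key)
        return cls(**image_processor_dict)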

@ReinforcedKnowledge commented

Oh, perfect, thank you @yonigozlan! I didn't know about valid_kwargs; that's exactly what I wanted to do initially. Nice chatting with you!

@molbap (Contributor) left a comment


Looks fine to me, good use of valid_kwargs, but imo we should also get rid of the mutated kwargs!

@github-actions (bot) commented on Nov 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: pix2struct

@yonigozlan merged commit 900cf9d into huggingface:main on Nov 4, 2025. 23 checks passed.
Comment on lines +222 to +224:

    image_seq_length (`int`, *optional*):
        The number of image tokens to be used for each image in the input.
        Added for backward compatibility but this should be set as a processor attribute in future models.
A member commented:

Sorry for nit-picking after the PR is merged. For my own understanding, do we need it here? Not all image processors are VLM-specific, and image_seq_length isn't always needed for them.

@yonigozlan (Member, Author) replied:

No worries, I agree with you that we shouldn't really have this here, as it should be a processor attribute. But a lot of models on the Hub have it as an image processor attribute, so I put it here for BC...

yonigozlan added a commit to yonigozlan/transformers that referenced this pull request Nov 7, 2025
…gface#41997)

* accept kwargs in image proc from_pretrained

* only use kwargs that are in cls.valid_kwargs

* remove specific logic for _from_auto

* add image_seq_length to Images_kwargs for backward compatibility

* fix missing image kwargs in pix2struct
Abdennacer-Badaoui pushed a commit to Abdennacer-Badaoui/transformers that referenced this pull request Nov 10, 2025
…gface#41997)

* accept kwargs in image proc from_pretrained

* only use kwargs that are in cls.valid_kwargs

* remove specific logic for _from_auto

* add image_seq_length to Images_kwargs for backward compatibility

* fix missing image kwargs in pix2struct


Development

Successfully merging this pull request may close these issues.

max_pixels parameter ignored when loading Qwen2VL/Qwen3VL image processors via from_pretrained()
