Attn implementation for composite models #32238
Conversation
qubvel
left a comment
Thanks for working on this! Not sure I fully understand the logic at the moment, so I left a few comments below.
Regarding …, it could probably reuse …. But both will be somewhat slow, as you mentioned, and sometimes give unexpected errors (because of loading a modeling module file). But we might have some way to avoid that.
@ydshieh @qubvel I made some changes after our last discussion that changing the class attr is not good. So now we don't change the attr per se, but we change the config's `_attn_implementation`. I will add more tests if we agree on the design. And also, regarding …: unfortunately we can't do that, because it raises an error if the key is not in the dict and we have models that are not auto-mappable. So we rather try to get the key and, if it is not there, simply continue.
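For context, a minimal illustration of the lookup behavior being discussed (plain Python, names are illustrative):

```python
attn_map = {"text_config": "sdpa"}

# Plain indexing raises for sub-configs that have no entry in the mapping:
#     attn_map["vision_config"]  # KeyError
# `.get` returns None instead, so the loading loop can simply continue:
impl = attn_map.get("vision_config", None)  # -> None
```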
But why not a try/except around the usage of …?
Hmm, not sure how good it is to use a try/except for this.
As we discussed internally, I made the loading logic model-agnostic. So now we check if the config contains any sub-configs. I added some tests in vision enc-dec models for the cases where only one sub-config supports SDPA and where both do. Also added tests in Idefics (which were skipped earlier) to verify that the LM has its SDPA dispatched even though the other modules do not. A few TODOs remain for the next few hours or tomorrow.
@ArthurZucker I made the changes as you suggested, but I want to warn that it still needed some extra logic. Making this PR revealed so many issues. So we now have two options to indicate the attn implementation for composite models (a sketch of both is shown below):

1. Attn implementation as a dict.
2. Attn implementation directly propagated to the corresponding sub-configs.

Imagine we have a composite or nested config. Currently we set the attn implementation in three different places.
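For illustration, a rough sketch of the two options using Llava (the checkpoint name is illustrative; the dict keys mirror the sub-config attribute names, and option 2 assumes `_attn_implementation` is settable on the sub-configs, backed by the `_attn_implementation_internal` attribute seen in the reviewed snippet):

```python
from transformers import AutoConfig, LlavaForConditionalGeneration

# Option 1: pass the attn implementation as a dict keyed by sub-config name.
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    attn_implementation={"text_config": "flash_attention_2", "vision_config": "sdpa"},
)

# Option 2: propagate the attn implementation directly on the sub-configs
# (assumes `_attn_implementation` is a settable property on PretrainedConfig).
config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")
config.text_config._attn_implementation = "flash_attention_2"
config.vision_config._attn_implementation = "sdpa"
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", config=config
)
```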
ArthurZucker
left a comment
Modeling changes are just perfect 😉 Before we have a big refactor, this is super super welcome.
```python
for key in config:
    if isinstance(getattr(config, key), PretrainedConfig):
        sub_config = getattr(config, key)
        # A plain string applies to every sub-config; a dict is looked up by the
        # sub-config's attribute name, falling back to None when the key is absent.
        curr_attn_implementation = (
            requested_attn_implementation
            if not isinstance(requested_attn_implementation, dict)
            else requested_attn_implementation.get(key, None)
        )
        sub_config._attn_implementation_internal = curr_attn_implementation
```
If this is done to set the attn_implementation of the sub-configs, I am wondering if we can't have a way to use `is_composition` to simply say:

- if we are a composition, then we call the `_autoset_attn_implementation` function on our composite members

Today we have a problem with `is_composition`: we have this
```python
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
    cls._set_token_in_kwargs(kwargs)

    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)

    # When loading from the composite top-level config, keep only the relevant sub-config.
    if config_dict.get("model_type") == "mllama":
        config_dict = config_dict["vision_config"]

    if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
        logger.warning(
            f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
            f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
        )

    return cls.from_dict(config_dict, **kwargs)
```

which IMO we should never have to do, and we copy-paste this everywhere.
This means there is something wrong with how we handle composition!
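For illustration, a minimal sketch of what that suggestion could look like (a hypothetical helper, not the merged implementation; it assumes sub-configs are exposed as plain attributes on the config):

```python
from transformers import PretrainedConfig

def autoset_attn_implementation(config: PretrainedConfig, requested) -> None:
    """Hypothetical recursive helper: instead of copy-pasting sub-config handling
    at every call site, a composite config delegates to each composite member."""
    for key, value in vars(config).items():
        if isinstance(value, PretrainedConfig):  # composite member
            resolved = requested.get(key, None) if isinstance(requested, dict) else requested
            value._attn_implementation_internal = resolved
            autoset_attn_implementation(value, resolved)  # handle nested configs
```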
If we find a clean way to auto-set the attn implementation, it would be better in the future!
Merging tomorrow; the tests pass locally in most cases, except for the cases not touched by this PR. For example, some FA2/SDPA equivalence tests are failing for me in models unrelated to this change.
* first try
* codestyle
* idefics2 is happy
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma
* fix-copies
* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo
* blip-2 needs to init vision from config
* when was this removed O_o
* minor fix
* tests
* this way?
* tests
* model-agnostic code
* codestyle
* add tests for idefics
* modify general test for VLMs
* no generation test for vlm yet!
* no generation test here also
* wanr in VIT-SDPA if output attn
* add more tests
* user can pass dict as attn impl
* repo consistency
* update
* muicgen
* no prints
* forgot speech enc-dec and clip
* how many composite models we have?
* musicgen meelody is same as mudicgen
* +siglip
* fix tests + add some more
* remove idefics custom overriden code
* make idefics2 automappable
* nits
* skip tests
* doctests
* Update src/transformers/models/idefics2/configuration_idefics2.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update tests/models/clip/test_modeling_clip.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update tests/models/idefics2/test_modeling_idefics2.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update tests/models/idefics2/test_modeling_idefics2.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* Update src/transformers/configuration_utils.py (Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>)
* major update, no need for automap
* clean up
* add FA2 test
* more tests
* style
* skip tests
* why did these started failing now?
* no attributes for FA2 needed
* one tiny test
* address comment about FA2 false warning
* style
* add new models and resolve conflicts
* fix copies
* let it be this way for now, come back tomorrow to review
* some more fixes
* update
* more updates
* update
* fix copies
* style and tests
* another big update
* fix tests
* fix tests
* update
* another update
* fix tests
* fix copies
* fix tests

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
What does this PR do?
Related to #32221 and fixes #30565. As described in the issue, multimodal models currently set the attn implementation only in the LM backbone, and not in the vision backbone. While working on it, I found another bug: using a property to set the SDPA flag on the VLM class doesn't work, because we don't know what `self.LM` is before the VLM is initialized. I tried a class property, but that would throw an error that `self` has no attribute `LM`. This PR makes a few modifications to how the attn implementation is set in modeling to fix the above issues.
More precisely:
- Users can pass a dict as `attn_implementation`, and the dict keys have to be identical to the model config's attribute names for the sub-configs. In other words, if the config has a `config.text_config`, then the dict key must be `"text_config"` to dispatch correctly (see the usage sketch at the end).

NOTE:
I had a few rare cases where the model was not really composite but had several sub-configs, for example DBRX, and I decided to get rid of the sub-configs there. I am not sure what the right way to deprecate that is, so any comments on that are welcome.
Also, some models have nested configs, where a vision config is part of the text config (Qwen2-VL). Only three models are like that, mostly because I didn't review properly when the model was added and didn't insist on changing things. For those models, we don't treat them as composite and just leave them working as they were.
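As a usage sketch, here is how the dict form could look for an encoder-decoder composite, where the sub-config attributes are named `encoder` and `decoder` (the checkpoint is only an example):

```python
from transformers import VisionEncoderDecoderModel

# The dict keys must match the sub-config attribute names on the config,
# so for VisionEncoderDecoderConfig they are "encoder" and "decoder".
model = VisionEncoderDecoderModel.from_pretrained(
    "nlpconnect/vit-gpt2-image-captioning",
    attn_implementation={"decoder": "sdpa", "encoder": "eager"},
)
```

A key that is omitted is looked up with `.get(key, None)` in the loading loop, so no error is raised for sub-configs that are not in the dict.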