Conversation

@qubvel (Contributor) commented Jul 25, 2024

What does this PR do?

The CLIP vision model now supports SDPA attention and automatically dispatches to SDPA when possible. Llava-like models use AutoModel.from_config to initialize the vision tower; however, attn_implementation was not propagated, which caused errors in CI: the top-level model was in eager mode while the CLIP vision tower used SDPA attention.
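The fix can be sketched roughly as follows. This is a hypothetical simplification, not the actual transformers code: `build_vision_tower`, `auto_model_cls`, and the config attributes are stand-in names for illustration.

```python
def build_vision_tower(config, auto_model_cls):
    # Before the fix, attn_implementation was not passed here, so the
    # vision tower fell back to its own default (SDPA for CLIP) while
    # the top-level model could be in eager mode.
    return auto_model_cls.from_config(
        config.vision_config,
        attn_implementation=config._attn_implementation,  # the fix: propagate it
    )
```

The key point is only that the parent model's resolved attention implementation is forwarded when the sub-model is built from its config.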

cc @zucchini-nlp @ydshieh

@zucchini-nlp (Member) left a comment


Thanks for fixing this! Partially related to #30565

The only concern is that the current way we check supports_sdpa as a PreTrainedModel property is not correct. There are two issues here:

  1. Currently it checks only language_model.supports_sdpa, which means that if the LM supports SDPA but the vision tower doesn't, SDPA will still be assigned to the vision tower.
  2. The above holds only if we trust that the supports_sdpa property works correctly. I recently found that it can't check the LM's supports_sdpa flag, because the LM is not initialized at that step: we don't yet know which class the LM will be, nor its flags. The same is true for the vision tower. I am currently working on that; it will probably have to be a VLM-specific check through text_config and vision_config, since we can deduce the class type from the config type. WDYT about it?
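The config-based check described above could look roughly like this. This is a hypothetical sketch: the mapping, the function name, and the config class names are made up for illustration; the real check would deduce SDPA support from the registered model class for each config type.

```python
# Assumed mapping from sub-config class name to SDPA support; in the
# library this would be derived from the model class each config maps to.
SDPA_SUPPORT_BY_CONFIG = {
    "CLIPVisionConfig": True,
    "LlamaConfig": True,
    "LegacyTextConfig": False,
}

def composite_supports_sdpa(vision_config, text_config):
    """SDPA is usable for the composite model only if both sub-models
    are known to support it; unknown configs default to False."""
    return (
        SDPA_SUPPORT_BY_CONFIG.get(type(vision_config).__name__, False)
        and SDPA_SUPPORT_BY_CONFIG.get(type(text_config).__name__, False)
    )
```

The advantage over the property-based check is that no model needs to be instantiated: the decision is made from the sub-config types alone.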

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qubvel (Contributor, Author) commented Jul 25, 2024

@zucchini-nlp If I understand correctly, your concern is about the situation where:

  • the vision model doesn't support SDPA,
  • the text model does support SDPA,
  • and we pass attn_implementation="sdpa".

Am I right?

The question here is how we should specify the attention implementation: should we dispatch it automatically, or let the user choose? For example, instead of passing attn_implementation="sdpa", accept attn_implementation={"vision_config": "eager", "text_config": "sdpa"}.
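The dict-based option suggested above could be resolved per submodule with a small helper. This is a hypothetical sketch; the function name and the default value are assumptions, not the library's API.

```python
def resolve_attn_implementation(attn_implementation, sub_config_key, default="eager"):
    """Return the attention implementation for one submodule.

    A plain string applies to all submodules; a dict is keyed by
    sub-config name (e.g. "vision_config", "text_config").
    """
    if isinstance(attn_implementation, dict):
        return attn_implementation.get(sub_config_key, default)
    return attn_implementation
```

With this, attn_implementation={"vision_config": "eager", "text_config": "sdpa"} resolves to "eager" for the vision tower and "sdpa" for the language model, while the existing single-string form keeps working unchanged.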

@zucchini-nlp (Member) commented

@qubvel my idea is to dispatch automatically, as that is backwards compatible, and I think most VLM-related models already support SDPA. We'll raise a warning if either one doesn't support SDPA, so that the user knows what's happening in the background.

I already have some code for that: we infer the SDPA flag from the config and try to dispatch to whichever implementation is supported. But that means we'll have to slightly change the tests, because they currently skip based on the uninitialized classes' SDPA flag, which will be False by default for all Llavas. I can make a PR soon; let's see if that works for you.
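The automatic-dispatch-with-warning behavior described above could be sketched like this. This is an illustrative approximation, not the actual PR code; the function name and the supports_sdpa mapping are assumptions.

```python
import warnings

def dispatch_attn_implementation(requested, supports_sdpa):
    """Resolve the attention implementation per submodule.

    supports_sdpa maps submodule name -> bool. Submodules that cannot
    use SDPA fall back to eager, with a warning so the user knows what
    happened in the background.
    """
    resolved = {}
    for name, sdpa_ok in supports_sdpa.items():
        if requested == "sdpa" and not sdpa_ok:
            warnings.warn(
                f"{name} does not support SDPA; falling back to eager attention"
            )
            resolved[name] = "eager"
        else:
            resolved[name] = requested
    return resolved
```

This keeps attn_implementation="sdpa" backwards compatible: models whose towers all support SDPA behave as before, and mixed models degrade gracefully per submodule instead of erroring.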

@qubvel (Contributor, Author) commented Jul 25, 2024

Ok, that sounds good, let's wait until your PR is ready to see if we still need this fix!
