Conversation

@qubvel (Contributor) commented Jul 25, 2024

What does this PR do?

The CLIP vision model now supports SDPA attention and automatically dispatches to SDPA when possible. Llava-like models use AutoModel.from_config to initialize the vision tower; however, attn_implementation was not propagated, which caused errors in CI: the top-level model was in eager mode while the CLIP vision tower used SDPA attention.
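The fix can be sketched roughly as follows. This is a hypothetical simplification, not the actual transformers code: `build_vision_tower`, `auto_model_cls`, and the config attributes are stand-in names for illustration.

```python
def build_vision_tower(config, auto_model_cls):
    # Before the fix, attn_implementation was not passed here, so the
    # vision tower fell back to its own default (SDPA for CLIP) while
    # the top-level model could be in eager mode.
    return auto_model_cls.from_config(
        config.vision_config,
        attn_implementation=config._attn_implementation,  # the fix: propagate it
    )
```

The key point is only that the parent model's resolved attention implementation is forwarded when the sub-model is built from its config.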

cc @zucchini-nlp @ydshieh

@zucchini-nlp (Member) left a comment


Thanks for fixing this! Partially related to #30565

The only concern is that the current way we check supports_sdpa as a PreTrainedModel property is not correct. There are two issues here:

  1. Currently it checks only language_model.supports_sdpa, which means that if the LM supports SDPA but the vision tower doesn't, SDPA will still be assigned to the vision tower.
  2. The above holds only if we trust that the supports_sdpa property works correctly. I recently found that it can't check the LM's supports_sdpa flag, because the LM is not initialized at that step: we don't yet know which class the LM will be, nor its flags. The same is true for the vision tower. I am currently working on that; it will probably have to be a VLM-specific check through text_config and vision_config, since we can deduce the class type from the config type. WDYT about it?
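The config-based check described above could look roughly like this. This is a hypothetical sketch: the mapping, the function name, and the config class names are made up for illustration; the real check would deduce SDPA support from the registered model class for each config type.

```python
# Assumed mapping from sub-config class name to SDPA support; in the
# library this would be derived from the model class each config maps to.
SDPA_SUPPORT_BY_CONFIG = {
    "CLIPVisionConfig": True,
    "LlamaConfig": True,
    "LegacyTextConfig": False,
}

def composite_supports_sdpa(vision_config, text_config):
    """SDPA is usable for the composite model only if both sub-models
    are known to support it; unknown configs default to False."""
    return (
        SDPA_SUPPORT_BY_CONFIG.get(type(vision_config).__name__, False)
        and SDPA_SUPPORT_BY_CONFIG.get(type(text_config).__name__, False)
    )
```

The advantage over the property-based check is that no model needs to be instantiated: the decision is made from the sub-config types alone.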

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qubvel (Contributor, Author) commented Jul 25, 2024

@zucchini-nlp If I understand correctly, your concern is about the situation where:

  • the vision model doesn't support SDPA,
  • the text model does support SDPA,
  • and we pass attn_implementation="sdpa".

Am I right?

The question here is how we should specify the attention implementation: should we dispatch it automatically, or let the user choose? For example, instead of passing attn_implementation="sdpa", accept attn_implementation={"vision_config": "eager", "text_config": "sdpa"}.
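The dict-based option suggested above could be resolved per submodule with a small helper. This is a hypothetical sketch; the function name and the default value are assumptions, not the library's API.

```python
def resolve_attn_implementation(attn_implementation, sub_config_key, default="eager"):
    """Return the attention implementation for one submodule.

    A plain string applies to all submodules; a dict is keyed by
    sub-config name (e.g. "vision_config", "text_config").
    """
    if isinstance(attn_implementation, dict):
        return attn_implementation.get(sub_config_key, default)
    return attn_implementation
```

With this, attn_implementation={"vision_config": "eager", "text_config": "sdpa"} resolves to "eager" for the vision tower and "sdpa" for the language model, while the existing single-string form keeps working unchanged.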

@zucchini-nlp (Member) commented

@qubvel my idea is to dispatch automatically, as that is backwards compatible, and I think most VLM-related models already support SDPA. We'll raise a warning if either one doesn't support SDPA, so that the user knows what's happening in the background.

I already have some code for that: we infer the SDPA flag from the config and try to dispatch to whichever implementation is supported. But that means we'll have to slightly change the tests, because they currently skip based on the uninitialized classes' SDPA flag, which will be False by default for all Llavas. I can make a PR soon; let's see if that works for you.
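The automatic-dispatch-with-warning behavior described above could be sketched like this. This is an illustrative approximation, not the actual PR code; the function name and the supports_sdpa mapping are assumptions.

```python
import warnings

def dispatch_attn_implementation(requested, supports_sdpa):
    """Resolve the attention implementation per submodule.

    supports_sdpa maps submodule name -> bool. Submodules that cannot
    use SDPA fall back to eager, with a warning so the user knows what
    happened in the background.
    """
    resolved = {}
    for name, sdpa_ok in supports_sdpa.items():
        if requested == "sdpa" and not sdpa_ok:
            warnings.warn(
                f"{name} does not support SDPA; falling back to eager attention"
            )
            resolved[name] = "eager"
        else:
            resolved[name] = requested
    return resolved
```

This keeps attn_implementation="sdpa" backwards compatible: models whose towers all support SDPA behave as before, and mixed models degrade gracefully per submodule instead of erroring.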

@qubvel (Contributor, Author) commented Jul 25, 2024

Ok, that sounds good, let's wait until your PR is ready to see if we still need this fix!
