Correct check for SDPA in Vision Language Models #30565

@zucchini-nlp

System Info

In the current implementation of VLMs, the `_supports_sdpa` attribute checks and activates SDPA attention only for the language model, for example in Llava.

It should also check for SDPA support in the vision tower and, if available, use it there as well.
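
For reference, this is roughly how a user requests SDPA at load time (the checkpoint name is only an illustrative example); under the current behavior described above, only the language model's `_supports_sdpa` flag is consulted for Llava-style models:

```python
from transformers import LlavaForConditionalGeneration

# Request SDPA attention at load time. For Llava-style composite models, the
# current check only consults the language model's _supports_sdpa flag,
# not the vision tower's.
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # example checkpoint
    attn_implementation="sdpa",
)
```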

For composite models where one part supports SDPA but the other does not, we can raise a warning and activate SDPA only for the supported part. That way the user knows what is happening in the background; see the sketch below.
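
A minimal sketch of what such a per-part check could look like. The helper name `resolve_attn_implementation`, the sub-module names, and the eager fallback are assumptions for illustration, not the actual Transformers implementation:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_attn_implementation(model, requested="sdpa"):
    """Decide the attention implementation per sub-model of a composite VLM,
    instead of relying only on the language model's _supports_sdpa flag."""
    # Llava-style models expose `language_model` and `vision_tower`;
    # other composite models may name their parts differently.
    parts = {
        "language_model": getattr(model, "language_model", None),
        "vision_tower": getattr(model, "vision_tower", None),
    }
    resolved = {}
    for name, part in parts.items():
        if part is None:
            continue
        supports_sdpa = getattr(part, "_supports_sdpa", False)
        if requested == "sdpa" and not supports_sdpa:
            # Warn so the user knows only part of the model runs with SDPA.
            logger.warning(
                "%s does not support SDPA; falling back to eager attention for "
                "this part while keeping SDPA for the parts that support it.",
                name,
            )
            resolved[name] = "eager"
        else:
            resolved[name] = requested
    return resolved
```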

Verified models

  • BLIP-2
  • InstructBLIP
  • InstructBLIPVideo
  • KOSMOS-2
  • LLaVa
  • LLaVa-NeXT
  • LLaVa-NeXT-Video
  • VipLLaVa
  • Video-LLaVa
  • Idefics
  • Idefics2
  • PaliGemma

Labels

Should Fix (identified as a bug and should be fixed), Vision, WIP (work in progress)