Support LLaVA #273

Open
@briankariuki

Description

I'm working on adding LLaVA to bumblebee as a learning exercise.

I need some guidance on a few things:

  1. The official implementation of LLaVA, as seen here, uses ClipVisionModel from the Hugging Face transformers package to extract image features. Should I reimplement this, or just reuse the ClipVisionModel implementation already in bumblebee? (See the loading sketch after this list.)
  2. Each model implementation has a params_mapping section, for example for LLaMA here. How do I go about identifying the layers of the model and what they map to in the Axon model? (See the mapping sketch after this list.)
  3. I would also appreciate some guidance on implementing the core logic of the model. (I've sketched my current understanding after this list.)
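
For (1), my current plan is to reuse the existing CLIP vision encoder rather than reimplement it. Here is a minimal sketch of what I have in mind, assuming `Bumblebee.Vision.ClipVision` can be loaded standalone with the CLIP checkpoint LLaVA uses (the checkpoint name and the module/architecture options below are my assumptions, not verified):

```elixir
# Sketch: load just the CLIP vision tower and run it over a featurized image.
{:ok, %{model: model, params: params}} =
  Bumblebee.load_model({:hf, "openai/clip-vit-large-patch14-336"},
    module: Bumblebee.Vision.ClipVision,
    architecture: :base
  )

{:ok, featurizer} =
  Bumblebee.load_featurizer({:hf, "openai/clip-vit-large-patch14-336"})

image = StbImage.read_file!("example.jpg")
inputs = Bumblebee.apply_featurizer(featurizer, image)

# The :base output should include the per-patch hidden state; LLaVA drops the
# class token and projects the remaining patch features into the LLM space.
outputs = Axon.predict(model, params, inputs)
```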
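For (2), my reading of the existing modules is that params_mapping returns a map from the Axon layer names (the `name:` options the Bumblebee module uses when building the graph) to the parameter names in the PyTorch checkpoint, with `{n}` standing in for the block index. Something in the LLaMA style, where every entry below is illustrative rather than copied verbatim:

```elixir
# Illustrative params_mapping/1 in the style of the LLaMA module.
# Keys are Axon layer names, values are parameter names in the HF checkpoint;
# the real LLaVA entries would have to be checked against its state dict.
def params_mapping(_spec) do
  %{
    "embedder.token_embedding" => "model.embed_tokens",
    "decoder.blocks.{n}.self_attention.query" => "model.layers.{n}.self_attn.q_proj",
    "decoder.blocks.{n}.self_attention.key" => "model.layers.{n}.self_attn.k_proj",
    "decoder.blocks.{n}.self_attention.value" => "model.layers.{n}.self_attn.v_proj",
    "decoder.blocks.{n}.self_attention.output" => "model.layers.{n}.self_attn.o_proj",
    "output_norm" => "model.norm",
    "language_modeling_head.output" => "lm_head"
  }
end
```

To find both sides of the mapping I've been comparing the checkpoint keys (printed from the PyTorch state dict) against the names the Bumblebee module assigns to its Axon layers; as far as I can tell, Bumblebee also logs any parameters it fails to match when loading, which should help catch mistakes.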
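For (3), my current understanding of the core logic is: run the image through the CLIP vision tower, project the patch features into the LLM's embedding space with a small projector (a single linear layer in the original LLaVA, a two-layer MLP with GELU in LLaVA-1.5), splice the projected image tokens into the text embedding sequence at the image placeholder position, and run the usual LLaMA decoder over the combined sequence. A rough Axon sketch of just the projector piece, with made-up layer names and sizes (576 patches and hidden size 1024 for CLIP ViT-L/14-336, 4096 for LLaMA-7B):

```elixir
# Rough sketch of the multimodal projector as an Axon graph.
# Layer names and shapes are placeholders for illustration.
image_features = Axon.input("image_features", shape: {nil, 576, 1024})

projected =
  image_features
  |> Axon.dense(4096, name: "multi_modal_projector.linear_1")
  |> Axon.gelu()
  |> Axon.dense(4096, name: "multi_modal_projector.linear_2")

# `projected` has shape {batch, 576, 4096}, so it can be concatenated with the
# token embeddings along the sequence axis (e.g. Axon.concatenate/3 with
# axis: 1) before the decoder blocks.
```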

The transformers package has not added support for LLaVA yet; there is an ongoing PR, which can be found here, but it has not been merged.

Thanks.
