Janus flash-attention issues solution #99

Open
@oliveirabruno01

Description

As I was trying to run deepseek-ai's Janus on a Colab notebook, I encountered some flash-attention errors, including the one you mentioned in installing-flash-attention.md:

NameError: name '_flash_supports_window_size' is not defined

I couldn't resolve this specific error directly, but I did get the model running by disabling flash-attention entirely for it.
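
For reference, here is a minimal sketch of that fallback. It assumes the checkpoint's custom code honors the standard attn_implementation argument that transformers forwards from from_pretrained; the exact mechanism may differ on your setup:

import torch
from transformers import AutoModelForCausalLM

model_path = "deepseek-ai/Janus-1.3B"

# Request the plain "eager" attention implementation so the
# flash-attention code path is skipped entirely.
vl_gpt = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    attn_implementation="eager",
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()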

Fortunately, Xenova had already addressed this and opened a PR on Janus's Hugging Face Hub repository. You can use it by passing revision="refs/pr/7" when downloading the pretrained model. For example:

import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images

# revision of the Hub PR that contains the flash-attention fix
revision_id = "refs/pr/7"
# path to the model on the Hugging Face Hub
model_path = "deepseek-ai/Janus-1.3B"
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, revision=revision_id
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
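
In case it helps anyone following along, here is roughly how the loaded model is then used, following the pattern from the official Janus demo; the conversation content, image path, and generation settings below are illustrative rather than anything specific to this fix:

# build a conversation in the format the VLChatProcessor expects
conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>\nDescribe this image.",
        "images": ["./example_image.png"],  # illustrative path
    },
    {"role": "Assistant", "content": ""},
]

# load the referenced images and batch everything together
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to(vl_gpt.device)

# run the image encoder, then generate a text response
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)
print(tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True))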

I just tried it with the official Janus Colab demo, and it worked like a charm! I thought you might appreciate this.
