Janus flash-attention issues solution #99

Open
@oliveirabruno01

Description

As I was trying to run deepseek-ai's Janus on a Colab notebook, I encountered some flash-attention errors, including the one you mentioned in installing-flash-attention.md:

NameError: name '_flash_supports_window_size' is not defined

I couldn't resolve this specific error directly, but I did get the model running by disabling flash-attention entirely for it.
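
For reference, here is a minimal sketch of that fallback. It assumes the checkpoint's custom code honors the standard attn_implementation argument that transformers forwards from from_pretrained; the exact mechanism may differ on your setup:

import torch
from transformers import AutoModelForCausalLM

model_path = "deepseek-ai/Janus-1.3B"

# Request the plain "eager" attention implementation so the
# flash-attention code path is skipped entirely.
vl_gpt = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    attn_implementation="eager",
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()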

Fortunately, Xenova had already addressed this and opened a PR on Janus's Hugging Face Hub repository. You can use it by passing revision="refs/pr/7" when downloading the pretrained model. For example:

import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images

# revision of the Hub PR that contains the flash-attention fix
revision_id = "refs/pr/7"
# path to the model on the Hugging Face Hub
model_path = "deepseek-ai/Janus-1.3B"
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, revision=revision_id
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
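
In case it helps anyone following along, here is roughly how the loaded model is then used, following the pattern from the official Janus demo; the conversation content, image path, and generation settings below are illustrative rather than anything specific to this fix:

# build a conversation in the format the VLChatProcessor expects
conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>\nDescribe this image.",
        "images": ["./example_image.png"],  # illustrative path
    },
    {"role": "Assistant", "content": ""},
]

# load the referenced images and batch everything together
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to(vl_gpt.device)

# run the image encoder, then generate a text response
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)
print(tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True))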

I just tried it with the official Janus Colab demo, and it worked like a charm! I thought you might appreciate this.
