
flash-attention is not running, although is_flash_attn_2_available() returns true #30547

@6sixteen

Description

System Info

  • transformers version: 4.40.1
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.11.9
  • Huggingface_hub version: 0.22.2
  • Safetensors version: 0.4.3
  • Accelerate version: 0.29.3
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@Narsil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. I installed flash-attention via pip from https://github.com/bdashore3/flash-attention/releases, which provides pre-compiled wheels for Windows, and is_flash_attn_2_available() returns True.
  2. I ran the ComfyUI workflow https://github.com/ZHO-ZHO-ZHO/ComfyUI-Phi-3-mini, which runs a pipeline for the model microsoft/Phi-3-mini-4k-instruct.
  3. I received the warning "You are not running the flash-attention implementation, expect numerical differences."
  4. I tried to search for this message in my installed transformers package but found nothing (unbelievable! this is my first question). I then cloned the latest code from GitHub and found the message in modeling_phi3.py, in class Phi3Attention(nn.Module). I also noticed that the transformers installed by pip differs from the transformers on GitHub: the latter has a phi3 folder under the models folder. So I tried to build transformers by running python setup.py install, but I received
    error: [Errno 2] No such file or directory: 'c:\users\79314\anaconda3\envs\comfyuitest\lib\site-packages\transformers-4.41.0.dev0-py3.11.egg\transformers\models\deprecated\trajectory_transformer\__pycache__\convert_trajectory_transformer_original_pytorch_checkpoint_to_pytorch.cpython-311.pyc.1368162759184'
  5. I have tried my best to use flash-attention, but I have failed. A minimal sketch of how I understand the model should be loaded is shown after this list.
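
For reference, here is roughly how I expect the model would need to be loaded to force flash-attention. This is only a minimal sketch based on my reading of the transformers documentation, not the actual ComfyUI-Phi-3-mini node code; the prompt and generation arguments are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.utils import is_flash_attn_2_available

# This returns True in my environment, so I expected flash-attention to be used.
print(is_flash_attn_2_available())

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Explicitly request flash-attention 2; it requires fp16/bf16 weights on a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)

inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```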

Expected behavior

How can I run with flash-attention?
My questions:
Is there another condition, besides is_flash_attn_2_available(), that decides whether flash-attention is actually used? (See the sketch below for how I have been checking.)
Why can't I find "You are not running the flash-attention implementation, expect numerical differences." in my installed package? Which file contains this message?
How can I build the latest transformers from source?
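
For context on the first question, this is how I have been trying to check which attention implementation the loaded model actually ends up with. It is only a sketch: it assumes the model object from the snippet in the Reproduction section, and it assumes config._attn_implementation is the attribute that recent transformers versions use to record the selected backend.

```python
# Sketch: inspect which attention backend was selected for the loaded model.
# Assumes `model` from the loading snippet in the Reproduction section.
print(model.config._attn_implementation)  # hoped for "flash_attention_2", but it seems to fall back

# The class name of the attention module also shows which implementation is active,
# e.g. Phi3FlashAttention2 vs. the plain Phi3Attention that prints the warning.
print(type(model.model.layers[0].self_attn).__name__)
```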
