System Info
- transformers version: 4.40.1
- Platform: Windows-10-10.0.22631-SP0
- Python version: 3.11.9
- Huggingface_hub version: 0.22.2
- Safetensors version: 0.4.3
- Accelerate version: 0.29.3
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
@Narsil
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- I installed flash-attention via pip from https://github.com/bdashore3/flash-attention/releases, which provides compiled packages for Windows, and is_flash_attn_2_available() returns True.
- I ran the ComfyUI workflow https://github.com/ZHO-ZHO-ZHO/ComfyUI-Phi-3-mini, which runs a pipeline with the model microsoft/Phi-3-mini-4k-instruct.
- I received the warning "You are not running the flash-attention implementation, expect numerical differences."
- I searched for this message in my installed transformers but found nothing (unbelievable!! That is my first question below). Then I cloned the latest code from GitHub and found the message in modeling_phi3.py, in class Phi3Attention(nn.Module). I also noticed that the transformers installed by pip differs from the transformers on GitHub: the latter has a phi3 folder under the models folder. So I tried to build transformers by running python setup.py install, but I received:
error: [Errno 2] No such file or directory: 'c:\users\79314\anaconda3\envs\comfyuitest\lib\site-packages\transformers-4.41.0.dev0py3.11.egg\transformers\models\deprecated\trajectory_transformer\pycache\convert_trajectory_transformer_original_pytorch_checkpoint_to_pytorch.cpython-311.pyc.1368162759184'
- I tried my best to use flash-attention, but I failed (a rough sketch of what I expected to work is below).
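For reference, this is roughly how I expected to enable flash-attention, following the transformers documentation. It is only a sketch I could not verify on my setup, and the trust_remote_code flag is my assumption, since my pip-installed release has no phi3 module.

```python
# Sketch only: how I expected to request flash-attention 2 for Phi-3.
# Not verified on my Windows machine; trust_remote_code is an assumption
# because my pip-installed transformers (4.40.1) has no models/phi3 folder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                 # flash-attention needs fp16/bf16
    attn_implementation="flash_attention_2",    # explicitly request FA2
    device_map="cuda",
    trust_remote_code=True,                     # assumption, see note above
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Hello, my name is", max_new_tokens=20)[0]["generated_text"])
```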
Expected behavior
How can I run with flash-attention?
My questions:
- Is there any other condition that limits flash-attention, besides is_flash_attn_2_available()? (The minimal check I ran locally is sketched after these questions.)
- Why can't I find "You are not running the flash-attention implementation, expect numerical differences." in my installed transformers? Which file contains this message?
- How can I build the latest transformers from source?
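For completeness, this is the minimal environment check I ran; it only prints the flags I know about, so there may be other conditions it misses.

```python
# Minimal check of the flags I know about; prints only, nothing Phi-3 specific.
import torch
import transformers
from transformers.utils import is_flash_attn_2_available

print("transformers:", transformers.__version__)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("is_flash_attn_2_available():", is_flash_attn_2_available())

try:
    import flash_attn
    print("flash_attn:", flash_attn.__version__)
except ImportError:
    print("flash_attn is not importable")
```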