Implementation of SDPA for Microsoft Phi-3 Mini #31863
Comments
One thing to note is that _supports_sdpa is set to False in the Phi-3 modeling code. I think this can be changed to True. I confirmed that the model can be loaded properly after changing that to True.
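For reference, the change being discussed is a one-line flip of a class attribute. A minimal sketch of what it looks like in transformers/models/phi3/modeling_phi3.py (illustrative only; the real class defines more attributes than shown here):

from transformers import PreTrainedModel, Phi3Config

# Excerpt-style sketch of Phi3PreTrainedModel; only the attributes relevant
# to this issue are shown.
class Phi3PreTrainedModel(PreTrainedModel):
    config_class = Phi3Config
    base_model_prefix = "model"
    _supports_flash_attn_2 = True
    _supports_sdpa = True  # was False in v4.42.3; True lets attn_implementation="sdpa" pass the check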
Yep, can you open a PR? 🤗
@ArthurZucker Opened a PR!
Hi, I changed _supports_sdpa to True, but I still get this error. I am using a V100, transformers version 4.42.4. How should I change the code? Thanks
Oh, I set trust_remote_code=False and set _supports_sdpa to True. It works now.
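A minimal sketch of that working setup, assuming transformers 4.42.x with the in-library Phi-3 implementation; the checkpoint name here is only an example, so adjust it to the model you are loading:

import torch
from transformers import AutoModelForCausalLM
from transformers.models.phi3.modeling_phi3 import Phi3PreTrainedModel

# Flip the class-level SDPA flag on the in-library implementation, then load
# without trust_remote_code so that this implementation (and not the Hub's
# remote code) is the one actually used.
Phi3PreTrainedModel._supports_sdpa = True

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",  # example checkpoint
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
    trust_remote_code=False,
    device_map="auto",
)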
Thanks for updating!
System Info
transformers version: 4.42.3

Who can help?
@ArthurZucker
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I am following the tutorial below to fine-tune Microsoft Phi-3 on my custom dataset: https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/Phi-3-finetune-lora-python.ipynb
As I am running this on Colab with a T4 GPU, Flash Attention is not supported (FlashAttention only supports Ampere GPUs or newer).
Thus, per the code below from the tutorial, attn_implementation is selected as 'sdpa' with compute dtype torch.float16:
import torch

# Pick the compute dtype and attention implementation based on what the GPU supports
if torch.cuda.is_bf16_supported():
    compute_dtype = torch.bfloat16
    attn_implementation = 'flash_attention_2'
else:
    compute_dtype = torch.float16
    attn_implementation = 'sdpa'
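As a quick sanity check of the FlashAttention constraint mentioned above, the GPU's compute capability can be inspected directly (a T4 is Turing, compute capability 7.5, below the Ampere 8.0 minimum that FlashAttention 2 requires):

import torch

# Print what this GPU supports; FlashAttention 2 requires compute capability >= 8.0
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
print(f"bf16 supported: {torch.cuda.is_bf16_supported()}")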
Loading Model
from transformers import AutoModelForCausalLM

# model_id is defined earlier in the notebook (the Phi-3 checkpoint being fine-tuned)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=compute_dtype, trust_remote_code=True, device_map='auto',
    attn_implementation=attn_implementation
)
Error
It gives me the error: ValueError: Phi3ForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: #28005.
Keeping attn_implementation='eager' instead leads to a CUDA out-of-memory error.
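As background on that memory difference, eager attention materializes the full attention matrix, while SDPA can dispatch to flash or memory-efficient kernels. Which SDPA backends are enabled in the current PyTorch session can be checked as sketched below (a rough diagnostic only, not a guarantee that a given kernel is usable on this GPU):

import torch

# Shows which scaled_dot_product_attention backends are enabled in this session
print("flash backend enabled:           ", torch.backends.cuda.flash_sdp_enabled())
print("memory-efficient backend enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math (fallback) backend enabled: ", torch.backends.cuda.math_sdp_enabled())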
Expected behavior
SDPA should be supported as an attention implementation for the Microsoft Phi-3 model.