Implementation of SDPA for Microsoft Phi-3 Mini #31863

Closed
@Dev4011

Description

System Info

  • transformers version: 4.42.3
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.32.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): 2.15.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.8.4 (gpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.26
  • Using distributed or parallel set-up in script?: Yes
  • Using GPU in script?: Yes
  • GPU type: Tesla T4

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am following the tutorial below to fine-tune Microsoft Phi-3 on my custom dataset: https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/Phi-3-finetune-lora-python.ipynb

Because I am running it on Colab with a T4 GPU, Flash Attention is not available ("FlashAttention only supports Ampere GPUs or newer").

As a result, the tutorial code below sets attn_implementation to 'sdpa' and the compute dtype to torch.float16:

if torch.cuda.is_bf16_supported():
    compute_dtype = torch.bfloat16
    attn_implementation = 'flash_attention_2'
else:
    compute_dtype = torch.float16
    attn_implementation = 'sdpa'
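
The same decision can also be keyed on the GPU's CUDA compute capability, which is what the FlashAttention restriction is actually about. The following is only a sketch and not what the tutorial does: `choose_attention` and `AMPERE_MAJOR` are hypothetical names, and it assumes Ampere corresponds to compute capability 8.x (a T4 reports 7.5). It returns the dtype as a string so it can be checked without a GPU.

```python
from typing import Tuple

# Assumption: Ampere GPUs report CUDA compute capability 8.x; a Tesla T4 reports 7.5.
AMPERE_MAJOR = 8


def choose_attention(capability: Tuple[int, int]) -> Tuple[str, str]:
    """Return (dtype_name, attn_implementation) for a (major, minor) capability.

    Hypothetical helper for illustration only.
    """
    major, _minor = capability
    if major >= AMPERE_MAJOR:
        # Ampere or newer: bfloat16 and FlashAttention 2 are candidates.
        return "bfloat16", "flash_attention_2"
    # Older GPUs such as the T4: fall back to float16 and SDPA.
    return "float16", "sdpa"


# Capability of a Tesla T4, hard-coded so the example runs without a GPU.
print(choose_attention((7, 5)))  # ('float16', 'sdpa')
```

In a real script the capability would come from `torch.cuda.get_device_capability()`, and the dtype name could be resolved with `getattr(torch, dtype_name)`.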

Loading Model

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=compute_dtype, trust_remote_code=True, device_map='auto',
    attn_implementation=attn_implementation
)

Error

Loading the model fails with: ValueError: Phi3ForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: #28005.

Setting attn_implementation='eager' instead leads to a CUDA out-of-memory error.
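
As a stopgap, a script could try the attention implementations in order of preference and report which one actually loaded. This is a minimal sketch, not part of the tutorial or of transformers: `load_with_attn_fallback` is a hypothetical helper, and it assumes an unsupported implementation surfaces as a `ValueError` (as in the traceback above) or an `ImportError`. It does not address the out-of-memory problem, since an 'eager' fallback can still exhaust GPU memory.

```python
from typing import Any, Callable, Sequence, Tuple


def load_with_attn_fallback(
    loader: Callable[..., Any],
    candidates: Sequence[str] = ("sdpa", "eager"),
    **kwargs: Any,
) -> Tuple[Any, str]:
    """Try each attention implementation in order; return (model, chosen_impl).

    Hypothetical helper. `loader` is any callable that accepts an
    `attn_implementation` keyword, for example a functools.partial around
    AutoModelForCausalLM.from_pretrained with the model id already bound.
    """
    last_error = None
    for impl in candidates:
        try:
            return loader(attn_implementation=impl, **kwargs), impl
        except (ValueError, ImportError) as err:
            # Assumption: an unsupported implementation raises one of these.
            last_error = err
    raise RuntimeError(
        f"none of the attention implementations {list(candidates)} could be loaded"
    ) from last_error
```

With the variables from this report, usage might look like `model, used = load_with_attn_fallback(functools.partial(AutoModelForCausalLM.from_pretrained, model_id), torch_dtype=compute_dtype, trust_remote_code=True, device_map='auto')`. Catching `ValueError` broadly can mask unrelated configuration errors, so logging `last_error` is advisable.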

Expected behavior

SDPA should be supported as an attention implementation for the Microsoft Phi-3 model.
