Attention implementation cannot work together with config in AutoModel #30298

hiyouga · 2024-04-17T17:15:00Z

System Info

transformers version: 4.40.0.dev0
Platform: Linux-5.15.0-100-generic-x86_64-with-glibc2.35
Python version: 3.11.8
Huggingface_hub version: 0.21.4
Safetensors version: 0.4.2
Accelerate version: 0.28.0
PyTorch version (GPU?): 2.2.0+cu121 (True)

Who can help?

@younesbelkada

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Similar to #28038

We want to pass a model config to from_pretrained together with an attn_implementation argument. The attention type the model ends up with does not match the one requested via attn_implementation:

from transformers import AutoConfig, AutoModelForCausalLM
model_name = "meta-llama/Llama-2-7b-hf"
config = AutoConfig.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, attn_implementation="eager")
print(model.config._attn_implementation)
# sdpa

Expected behavior

_attn_implementation should be eager

hiyouga · 2024-04-17T17:34:23Z

Given the logic below, we cannot force the model to use eager attention, because config._attn_implementation falls back to "eager" when config._attn_implementation_internal is None [1]. With attn_implementation="eager", the condition config._attn_implementation != kwarg_attn_imp is therefore never true, so config._attn_implementation_internal is left unchanged (None) and the model ends up with SDPA attention [2].

transformers/src/transformers/modeling_utils.py

Lines 3138 to 3150 in e4ea19b

    
# In case one passes a config to `from_pretrained` + "attn_implementation"
# override the `_attn_implementation` attribute to `attn_implementation` of the kwargs
# Please see: https://github.com/huggingface/transformers/issues/28038
# Overwrite `config._attn_implementation` by the one from the kwargs --> in auto-factory
# we pop attn_implementation from the kwargs but this handles the case where users
# passes manually the config to `from_pretrained`.
config = copy.deepcopy(config)
kwarg_attn_imp = kwargs.pop("attn_implementation", None)
if kwarg_attn_imp is not None and config._attn_implementation != kwarg_attn_imp:
    config._attn_implementation = kwarg_attn_imp
model_kwargs = kwargs

I think the check should use config._attn_implementation_internal != kwarg_attn_imp instead.
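The difference between the two checks can be sketched without loading a model. This is a minimal illustration, not the library code: ToyConfig is a hypothetical stand-in that mirrors only the _attn_implementation property quoted in [1], and apply_kwarg_current / apply_kwarg_proposed are hypothetical helpers wrapping the existing condition and the suggested one.

```python
class ToyConfig:
    """Hypothetical stand-in for a model config; mirrors only the attention property."""

    def __init__(self):
        # A freshly loaded config has no explicit attention choice yet.
        self._attn_implementation_internal = None

    @property
    def _attn_implementation(self):
        # Same fallback as the quoted property: None is reported as "eager".
        if self._attn_implementation_internal is None:
            return "eager"
        return self._attn_implementation_internal

    @_attn_implementation.setter
    def _attn_implementation(self, value):
        self._attn_implementation_internal = value


def apply_kwarg_current(config, kwarg_attn_imp):
    # Existing check: compares against the property, which already says "eager".
    if kwarg_attn_imp is not None and config._attn_implementation != kwarg_attn_imp:
        config._attn_implementation = kwarg_attn_imp
    return config


def apply_kwarg_proposed(config, kwarg_attn_imp):
    # Suggested check: compares against the raw internal value instead.
    if kwarg_attn_imp is not None and config._attn_implementation_internal != kwarg_attn_imp:
        config._attn_implementation = kwarg_attn_imp
    return config


current = apply_kwarg_current(ToyConfig(), "eager")
proposed = apply_kwarg_proposed(ToyConfig(), "eager")
print(current._attn_implementation_internal)   # None: the request was silently dropped
print(proposed._attn_implementation_internal)  # eager: the request was recorded
```

With the existing check the internal value stays None, which is what later lets the SDPA branch in [2] take over.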

1:

transformers/src/transformers/configuration_utils.py

Lines 406 to 420 in e4ea19b

    
@property
def _attn_implementation(self):
    # This property is made private for now (as it cannot be changed and a PreTrainedModel.use_attn_implementation method needs to be implemented.)
    if hasattr(self, "_attn_implementation_internal"):
        if self._attn_implementation_internal is None:
            # `config.attn_implementation` should never be None, for backward compatibility.
            return "eager"
        else:
            return self._attn_implementation_internal
    else:
        return "eager"

@_attn_implementation.setter
def _attn_implementation(self, value):
    self._attn_implementation_internal = value

2:

transformers/src/transformers/modeling_utils.py

Lines 1461 to 1466 in e4ea19b

    
elif requested_attn_implementation in [None, "sdpa"] and not is_torch_xla_available():
    # use_flash_attention_2 takes priority over SDPA, hence SDPA treated in this elif.
    config = cls._check_and_enable_sdpa(
        config,
        hard_check_only=False if requested_attn_implementation is None else True,
    )
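To make the consequence of this branch concrete, here is a simplified model of the selection step. It is a sketch under stated assumptions: resolve_attention and its sdpa_available flag are hypothetical and collapse the library's availability checks (including the XLA check) into a single boolean, and it assumes the requested value is taken from the config's internal attribute.

```python
def resolve_attention(requested, sdpa_available=True):
    """Toy model of the quoted branch; not the library implementation."""
    if requested in (None, "sdpa"):
        if sdpa_available:
            # No explicit request (None) is treated like a request for SDPA.
            return "sdpa"
        if requested == "sdpa":
            # An explicit request is a hard requirement in this sketch.
            raise ValueError("SDPA was requested explicitly but is unavailable")
        return "eager"
    # Any other explicit choice is kept as-is.
    return requested


print(resolve_attention(None))     # sdpa
print(resolve_attention("eager"))  # eager
```

Eager attention is only selected when "eager" actually reaches this step, so an internal value that stays None produces the SDPA result reported above.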

amyeroberts · 2024-04-17T19:26:58Z

cc @fxmarty

hiyouga mentioned this issue Apr 17, 2024

Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained #30299

Merged

5 tasks

amyeroberts closed this as completed in #30299 Apr 19, 2024
