Describe the bug
According to huggingface/transformers@07360b6, a new argument num_key_value_heads was introduced (for grouped-query attention). However, DeepSpeed currently does not account for this parameter during tensor parallelism: self.num_heads is correctly divided by tp_size, but self.num_key_value_heads is not, which causes a size mismatch in the view calls for key_states and value_states.
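For concreteness, the numbers in the traceback below can be reproduced with plain tensor shapes: with 32 KV heads and head_dim 128, a rank that holds only half of the sharded k_proj output produces 21 * 16 * 128 = 43008 elements, while the unsharded view expects 21 * 32 * 128 = 86016. A minimal sketch of the mismatch (values taken from the log; variable names are illustrative):

```python
# Minimal sketch of the shape mismatch, assuming tp_size == 2 and the
# Llama attention config from the log (32 KV heads, head_dim 128).
import torch

bsz, q_len, head_dim = 1, 21, 128
num_key_value_heads = 32  # full-model value, never divided by tp_size
tp_size = 2

# After tensor-parallel slicing, k_proj only produces the local shard:
# (num_key_value_heads // tp_size) * head_dim features per token.
key_states = torch.randn(bsz, q_len, num_key_value_heads // tp_size * head_dim)
print(key_states.numel())  # 43008, matching the error log

# modeling_llama.py still uses the unsharded head count, so this fails:
key_states.view(bsz, q_len, num_key_value_heads, head_dim)
# RuntimeError: shape '[1, 21, 32, 128]' is invalid for input of size 43008
```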
Log (with tp_size == 2):

```
File "/home/renpang/miniconda3/envs/py311/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 310, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 21, 32, 128]' is invalid for input of size 43008
```
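A plausible direction for a fix, mirroring how num_heads is already handled, would be to divide num_key_value_heads by the tensor-parallel degree at injection time. The sketch below is only an illustration of that idea, not DeepSpeed's actual code; shard_attention_heads and its argument names are hypothetical:

```python
# Hypothetical sketch (not actual DeepSpeed code): wherever the
# tensor-parallel injection divides num_heads by the TP world size,
# the same division should apply to num_key_value_heads so that the
# view() in modeling_llama.py sees the per-rank head count.
def shard_attention_heads(attn, tp_size: int) -> None:
    """Adjust per-rank head counts on a sliced attention module."""
    attn.num_heads = attn.num_heads // tp_size
    # New in transformers 07360b6; only present on GQA-aware models.
    if hasattr(attn, "num_key_value_heads"):
        attn.num_key_value_heads = attn.num_key_value_heads // tp_size
```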