Describe the bug
According to huggingface/transformers@07360b6, a new argument num_key_value_heads was introduced (for grouped-query attention). However, DeepSpeed currently does not account for this parameter during tensor parallelism: self.num_heads is correctly divided by tp_size, but self.num_key_value_heads is not, which causes a size mismatch in the view calls for key_states and value_states.
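For concreteness, the numbers in the traceback below can be reproduced with plain tensor shapes: with 32 KV heads and head_dim 128, a rank that holds only half of the sharded k_proj output produces 21 * 16 * 128 = 43008 elements, while the unsharded view expects 21 * 32 * 128 = 86016. A minimal sketch of the mismatch (values taken from the log; variable names are illustrative):

```python
# Minimal sketch of the shape mismatch, assuming tp_size == 2 and the
# Llama attention config from the log (32 KV heads, head_dim 128).
import torch

bsz, q_len, head_dim = 1, 21, 128
num_key_value_heads = 32  # full-model value, never divided by tp_size
tp_size = 2

# After tensor-parallel slicing, k_proj only produces the local shard:
# (num_key_value_heads // tp_size) * head_dim features per token.
key_states = torch.randn(bsz, q_len, num_key_value_heads // tp_size * head_dim)
print(key_states.numel())  # 43008, matching the error log

# modeling_llama.py still uses the unsharded head count, so this fails:
key_states.view(bsz, q_len, num_key_value_heads, head_dim)
# RuntimeError: shape '[1, 21, 32, 128]' is invalid for input of size 43008
```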
Log (with tp_size == 2):

```
File "/home/renpang/miniconda3/envs/py311/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 310, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 21, 32, 128]' is invalid for input of size 43008
```
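A plausible direction for a fix, mirroring how num_heads is already handled, would be to divide num_key_value_heads by the tensor-parallel degree at injection time. The sketch below is only an illustration of that idea, not DeepSpeed's actual code; shard_attention_heads and its argument names are hypothetical:

```python
# Hypothetical sketch (not actual DeepSpeed code): wherever the
# tensor-parallel injection divides num_heads by the TP world size,
# the same division should apply to num_key_value_heads so that the
# view() in modeling_llama.py sees the per-rank head count.
def shard_attention_heads(attn, tp_size: int) -> None:
    """Adjust per-rank head counts on a sliced attention module."""
    attn.num_heads = attn.num_heads // tp_size
    # New in transformers 07360b6; only present on GQA-aware models.
    if hasattr(attn, "num_key_value_heads"):
        attn.num_key_value_heads = attn.num_key_value_heads // tp_size
```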