Description
With:
On:
- CPU
- NVIDIA A10
The test "static cache works with torch.export()" fails when run as:

```shell
RUN_SLOW=1 python3 -m pytest --pspec -vv -k CacheTest tests/utils/test_cache_utils.py
```
```
RuntimeError: cannot mutate tensors with frozen storage

While executing %index_copy_ : [num_users=0] = call_method[target=index_copy_](args = (%k_out, 2, %l_input_pos_, %k_embed), kwargs = {})
Original traceback:
  File "/home/dvrogozh/git/huggingface/transformers/tests/utils/test_cache_utils.py", line 210, in forward
    outs = self.model(
  File "/home/dvrogozh/git/huggingface/transformers/src/transformers/models/gemma/modeling_gemma.py", line 1076, in forward
    outputs = self.model(
  File "/home/dvrogozh/git/huggingface/transformers/src/transformers/models/gemma/modeling_gemma.py", line 889, in forward
    layer_outputs = decoder_layer(
  File "/home/dvrogozh/git/huggingface/transformers/src/transformers/models/gemma/modeling_gemma.py", line 611, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/dvrogozh/git/huggingface/transformers/src/transformers/models/gemma/modeling_gemma.py", line 521, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/home/dvrogozh/git/huggingface/transformers/src/transformers/cache_utils.py", line 1101, in update
    k_out.index_copy_(2, cache_position, key_states)
```
I observe that adding a `.clone()` to the following two tensors does fix the issue. This solution was suggested in pytorch/pytorch#127571 (comment), but I am not sure whether it is the correct fix. See draft PR #33178 with this change.
transformers/src/transformers/cache_utils.py, lines 1090 to 1091 at 5c1027b:

```python
k_out = self.key_cache[layer_idx]
v_out = self.value_cache[layer_idx]
```
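As a minimal sketch of the proposed workaround, here is a hypothetical, simplified stand-in for `StaticCache.update` (named `TinyStaticCache` for illustration; this is not the actual transformers code) showing where the `.clone()` calls would go — the clone gives `index_copy_` a fresh buffer to mutate instead of storage frozen by `torch.export()`:

```python
import torch

# Hypothetical minimal stand-in for StaticCache.update (illustration only,
# not the actual transformers implementation).
class TinyStaticCache:
    def __init__(self, num_layers, shape):
        # Pre-allocated static buffers: (batch, num_heads, max_seq_len, head_dim)
        self.key_cache = [torch.zeros(shape) for _ in range(num_layers)]
        self.value_cache = [torch.zeros(shape) for _ in range(num_layers)]

    def update(self, key_states, value_states, layer_idx, cache_position):
        # The proposed change: .clone() so the in-place index_copy_ writes
        # into a fresh tensor rather than the export-frozen storage.
        k_out = self.key_cache[layer_idx].clone()
        v_out = self.value_cache[layer_idx].clone()
        k_out.index_copy_(2, cache_position, key_states)
        v_out.index_copy_(2, cache_position, value_states)
        return k_out, v_out

cache = TinyStaticCache(num_layers=1, shape=(1, 1, 4, 2))
new_k = torch.ones(1, 1, 2, 2)
new_v = torch.ones(1, 1, 2, 2)
k_out, v_out = cache.update(new_k, new_v, 0, torch.tensor([0, 1]))
```

Note that with the clone in place, the pre-allocated buffers themselves are no longer mutated by `update`, which may be one reason to doubt that this is the correct fix.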