custom 4d attention masks broken by #28937 #29525
Comments
@poedator thank you for opening this issue! The PR linked above should fix it 🙏
Hi @gante, I tested with yesterday's commit 56b64bf and the problem still persists.
Proposed solutions: the test should also be updated to perform more than one forward iteration, to allow testing with the cache. I am not attempting a PR here because I don't know the greater context of the changes in this part of transformers; I will be glad to test, though. Please fix this - I need it to work for my fancy speculative decoding trees (will show you soon).
I put together a notebook with tests. The one that uses the KV cache fails with 4.39.dev but works fine with 4.37.2. I also wrapped it as a new test case, ready to be pasted into transformers/tests/test_modeling_utils.
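For reference, here is a minimal sketch of the kind of two-step (prefill, then cached step) check described above. The checkpoint name, the mask convention (1 = attend, 0 = masked, as in the original 4D-mask tests), and the tolerance are illustrative assumptions, not the reporter's exact notebook code:

```python
# Sketch: run one forward pass with a custom 4D attention mask, then a second
# pass that reuses the returned KV cache, and compare against a no-cache reference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumption: any Llama-family checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

input_ids = tokenizer("one two three", return_tensors="pt").input_ids
seq_len = input_ids.shape[1]

# Custom 4D mask: shape (batch, 1, query_len, key_len). Here it simply reproduces
# a causal mask, but in general it can encode arbitrary structures (e.g. trees).
causal_4d = torch.tril(torch.ones(seq_len, seq_len))[None, None]

with torch.no_grad():
    out = model(input_ids, attention_mask=causal_4d, use_cache=True)

# Second step with one new token, reusing the cache. The 4D mask now covers
# (query_len=1, key_len=seq_len + 1); if the model silently rebuilds its own mask,
# the logits diverge from the no-cache reference, which is the reported regression.
next_token = out.logits[:, -1:].argmax(-1)
step_mask = torch.ones(1, 1, 1, seq_len + 1)
position_ids = torch.tensor([[seq_len]])

with torch.no_grad():
    out2 = model(
        next_token,
        attention_mask=step_mask,
        position_ids=position_ids,
        past_key_values=out.past_key_values,
        use_cache=True,
    )

# Reference: a full forward pass over the extended sequence without cache.
full_ids = torch.cat([input_ids, next_token], dim=1)
full_mask = torch.tril(torch.ones(seq_len + 1, seq_len + 1))[None, None]
with torch.no_grad():
    ref = model(full_ids, attention_mask=full_mask)

assert torch.allclose(out2.logits[:, -1], ref.logits[:, -1], atol=1e-4)
```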
System Info
Version 4.38.2 breaks code using custom 4D attention masks (introduced in #27539). Apparently, the custom mask gets replaced here:

transformers/src/transformers/models/llama/modeling_llama.py, lines 660 to 662 at commit 4ed9ae6

The issue was introduced with #28937. It is unclear whether the relevant slow tests for 4D masks were run then, but they fail now.
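For context, a minimal sketch of what code relying on custom 4D attention masks typically looks like: two sequences packed into one row, separated by a block-diagonal (batch, 1, query_len, key_len) mask. The checkpoint name and the 1 = attend / 0 = masked convention are assumptions for illustration, not the original reproducer from #27539:

```python
# Sketch: pack two sequences into one row and keep them from attending to each
# other via a block-diagonal 4D mask. The regression is that the model's mask
# preparation overwrites this mask with its own causal mask.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumption: any Llama-family checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

ids_a = tokenizer("red green", add_special_tokens=False).input_ids
ids_b = tokenizer("one two three", add_special_tokens=False).input_ids
input_ids = torch.tensor([ids_a + ids_b])  # both sequences packed into one row
len_a, len_b, total = len(ids_a), len(ids_b), len(ids_a) + len(ids_b)

# Block-diagonal causal mask: 1 = attend, 0 = masked.
mask = torch.zeros(1, 1, total, total)
mask[0, 0, :len_a, :len_a] = torch.tril(torch.ones(len_a, len_a))
mask[0, 0, len_a:, len_a:] = torch.tril(torch.ones(len_b, len_b))

# Positions restart for the second sequence, matching the packing.
position_ids = torch.tensor([list(range(len_a)) + list(range(len_b))])

with torch.no_grad():
    out = model(input_ids, attention_mask=mask, position_ids=position_ids)

# With a working 4D-mask path, out.logits[:, :len_a] should match running
# "red green" on its own; if the custom mask is overwritten, they diverge.
```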
Please fix or suggest a workaround.

Summoning @ArthurZucker
cc @gante @younesbelkada