Add attention backend tests to more-tests.yml #1480
Conversation
Tests from pytorch#1477 only, without the generate.py refactor
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1480
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure as of commit b52d1b4 with merge base 5684175.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Repro: This is with the current setup, i.e., no code change. The background is that the various accelerated SDPA implementations do not support all parameter combinations and dimensions, so restricting to a subset that does not include MATH can lead to a failure whenever a particular input is not supported. (MATH is the only implementation guaranteed to handle all inputs.)
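For context, here is a minimal sketch (not torchchat code; it assumes the `torch.nn.attention.sdpa_kernel` API available in PyTorch ≥ 2.3) of how restricting the backend set can fail, and why including MATH guarantees a fallback:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
# Explicit (lower-triangular) causal mask, as opposed to is_causal=True.
causal_mask = torch.tril(torch.ones(128, 128, dtype=torch.bool))

try:
    # Restricting to a single accelerated backend fails whenever the input
    # combination (device, dtype, mask type, head dim) is unsupported by
    # that kernel -- e.g. the CUDA flash kernel rejects an explicit attn_mask.
    with sdpa_kernel([SDPBackend.FLASH_ATTENTION]):
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
except RuntimeError as err:
    print(f"flash-only selection failed: {err}")

# Adding MATH to the allowed set guarantees that some kernel can run,
# at the cost of silently dropping to the unaccelerated path.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.MATH]):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
```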
@yanbing-j @Jack-Khuu The current way of setting a single attention implementation fails. There are two directions. First, if we actually want flash attention, we need to replace the explicit causal mask with is_causal when we specify it (flash does not support other masks at all); that is a prerequisite to using flash in any form, shape, or flavor, as sketched below. Second, we need to decide how to think about a fallback, because none of the SDPA kernels except MATH is guaranteed to work on all inputs. See #1477 for a solution that adds MATH as a backup, but that just means that at the moment, whenever we try to call flash we end up with MATH.
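A rough sketch of the first direction (illustrative only; `attend` and `use_flash` are hypothetical names, not torchchat code): express causality via `is_causal` instead of an explicit mask so the flash kernel is even eligible, and note that with an explicit mask the `[FLASH_ATTENTION, MATH]` set effectively always falls back to MATH:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

def attend(q, k, v, use_flash: bool = True):
    """Toy attention helper; names and signature are illustrative only."""
    if use_flash:
        # Flash path: no attn_mask at all -- causality is expressed through
        # is_causal, the only mask form the flash kernel accepts.
        with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.MATH]):
            return F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Masked path: an explicit mask disqualifies the flash kernel, so with
    # [FLASH_ATTENTION, MATH] allowed this effectively always runs MATH.
    seq = q.size(-2)
    mask = torch.tril(torch.ones(seq, seq, dtype=torch.bool, device=q.device))
    with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.MATH]):
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(1, 8, 128, 64)
out = attend(q, k, v, use_flash=True)
```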
Tests from #1477 only, without the generate.py refactor
(Separates the code changes from the tests.)