Commit 163138a
🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866)
* start
* start having a clean 4d mask primitive
* Update mask_utils.py
* Update mask_utils.py
* switch name
* Update masking_utils.py
* add a new AttentionMask tensor class
* fix import
* nits
* fixes
* use full and quadrants
* general sdpa mask for all caches
* style
* start some tests
* tests with sliding, chunked
* add styling
* test hybrid
* Update masking_utils.py
* small temp fixes
* Update modeling_gemma2.py
* compile compatible
* Update masking_utils.py
* improve
* start making it more general
* Update masking_utils.py
* generate
* make it work with flex style primitives!
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* improve
* Update cache_utils.py
* Update masking_utils.py
* simplify - starting to look good!
* Update masking_utils.py
* name
* Update masking_utils.py
* style
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* small fix for flex
* flex compile
* FA2
* Update masking_utils.py
* Escape for TGI/vLLM!
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* General case without cache
* rename
* full test on llama4
* small fix for FA2 guard with chunk
* Update modeling_gemma2.py
* post rebase cleanup
* FA2 supports static cache!
* Update modeling_flash_attention_utils.py
* Update flex_attention.py
* Update masking_utils.py
* Update masking_utils.py
* Update utils.py
* override for export
* Update executorch.py
* Update executorch.py
* Update executorch.py
* Update executorch.py
* Update masking_utils.py
* Update masking_utils.py
* output attentions
* style
* Update masking_utils.py
* Update executorch.py
* Add docstring
* Add license and put mask visualizer at the end
* Update test_modeling_common.py
* fix broken test
* Update test_modeling_gemma.py
* Update test_modeling_gemma2.py
* Use fullgraph=False with FA2
* Update utils.py
* change name
* Update masking_utils.py
* improve doc
* change name
* Update modeling_attn_mask_utils.py
* more explicit logic based on model's property
* pattern in config
* extend
* fixes
* make it better
* generalize to other test models
* fix
* Update masking_utils.py
* fix
* do not check mask equivalence if layer types are different
* executorch
* Update modeling_gemma2.py
* Update masking_utils.py
* use layer_idx instead
* adjust
* Update masking_utils.py
* test
* fix imports
* Update modeling_gemma2.py
* other test models
* Update modeling_llama4.py
* Update masking_utils.py
* improve
* simplify
* Update masking_utils.py
* typos
* typo
* fix
* Update masking_utils.py
* default DynamicCache
* remove default cache
* simplify
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* simplify
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* export
* Update executorch.py
* Update executorch.py
* Update flex_attention.py
* Update executorch.py
* upstream to modular gemma 1 & 2
* Update modular_mistral.py
* switch names
* use dict
* put it in the Layer directly
* update copy model source for mask functions
* apply so many modular (hopefully 1 shot)
* use explicit dicts to make style happy
* protect import
* check docstring
* better default in hybrid caches
* qwens
* Update modular_qwen2.py
* simplify core logic!
* Update executorch.py
* qwen3 moe
* Update masking_utils.py
* Update masking_utils.py
* simplify a lot sdpa causal skip
* Update masking_utils.py
* post-rebase
* gemma3 finally
* style
* check it before
* gemma3
* More general with newer torch
* align gemma3
* Update utils.py
* Update utils.py
* Update masking_utils.py
* Update test_modeling_common.py
* Update flex_attention.py
* Update flex_attention.py
* Update flex_attention.py
* test
* executorch
* Update test_modeling_common.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update executorch.py
* Update test_modeling_common.py
* fix copies
* device
* sdpa can be used without mask -> pass the torchscript tests in this case
* Use enum for check
* revert enum and add check instead
* remove broken test
* cohere2
* some doc & reorganize the Interface
* Update tensor_parallel.py
* Update tensor_parallel.py
* doc and dummy
* Update test_modeling_paligemma2.py
* Update modeling_falcon_h1.py
* Update masking_utils.py
* executorch patch
* style
* CIs
* use register in executorch
* final comments!
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>

1 parent f8630c7, commit 163138a
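The commit message describes rebuilding every attention mask from "flex style primitives" (causal, sliding, chunked, padding) that are combined into one 4D mask, plus skipping the mask entirely for SDPA when it is purely causal. The following is a minimal pure-Python sketch of that idea, assuming the PyTorch flex attention `mask_mod` convention `(batch_idx, head_idx, q_idx, kv_idx) -> bool`, where `True` means "may attend". It is not the code in `masking_utils.py`; every name in it (`causal`, `sliding_window`, `chunked`, `padding`, `and_masks`, `build_4d_mask`, `can_skip_mask`) is hypothetical and chosen for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical mask-function type, following the flex attention `mask_mod` convention:
# (batch_idx, head_idx, q_idx, kv_idx) -> True if the query may attend to the key.
MaskFn = Callable[[int, int, int, int], bool]


def causal(b: int, h: int, q: int, kv: int) -> bool:
    # A query only sees keys at or before its own position.
    return kv <= q


def sliding_window(window: int) -> MaskFn:
    # Only the last `window` positions (the query's own included) stay visible.
    def fn(b: int, h: int, q: int, kv: int) -> bool:
        return q - kv < window
    return fn


def chunked(chunk_size: int) -> MaskFn:
    # Attention is allowed only inside the same fixed-size chunk of positions.
    def fn(b: int, h: int, q: int, kv: int) -> bool:
        return q // chunk_size == kv // chunk_size
    return fn


def padding(pad_mask: List[List[int]]) -> MaskFn:
    # 2D padding mask per batch row: 1 = real token, 0 = padding (hidden as a key).
    def fn(b: int, h: int, q: int, kv: int) -> bool:
        return pad_mask[b][kv] == 1
    return fn


def and_masks(*fns: MaskFn) -> MaskFn:
    # A key is visible only if every primitive allows it.
    def fn(b: int, h: int, q: int, kv: int) -> bool:
        return all(f(b, h, q, kv) for f in fns)
    return fn


def build_4d_mask(mask_fn: MaskFn, batch: int, heads: int, q_len: int,
                  kv_len: int, q_offset: int = 0) -> List[List[List[List[bool]]]]:
    # Materialize a [batch, heads, q_len, kv_len] boolean mask. `q_offset` shifts the
    # query positions, e.g. by the number of tokens already held in a KV cache.
    return [[[[mask_fn(b, h, q + q_offset, kv) for kv in range(kv_len)]
              for q in range(q_len)]
             for h in range(heads)]
            for b in range(batch)]


def can_skip_mask(pad_mask: List[List[int]], q_len: int, kv_len: int,
                  window: Optional[int] = None) -> bool:
    # True when no explicit mask is needed and SDPA's `is_causal` (or no mask) suffices.
    no_padding = all(v == 1 for row in pad_mask for v in row)
    # SDPA's `is_causal=True` aligns the causal mask to the top-left corner, so it only
    # matches a cache-aware causal mask when q_len == kv_len; a single new query token
    # (standard decoding) sees every key and needs no mask at all.
    shape_ok = q_len == 1 or q_len == kv_len
    # A window at least as long as the keys never hides anything.
    window_ok = window is None or window >= kv_len
    return no_padding and shape_ok and window_ok


pad = [[1, 1, 1, 1], [0, 1, 1, 1]]  # the second sequence is left-padded
mask = build_4d_mask(and_masks(causal, sliding_window(2), padding(pad)), 2, 1, 4, 4)
for row in mask[1][0]:
    print([int(x) for x in row])
# prints [0, 0, 0, 0] / [0, 1, 0, 0] / [0, 1, 1, 0] / [0, 0, 1, 1], one row per line
print(can_skip_mask(pad, 4, 4))  # False: padding forces an explicit mask
```

Expressing each rule as an independent per-position predicate lets hybrid models (for example, alternating full and sliding-window layers) reuse the same primitives with different combinations. A real implementation would vectorize the predicates over tensors instead of looping in Python.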
File tree
129 files changed (+2984, -6808 lines)
- docs/source/en
  - internal
- src/transformers
  - generation
  - integrations
  - models
    - aria
    - aya_vision
    - bart
    - bigbird_pegasus
    - biogpt
    - bitnet
    - blenderbot_small
    - blenderbot
    - bloom
    - chameleon
    - codegen
    - cohere2
    - cohere
    - csm
    - dbrx
    - deepseek_v3
    - diffllama
    - emu3
    - falcon_h1
    - falcon
    - gemma2
    - gemma3
    - gemma
    - glm4
    - glm
    - got_ocr2
    - gpt_neox_japanese
    - gpt_neox
    - gpt_neo
    - gptj
    - granitemoe
    - granite
    - helium
    - idefics
    - internvl
    - jetmoe
    - llama4
    - llama
    - llava_next
    - llava
    - longt5
    - m2m_100
    - marian
    - mbart
    - mimi
    - mistral3
    - mistral
    - mixtral
    - mllama
    - moonshine
    - moshi
    - mt5
    - nemotron
    - olmo2
    - olmo
    - opt
    - paligemma
    - pegasus_x
    - pegasus
    - persimmon
    - phi3
    - phi4_multimodal
    - phimoe
    - phi
    - pix2struct
    - plbart
    - pop2piano
    - qwen2_5_omni
    - qwen2_moe
    - qwen2_vl
    - qwen2
    - qwen3_moe
    - qwen3
    - stablelm
    - starcoder2
    - switch_transformers
    - t5
    - udop
    - umt5
    - video_llava
    - vipllava
    - whisper
  - utils
- tests
  - models
    - gemma2
    - gemma3
    - gemma
    - paligemma2
  - utils
- utils