Commit d95c864
authored
π΄π΄π΄ [
* starting attn refactor for encoder decoder models via bart (eager + sdpa)
* flash attention works, remove unnecessary code
* flex attention support for bart!, gotta check if the renaming is not too aggressive
* some comments
* skip flex grad test for standalone as done with the other test
* revert flex attn rename (for now), sdpa simplify, and todos
* more todos
* refactor mask creation for reuse
* modular attempt at biogpt
* first batch of other models
* fix attn dropout
* fix autoformer copies
* hubert
* another batch of models
* copies/style + last round of bart models --> whisper next?
* remove unnecessary _reshape function and remove copy to whisper
* add skip for decoder-only models out of enc-dec (same as in bart)
* bring back licences
* remove comment, added to pr read instead
* mostly docs
* disable sew flex attn as it's unclear attn mask for now
* oops
* test fixes for enc-dec
* torch fx fixes + try at flex attn
* skip on mbart
* some more fixes
* musicgen skip / delete old attn class logic + sdpa compose compile skip
* disable flex attn for musicgen, not worth the effort
* more fixes and style
* flex attention test for dropout and encoder decoder that dont have main input names
* informer fixes
* the weirdest thing I've encountered yet...
* style
* remove empty tensor attempt, found core root in previous commits
* disable time series due to tests being very text centric on inputs
* add speech to text to be ignoring the other attns, also due to tests
* update docs
* remaining issues resolved ?
* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D
* some models have not set the is_causal flag...
* change dtype in softmax tol old behaviour + some modular fixes
* I hate it but it is what it is
* fixes from main for bart
* forgot this one
* some model fixes
* style
* current status
* marian works now
* fixing some copies
* some copy fixes + time series x informer
* last models possibly and fixes on style/copies
* some post merge fixes
* more fixes
* make attention interface callable and move warnings there
* style lol
* add comment to "unsupported"
* remove callable interface and change interface warnings + some copies
* fix
* ternary is ugly af, make it simpler
* how did that happen
* fix flex attn test
* failing the test
* no more fallback! fixing copies next
* style + attn fixed
* fixing copies and mask creation
* wrong copy
* fixup tests and disable flex attn for now
* fixup last tests?Attention] Refactor Attention Interface for Bart-based Models (#38108)1 parent 9895819 commit d95c864
File tree
75 files changed
+8035
-5932
lines changed- docs/source/en/model_doc
- src/transformers
- integrations
- models
- autoformer
- bart
- bigbird_pegasus
- biogpt
- blenderbot_small
- blenderbot
- data2vec
- hubert
- informer
- m2m_100
- marian
- mbart
- musicgen_melody
- musicgen
- nllb_moe
- patchtsmixer
- patchtst
- pegasus_x
- pegasus
- plbart
- sew
- speech_to_text
- time_series_transformer
- unispeech_sat
- unispeech
- wav2vec2_conformer
- wav2vec2
- wavlm
- whisper
- xglm
- tests
- generation
- models
- bart
- blenderbot_small
- blenderbot
- data2vec
- encoder_decoder
- hubert
- m2m_100
- marian
- mbart
- musicgen_melody
- musicgen
- patchtsmixer
- pegasus_x
- pegasus
- plbart
- sew
- speech_encoder_decoder
- unispeech_sat
- unispeech
- vision_encoder_decoder
- wav2vec2
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
75 files changed
+8035
-5932
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| 21 | + | |
21 | 22 | | |
22 | 23 | | |
23 | 24 | | |
| |||
40 | 41 | | |
41 | 42 | | |
42 | 43 | | |
43 | | - | |
44 | | - | |
45 | | - | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
46 | 47 | | |
47 | 48 | | |
48 | 49 | | |
49 | | - | |
| 50 | + | |
50 | 51 | | |
51 | 52 | | |
52 | 53 | | |
| |||
109 | 110 | | |
110 | 111 | | |
111 | 112 | | |
112 | | - | |
| 113 | + | |
113 | 114 | | |
114 | 115 | | |
115 | 116 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| 24 | + | |
| 25 | + | |
24 | 26 | | |
25 | 27 | | |
26 | 28 | | |
| |||
52 | 54 | | |
53 | 55 | | |
54 | 56 | | |
55 | | - | |
| 57 | + | |
56 | 58 | | |
57 | 59 | | |
58 | 60 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| 24 | + | |
| 25 | + | |
24 | 26 | | |
25 | 27 | | |
26 | 28 | | |
| |||
45 | 47 | | |
46 | 48 | | |
47 | 49 | | |
48 | | - | |
| 50 | + | |
49 | 51 | | |
50 | 52 | | |
51 | 53 | | |
| |||
71 | 73 | | |
72 | 74 | | |
73 | 75 | | |
74 | | - | |
| 76 | + | |
75 | 77 | | |
76 | 78 | | |
77 | 79 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| 24 | + | |
| 25 | + | |
24 | 26 | | |
25 | 27 | | |
26 | 28 | | |
| |||
155 | 157 | | |
156 | 158 | | |
157 | 159 | | |
158 | | - | |
| 160 | + | |
159 | 161 | | |
160 | 162 | | |
161 | 163 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
54 | | - | |
55 | | - | |
56 | | - | |
57 | | - | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
58 | 58 | | |
59 | 59 | | |
60 | 60 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| 24 | + | |
| 25 | + | |
24 | 26 | | |
25 | 27 | | |
26 | 28 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| 21 | + | |
21 | 22 | | |
22 | 23 | | |
23 | 24 | | |
| |||
0 commit comments