Commit cbb80aa
models : optimize qwen3next graph (ggml-org#19375)
* models : optimizing qwen3next graph
* cont
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* cont : remove redundant q, g chunking
* minor
* minor
* avoid passing masks around
* avoid concats during chunking
* naming + shapes
* update names and use prefix to disable CUDA graphs

Parent: 219d269
4 files changed, 262 additions and 299 deletions
Directories touched:
- ggml/src/ggml-cuda
- ggml/src/ggml-metal
- src/models