@pzdunows commented Nov 13, 2025

This change improves the performance of LFM2 models. On an AMD Radeon RX 7900 XTX (ROCm/HIP backend) with LFM2-1.2B Q8_0, token generation (tg128) is roughly 18% faster and prompt processing (pp512) roughly 5% faster; see the llama-bench results below.

Before:

D:\test>fix_disabled\llama-bench.exe -m LFM2-1.2B-Q8_0.gguf -ngl 100 -t 8
HIP Library Path: D:\test\fix_disabled\amdhip64_7.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| lfm2 1.2B Q8_0                 |   1.16 GiB |     1.17 B | ROCm       | 100 |           pp512 |    13955.14 ± 240.79 |
| lfm2 1.2B Q8_0                 |   1.16 GiB |     1.17 B | ROCm       | 100 |           tg128 |        278.71 ± 5.62 |

build: 017eceed6 (7036)

After:

D:\test>fix_enabled\llama-bench.exe -m LFM2-1.2B-Q8_0.gguf -ngl 100 -t 8
HIP Library Path: D:\test\fix_enabled\amdhip64_7.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | ngl |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| lfm2 1.2B Q8_0                 |   1.16 GiB |     1.17 B | ROCm       | 100 |           pp512 |    14617.30 ± 177.74 |
| lfm2 1.2B Q8_0                 |   1.16 GiB |     1.17 B | ROCm       | 100 |           tg128 |        328.27 ± 7.79 |

build: 017eceed6 (7036)
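For reference, a quick back-of-the-envelope calculation of the relative gains from the mean throughputs reported above (not part of the patch, just arithmetic on the numbers in the two tables):

```python
# Relative speedup from the llama-bench means above (RX 7900 XTX, ROCm, LFM2-1.2B Q8_0).
before = {"pp512": 13955.14, "tg128": 278.71}  # t/s without the change
after  = {"pp512": 14617.30, "tg128": 328.27}  # t/s with the change

for test in before:
    gain = after[test] / before[test] - 1.0
    print(f"{test}: {gain * 100:.1f}% faster")

# Output:
# pp512: 4.7% faster
# tg128: 17.8% faster
```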

@pzdunows (Author)

Hi @CISC, could you take a look at this change?
Thanks!

@CISC (Collaborator) commented Nov 27, 2025

I think it's better if @ggerganov had a look at this. Out of curiosity, though, do you think there are other models that would benefit from this?

@ggerganov (Member)

There were a few issues with this model; I think I fixed them here: #17548

@pzdunows (Author)

> I think it's better if @ggerganov had a look at this. Out of curiosity, though, do you think there are other models that would benefit from this?

Hard to say, I was only looking into this specific model family.

@pzdunows (Author)

> There were a few issues with this model; I think I fixed them here: #17548

Thanks, I'll check it out.
