ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm #13882

vineelabhinav · 2025-05-29T07:36:32Z

This PR adds SVE kernel support for the F32 data type, specific to the Mamba model, on the ARM architecture.
It is split out from #13602 as a separate contribution containing only the Mamba-specific functions, as suggested by @ggerganov.
Major code changes:

Add SVE support for the ggml_compute_forward_ssm_scan_f32() function.
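For readers unfamiliar with the operation being vectorized, here is a minimal scalar sketch of one step of the Mamba selective-scan recurrence for a single channel and a single token. It is not the code from this PR: the function name `ssm_scan_step_ref`, the `std::vector` layout, and the softplus cutoff of 20 are assumptions made for illustration. The inner loop over the state dimension is the kind of loop an SVE kernel would typically process several lanes at a time.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not a ggml function): one token, one channel.
//   s    : recurrent state for this channel, length d_state, updated in place
//   A    : state-transition parameters for this channel, length d_state
//   B, C : input and output projections for this token, length d_state
//   x    : scalar input for this channel
//   dt   : step size before softplus
// Returns the scalar output y for this channel and token.
float ssm_scan_step_ref(std::vector<float> & s,
                        const std::vector<float> & A,
                        const std::vector<float> & B,
                        const std::vector<float> & C,
                        float x, float dt) {
    // softplus(dt); for large dt it is numerically equal to dt (cutoff assumed here)
    const float dt_sp = dt <= 20.0f ? std::log1p(std::exp(dt)) : dt;
    const float x_dt  = x * dt_sp;

    float y = 0.0f;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // state = previous_state * exp(dt * A) + B * (x * dt)
        const float state = s[i] * std::exp(dt_sp * A[i]) + B[i] * x_dt;
        y   += state * C[i]; // output is the dot product of the new state with C
        s[i] = state;
    }
    return y;
}
```

With `s = {1, 1}`, `A = {0, 0}`, `B = {1, 0.5}`, `C = {1, 2}`, `x = 2` and `dt = 25`, the softplus branch returns `dt` unchanged, so the state becomes `{51, 26}` and the returned output is `103`.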

Performance

This PR improves performance by roughly 1.02x to 1.04x compared to the previous Neon-based implementation, as measured in the benchmarks below.
Model: falcon-mamba-7B-F32.gguf
Command: ./build/bin/llama-bench -m falcon-mamba-7B-F32.gguf -t 8,16,32,64 -p 128,1024 -n 0

Task 1: Prompt Length: 128 tokens, Generated Tokens: 1 token

Threads	Neon (Tokens/sec)	SVE (Tokens/sec)	Ratio (SVE / Neon)
8	11.81	12.34	1.04
16	22.36	23.34	1.04
32	39.34	40.85	1.04
64	60.52	62.09	1.03

Task 2: Prompt Length: 1024 tokens, Generated Tokens: 1 token

Threads	Neon (Tokens/sec)	SVE (Tokens/sec)	Ratio (SVE / Neon)
8	11.2	11.66	1.04
16	21.13	21.92	1.04
32	37.02	38.44	1.04
64	56.94	58.32	1.02

Perplexity

There is no change in model accuracy as a result of this PR.
Command: ./build/bin/llama-perplexity -s 0 -np 128 -t 64 -m falcon-mamba-7B-F32.gguf -c 128 -b 128 --chunks 16 -f scripts/wikitext-2-raw/wiki.test.raw

Neon (perplexity)	SVE (perplexity)
7.6153 +/- 0.66890	7.6153 +/- 0.66890

Contributor: Vineel Abhinav Gottala

ggerganov

Minor formatting fixes

ggml/src/ggml-cpu/ops.cpp

F32-Mamba-Seq_Scan-SVE

d9e2712

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on May 29, 2025

ggerganov approved these changes May 29, 2025

Fix formatting

81c8ace

ggerganov reviewed May 29, 2025

ggml : missing space

96164e6

ggerganov merged commit dd8ba93 into ggml-org:master May 29, 2025
2 checks passed

ggerganov mentioned this pull request May 30, 2025

ggml: aarch64: Implement SVE F32 kernels for Mamba Model #13602

Closed

gabe-l-hart mentioned this pull request May 30, 2025

llama : initial Mamba-2 support #9126

Open

9 tasks
