Use cargo's target-cpu configuration #882

AsbjornOlling · 2025-12-02T09:09:13Z

Let's use the standard rust configuration stuff to control what cpu architectures to compile for.
E.g. setting target-cpu=znver4 (for the ryzen chip in my laptop), or target-cpu=native (to compile for whatever the compiling machine's cpu is).

It seems like the behavior prior to this PR was:

the native crate feature does nothing(??), except for an edgecase on aarch64 linux
rust's target-cpu config is used in rustc as usual
GGML_NATIVE is set to the default value selected by llama.cpp (usually ON, but OFF when cross-compiling or when compiling a reproducible build)

That resulted in simple compilations of llama-cpp-rs which contain pure-rust code built for baseline x86-64 with no fancy extensions, and the bundled C++ GGML code built for whatever extensions are available on the compiling machine.

This PR does the following:

remove the native feature completely
look through CARGO_ENCODED_RUSTFLAGS to pick out the value of target-cpu
if target-cpu is "native", enable GGML_NATIVE
if target-cpu is anything else:
- get a cpu feature list from CARGO_CFG_TARGET_FEATURE
- for each cpu feature, set the corresponding GGML config
- set CFLAGS and CXXFLAGS to -march=<target-cpu>

After this PR, both the pure rust code and the ggml-cpu code will be built with the same target-cpu configuration. The default target-cpu configuration is a very boring/conservative x86-64. This means that straightforward builds which use no additional configuration used to be built with GGML_NATIVE, and will after this PR be built without it.

Builds that use CPU inference and don't specify a target-cpu will likely become slower.

Hence, this is a breaking change, and it could be discussed whether it's a regression. I believe that the behavior implemented in this PR is correct.

This PR supersedes #868

This PR is WIP, since the resulting binaries seem to still contain AVX2 instructions even when I'm compiling with RUSTFLAGS="-C target-cpu=x86-64". Not sure what's going on, but I'm learning and investigating... solved. the remaining AVX2 instructions turned out to be directly included assembly.

AsbjornOlling · 2025-12-02T09:46:18Z

...turns out that the unexpected AVX2 instructions were coming from the ring crate. Ring includes some inline assembly, and selects which implementation to use at runtime, so binaries including ring will always contain AVX2 instructions.

Ring is a dependency of hf-hub. When removing hf-hub from the example, I can compile binaries that contain the x86 extensions that I expect.

AsbjornOlling · 2025-12-02T10:43:12Z

This passes the sniff-test for me. The simple binary compiled here is the example from this repo, patched to not include hf-hub.

default (no target-cpu specified, so it defaults to baseline x86-64):

$ cargo build --bin simple --release 2>/dev/null  && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE1 (movups)
CMOV (cmovne)
SSE2 (movdqa)
BMI (tzcnt)
Instruction set extensions used: BMI, CMOV, MODE64, SSE1, SSE2
CPU Generation: Haswell

target-cpu=x86-64:

$ RUSTFLAGS="-C target-cpu=x86-64" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE1 (movups)
CMOV (cmovne)
SSE2 (movdqa)
BMI (tzcnt)
Instruction set extensions used: BMI, CMOV, MODE64, SSE1, SSE2
CPU Generation: Haswell

target-cpu=x86-64-v3:

$ RUSTFLAGS="-C target-cpu=x86-64-v3" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
CMOV (cmovne)
BMI2 (shrx)
BMI (tzcnt)
AVX2 (vpbroadcastb)
SSE1 (xorps)
SSE2 (pause)
F16C (vcvtph2ps)
Instruction set extensions used: AVX, AVX2, BMI, BMI2, CMOV, F16C, MODE64, NOVLX, SSE1, SSE2
CPU Generation: Unknown

target-cpu=znver4:

$ RUSTFLAGS="-C target-cpu=znver4" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
AVX512 (vmovups)
CMOV (cmovne)
BMI2 (shrx)
BMI (tzcnt)
AVX2 (vpbroadcastq)
SSE1 (xorps)
SSE2 (pause)
VLX (vpord)
BWI (vpbroadcastb)
DQI (vpmullq)
F16C (vcvtph2ps)
Instruction set extensions used: AVX, AVX2, AVX512, BMI, BMI2, BWI, CMOV, DQI, F16C, MODE64, NOVLX, SSE1, SSE2, VLX
CPU Generation: Unknown

target-feature=+avx (default baseline x86 plus AVX):

$ RUSTFLAGS="-C target-feature=+avx" cargo build --bin simple --release 2>/dev/null  && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
CMOV (cmovne)
BMI (tzcnt)
SSE2 (pxor)
SSE1 (movups)
Instruction set extensions used: AVX, BMI, CMOV, MODE64, NOVLX, SSE1, SSE2
CPU Generation: Unknown

AsbjornOlling · 2025-12-02T10:53:51Z

Looking at the disassembled code, I can also verify that it is putting AVX512 code in both pure rust code and ggml (when I ask for it), so it is affecting both compilers.

MarcusDunn

This is fantastic. I'm happy to have the default case regress in exchange for this matching cargo conventions.

AsbjornOlling · 2025-12-02T16:04:30Z

I have a few tiny code-cleanlyness changes that I'm going to push in a sec, so you might want to hold off on merging.

AsbjornOlling · 2025-12-02T16:35:47Z

Spotted another bug, so I marked this PR as a draft again. I'll ping in here when I think it's ready :)

AsbjornOlling · 2025-12-02T16:48:23Z

We could also consider removing this chunk of code, since we're already disabling GGML_NATIVE by default elsewhere, but I'm not entirely sure what the effect of setting GGML_CPU_ARM_ARCH is.

    if matches!(target_os, TargetOs::Linux)
        && target_triple.contains("aarch64")
        && target_cpu != Some("native".into())
    {
        // If the target-cpu is not specified as native, we take off the native ARM64 support.
        // It is useful in docker environments where the native feature is not enabled.
        config.define("GGML_NATIVE", "OFF");
        config.define("GGML_CPU_ARM_ARCH", "armv8-a");
    }

EDIT: meh, since I don't really understand the GGML_CPU_ARM_ARCH variable, and don't have easy access to hardware to test it, I'll just leave it for later.
I think it could potentially cause some problems if someone tries to set target-cpu to some other variant of aarch64 than "armv8-a". but I'd rather avoid breaking something I don't understand 😅

AsbjornOlling · 2025-12-02T17:41:11Z

The ARM stuff might need a revisit in general, actually. The android stuff also allows overriding the -march flag using the ANDROID_ABI env var. I wonder if that's the way it should be.

I might be doing some android/llama.cpp stuff soon enough, so maybe I'll revisit it then. For now I'm happy to see this merged.

MarcusDunn · 2025-12-03T23:29:00Z

sounds good. I'll merge once tests are green.

AsbjornOlling added 2 commits December 1, 2025 16:24

use cpu features configuration from cargo

e01e0d7

pass target cpu to cmake cflags march

711d3e8

AsbjornOlling changed the title ~~Replace native feature with cargo's cpu target configuration~~ Use cargo's cpu target configuration Dec 2, 2025

AsbjornOlling changed the title ~~Use cargo's cpu target configuration~~ Use cargo's target-cpu configuration Dec 2, 2025

remove 'native' feature

9d39309

MarcusDunn approved these changes Dec 2, 2025

View reviewed changes

MarcusDunn marked this pull request as ready for review December 2, 2025 16:01

un-invert target_cpu check on linux/aarch64 builds, clean up code

b08e6ec

AsbjornOlling marked this pull request as draft December 2, 2025 16:35

fix: don't skip 'target-cpu=native' when extracting target_cpu

b078bc9

AsbjornOlling marked this pull request as ready for review December 2, 2025 17:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use cargo's target-cpu configuration #882

Use cargo's target-cpu configuration #882

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

Uh oh!

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

Uh oh!

AsbjornOlling commented Dec 2, 2025

Uh oh!

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

Uh oh!

MarcusDunn left a comment

Uh oh!

AsbjornOlling commented Dec 2, 2025

Uh oh!

AsbjornOlling commented Dec 2, 2025

Uh oh!

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

Uh oh!

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

Uh oh!

MarcusDunn commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Use cargo's target-cpu configuration #882

Are you sure you want to change the base?

Use cargo's target-cpu configuration #882

Conversation

AsbjornOlling commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AsbjornOlling commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AsbjornOlling commented Dec 2, 2025

Uh oh!

AsbjornOlling commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MarcusDunn left a comment

Choose a reason for hiding this comment

Uh oh!

AsbjornOlling commented Dec 2, 2025

Uh oh!

AsbjornOlling commented Dec 2, 2025

Uh oh!

AsbjornOlling commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AsbjornOlling commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MarcusDunn commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

AsbjornOlling commented Dec 2, 2025 •

edited

Loading

AsbjornOlling commented Dec 2, 2025 •

edited

Loading