Skip to content

Conversation

@AsbjornOlling
Copy link
Contributor

@AsbjornOlling AsbjornOlling commented Dec 2, 2025

Let's use the standard rust configuration stuff to control what cpu architectures to compile for.
E.g. setting target-cpu=znver4 (for the ryzen chip in my laptop), or target-cpu=native (to compile for whatever the compiling machine's cpu is).

It seems like the behavior prior to this PR was:

  • the native crate feature does nothing(??), except for an edgecase on aarch64 linux
  • rust's target-cpu config is used in rustc as usual
  • GGML_NATIVE is set to the default value selected by llama.cpp (usually ON, but OFF when cross-compiling or when compiling a reproducible build)

That resulted in simple compilations of llama-cpp-rs which contain pure-rust code built for baseline x86-64 with no fancy extensions, and the bundled C++ GGML code built for whatever extensions are available on the compiling machine.

This PR does the following:

  • remove the native feature completely
  • look through CARGO_ENCODED_RUSTFLAGS to pick out the value of target-cpu
  • if target-cpu is "native", enable GGML_NATIVE
  • if target-cpu is anything else:
    • get a cpu feature list from CARGO_CFG_TARGET_FEATURE
    • for each cpu feature, set the corresponding GGML config
    • set CFLAGS and CXXFLAGS to -march=<target-cpu>

After this PR, both the pure rust code and the ggml-cpu code will be built with the same target-cpu configuration. The default target-cpu configuration is a very boring/conservative x86-64. This means that straightforward builds which use no additional configuration used to be built with GGML_NATIVE, and will after this PR be built without it.

Builds that use CPU inference and don't specify a target-cpu will likely become slower.

Hence, this is a breaking change, and it could be discussed whether it's a regression. I believe that the behavior implemented in this PR is correct.

This PR supersedes #868

This PR is WIP, since the resulting binaries seem to still contain AVX2 instructions even when I'm compiling with RUSTFLAGS="-C target-cpu=x86-64". Not sure what's going on, but I'm learning and investigating... solved. the remaining AVX2 instructions turned out to be directly included assembly.

@AsbjornOlling
Copy link
Contributor Author

AsbjornOlling commented Dec 2, 2025

...turns out that the unexpected AVX2 instructions were coming from the ring crate. Ring includes some inline assembly, and selects which implementation to use at runtime, so binaries including ring will always contain AVX2 instructions.

Ring is a dependency of hf-hub. When removing hf-hub from the example, I can compile binaries that contain the x86 extensions that I expect.

@AsbjornOlling AsbjornOlling changed the title Replace native feature with cargo's cpu target configuration Use cargo's cpu target configuration Dec 2, 2025
@AsbjornOlling
Copy link
Contributor Author

This passes the sniff-test for me. The simple binary compiled here is the example from this repo, patched to not include hf-hub.

default (no target-cpu specified, so it defaults to baseline x86-64):

$ cargo build --bin simple --release 2>/dev/null  && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE1 (movups)
CMOV (cmovne)
SSE2 (movdqa)
BMI (tzcnt)
Instruction set extensions used: BMI, CMOV, MODE64, SSE1, SSE2
CPU Generation: Haswell

target-cpu=x86-64:

$ RUSTFLAGS="-C target-cpu=x86-64" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE1 (movups)
CMOV (cmovne)
SSE2 (movdqa)
BMI (tzcnt)
Instruction set extensions used: BMI, CMOV, MODE64, SSE1, SSE2
CPU Generation: Haswell

target-cpu=x86-64-v3:

$ RUSTFLAGS="-C target-cpu=x86-64-v3" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
CMOV (cmovne)
BMI2 (shrx)
BMI (tzcnt)
AVX2 (vpbroadcastb)
SSE1 (xorps)
SSE2 (pause)
F16C (vcvtph2ps)
Instruction set extensions used: AVX, AVX2, BMI, BMI2, CMOV, F16C, MODE64, NOVLX, SSE1, SSE2
CPU Generation: Unknown

target-cpu=znver4:

$ RUSTFLAGS="-C target-cpu=znver4" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
AVX512 (vmovups)
CMOV (cmovne)
BMI2 (shrx)
BMI (tzcnt)
AVX2 (vpbroadcastq)
SSE1 (xorps)
SSE2 (pause)
VLX (vpord)
BWI (vpbroadcastb)
DQI (vpmullq)
F16C (vcvtph2ps)
Instruction set extensions used: AVX, AVX2, AVX512, BMI, BMI2, BWI, CMOV, DQI, F16C, MODE64, NOVLX, SSE1, SSE2, VLX
CPU Generation: Unknown

target-feature=+avx (default baseline x86 plus AVX):

$ RUSTFLAGS="-C target-feature=+avx" cargo build --bin simple --release 2>/dev/null  && elfx86exts ./target/release/simple

File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
CMOV (cmovne)
BMI (tzcnt)
SSE2 (pxor)
SSE1 (movups)
Instruction set extensions used: AVX, BMI, CMOV, MODE64, NOVLX, SSE1, SSE2
CPU Generation: Unknown

@AsbjornOlling
Copy link
Contributor Author

AsbjornOlling commented Dec 2, 2025

Looking at the disassembled code, I can also verify that it is putting AVX512 code in both pure rust code and ggml (when I ask for it), so it is affecting both compilers.

@AsbjornOlling AsbjornOlling changed the title Use cargo's cpu target configuration Use cargo's target-cpu configuration Dec 2, 2025
Copy link
Contributor

@MarcusDunn MarcusDunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic. I'm happy to have the default case regress in exchange for this matching cargo conventions.

@MarcusDunn MarcusDunn marked this pull request as ready for review December 2, 2025 16:01
@AsbjornOlling
Copy link
Contributor Author

I have a few tiny code-cleanlyness changes that I'm going to push in a sec, so you might want to hold off on merging.

@AsbjornOlling AsbjornOlling marked this pull request as draft December 2, 2025 16:35
@AsbjornOlling
Copy link
Contributor Author

Spotted another bug, so I marked this PR as a draft again. I'll ping in here when I think it's ready :)

@AsbjornOlling
Copy link
Contributor Author

AsbjornOlling commented Dec 2, 2025

We could also consider removing this chunk of code, since we're already disabling GGML_NATIVE by default elsewhere, but I'm not entirely sure what the effect of setting GGML_CPU_ARM_ARCH is.

    if matches!(target_os, TargetOs::Linux)
        && target_triple.contains("aarch64")
        && target_cpu != Some("native".into())
    {
        // If the target-cpu is not specified as native, we take off the native ARM64 support.
        // It is useful in docker environments where the native feature is not enabled.
        config.define("GGML_NATIVE", "OFF");
        config.define("GGML_CPU_ARM_ARCH", "armv8-a");
    }

EDIT: meh, since I don't really understand the GGML_CPU_ARM_ARCH variable, and don't have easy access to hardware to test it, I'll just leave it for later.
I think it could potentially cause some problems if someone tries to set target-cpu to some other variant of aarch64 than "armv8-a". but I'd rather avoid breaking something I don't understand 😅

@AsbjornOlling AsbjornOlling marked this pull request as ready for review December 2, 2025 17:29
@AsbjornOlling
Copy link
Contributor Author

AsbjornOlling commented Dec 2, 2025

The ARM stuff might need a revisit in general, actually. The android stuff also allows overriding the -march flag using the ANDROID_ABI env var. I wonder if that's the way it should be.

I might be doing some android/llama.cpp stuff soon enough, so maybe I'll revisit it then. For now I'm happy to see this merged.

@MarcusDunn
Copy link
Contributor

sounds good. I'll merge once tests are green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants