-
Notifications
You must be signed in to change notification settings - Fork 123
Use cargo's target-cpu configuration #882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Use cargo's target-cpu configuration #882
Conversation
|
...turns out that the unexpected AVX2 instructions were coming from the ring crate. Ring includes some inline assembly, and selects which implementation to use at runtime, so binaries including ring will always contain AVX2 instructions. Ring is a dependency of hf-hub. When removing |
native feature with cargo's cpu target configuration|
This passes the sniff-test for me. The default (no target-cpu specified, so it defaults to baseline x86-64): $ cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple
File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE1 (movups)
CMOV (cmovne)
SSE2 (movdqa)
BMI (tzcnt)
Instruction set extensions used: BMI, CMOV, MODE64, SSE1, SSE2
CPU Generation: Haswell
$ RUSTFLAGS="-C target-cpu=x86-64" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple
File format and CPU architecture: Elf, X86_64
MODE64 (call)
SSE1 (movups)
CMOV (cmovne)
SSE2 (movdqa)
BMI (tzcnt)
Instruction set extensions used: BMI, CMOV, MODE64, SSE1, SSE2
CPU Generation: Haswell
$ RUSTFLAGS="-C target-cpu=x86-64-v3" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple
File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
CMOV (cmovne)
BMI2 (shrx)
BMI (tzcnt)
AVX2 (vpbroadcastb)
SSE1 (xorps)
SSE2 (pause)
F16C (vcvtph2ps)
Instruction set extensions used: AVX, AVX2, BMI, BMI2, CMOV, F16C, MODE64, NOVLX, SSE1, SSE2
CPU Generation: Unknown
$ RUSTFLAGS="-C target-cpu=znver4" cargo build --bin simple --release 2>/dev/null && elfx86exts ./target/release/simple
File format and CPU architecture: Elf, X86_64
MODE64 (call)
AVX (vmovups)
NOVLX (vmovups)
AVX512 (vmovups)
CMOV (cmovne)
BMI2 (shrx)
BMI (tzcnt)
AVX2 (vpbroadcastq)
SSE1 (xorps)
SSE2 (pause)
VLX (vpord)
BWI (vpbroadcastb)
DQI (vpmullq)
F16C (vcvtph2ps)
Instruction set extensions used: AVX, AVX2, AVX512, BMI, BMI2, BWI, CMOV, DQI, F16C, MODE64, NOVLX, SSE1, SSE2, VLX
CPU Generation: Unknown
|
|
Looking at the disassembled code, I can also verify that it is putting AVX512 code in both pure rust code and ggml (when I ask for it), so it is affecting both compilers. |
MarcusDunn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic. I'm happy to have the default case regress in exchange for this matching cargo conventions.
|
I have a few tiny code-cleanlyness changes that I'm going to push in a sec, so you might want to hold off on merging. |
|
Spotted another bug, so I marked this PR as a draft again. I'll ping in here when I think it's ready :) |
|
We could also consider removing this chunk of code, since we're already disabling if matches!(target_os, TargetOs::Linux)
&& target_triple.contains("aarch64")
&& target_cpu != Some("native".into())
{
// If the target-cpu is not specified as native, we take off the native ARM64 support.
// It is useful in docker environments where the native feature is not enabled.
config.define("GGML_NATIVE", "OFF");
config.define("GGML_CPU_ARM_ARCH", "armv8-a");
}EDIT: meh, since I don't really understand the |
|
The ARM stuff might need a revisit in general, actually. The android stuff also allows overriding the I might be doing some android/llama.cpp stuff soon enough, so maybe I'll revisit it then. For now I'm happy to see this merged. |
|
sounds good. I'll merge once tests are green. |
Let's use the standard rust configuration stuff to control what cpu architectures to compile for.
E.g. setting
target-cpu=znver4(for the ryzen chip in my laptop), ortarget-cpu=native(to compile for whatever the compiling machine's cpu is).It seems like the behavior prior to this PR was:
nativecrate feature does nothing(??), except for an edgecase on aarch64 linuxtarget-cpuconfig is used in rustc as usualGGML_NATIVEis set to the default value selected by llama.cpp (usuallyON, butOFFwhen cross-compiling or when compiling a reproducible build)That resulted in simple compilations of llama-cpp-rs which contain pure-rust code built for baseline x86-64 with no fancy extensions, and the bundled C++ GGML code built for whatever extensions are available on the compiling machine.
This PR does the following:
nativefeature completelyCARGO_ENCODED_RUSTFLAGSto pick out the value oftarget-cputarget-cpuis"native", enableGGML_NATIVEtarget-cpuis anything else:CARGO_CFG_TARGET_FEATUREGGMLconfigCFLAGSandCXXFLAGSto-march=<target-cpu>After this PR, both the pure rust code and the ggml-cpu code will be built with the same
target-cpuconfiguration. The defaulttarget-cpuconfiguration is a very boring/conservativex86-64. This means that straightforward builds which use no additional configuration used to be built withGGML_NATIVE, and will after this PR be built without it.Builds that use CPU inference and don't specify a
target-cpuwill likely become slower.Hence, this is a breaking change, and it could be discussed whether it's a regression. I believe that the behavior implemented in this PR is correct.
This PR supersedes #868
This PR is WIP, since the resulting binaries seem to still containsolved. the remaining AVX2 instructions turned out to be directly included assembly.AVX2instructions even when I'm compiling withRUSTFLAGS="-C target-cpu=x86-64". Not sure what's going on, but I'm learning and investigating...