Introduction
At the moment, a Zig target triple (without versions) generally has an obvious 1:1 mapping to a GNU target triple, with only a few exceptions. In this issue, I propose that we completely break with the GNU style of target triple and, in particular, make the choice of libc and ABI two distinct components of the triple. This will enable Zig target triples (then quadruples) to communicate information that they can't today and handle a much wider range of ABI options.
Background & Motivation
Zig target triples, AFAICT, have the goal of completely replacing the -march, -mcpu, -mtune, -mabi, and -mmacosx-version-min options with a single, unified -target option to cover them all. -mcpu hasn't been integrated in the triple yet, but that work has been accepted in #4584 (and this proposal builds on that one). The only option that remains after that is -mabi, which is notably missing from zig build-obj and friends, presumably on the assumption that the third component of the triple ought to cover it.
I've spent the past few weeks doing what I think is a fairly exhaustive survey of the ISA and ABI landscape. I put the cutoff point roughly around the mid-1970s; anything prior to that is for all intents and purposes super dead and unlikely to change any of my conclusions here. Having read through ISA manuals and ABI documentation for basically every architecture that I could find a manual for, my conclusion is that there is far more nuance to ABI choice than the current style of target triple allows. This isn't theoretical either; I'll demonstrate some real cases where the current approach falls short in ways that actually matter, and I'll show why the current approach can't scale in the long term.
I'll go over just a few architectures here; there are others that would also illustrate the point well (e.g. SuperH and m68k), but I hope the following are sufficient.
RISC-V
RISC-V is the hot new ISA on the block, so it's probably the most pertinent example here. It currently defines the following ABIs:
- ilp32 (full soft float)
- ilp32f (soft f64 and f128; hard f32)
- ilp32d (soft f128; hard f32 and f64)
- ilp32e (full soft float; reduced register set)
- lp64 (full soft float)
- lp64f (soft f64 and f128; hard f32)
- lp64d (soft f128; hard f32 and f64)
- lp64q (full hard float)
zig build-obj and friends simply have no way to select between these ABIs at the moment. We would have to add ABI tags for all of them, and we can already see that adding those for each ABI (plus glibc variants, plus musl variants) is going to get way out of hand. Additionally, I believe there are more ABIs to come; I've heard talk of ilp32ef, for example, and RV128 ABIs will presumably materialize at some point.
Also worth noting here is that our current strategy of adopting gnueabi and gnueabihf (and musl variants) to differentiate hard float vs soft float starts to break down once unusual float ABIs (32-bit and 128-bit) enter the picture.
(Incidentally, these ABIs also show why the current std.Target.FloatAbi definition is not nuanced enough.)
LoongArch
LoongArch has roughly the same situation as RISC-V, minus the Q and E extensions:
- ilp32s (full soft float)
- ilp32f (soft f64; hard f32)
- ilp32d (full hard float)
- lp64s (full soft float)
- lp64f (soft f64; hard f32)
- lp64d (full hard float)
An interesting thing to note here is that LoongArch is so far the only architecture I'm aware of to have done the sane thing and made the ABI actually, ya know, part of the ABI component of the target triple. So there's -gnu for ilp32d/lp64d, -gnuf32 for ilp32f/lp64f, and -gnusf for ilp32s/lp64s. Good job! (It used to be -gnuf64 instead of -gnu, but they simplified it because it's expected to be the common case.)
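The suffix scheme described above is regular enough to write down as a lookup. This is only an illustrative Python sketch: the function name is hypothetical, and the loongarch32 architecture name is an assumption (only loongarch64 appears elsewhere in this issue).

```python
# GNU environment suffix -> float letter of the LoongArch ABI name,
# as described in the paragraph above.
LOONGARCH_FLOAT_SUFFIX = {"gnu": "d", "gnuf32": "f", "gnusf": "s"}

# Architecture -> data model prefix. "loongarch32" is an assumed name.
LOONGARCH_DATA_MODEL = {"loongarch32": "ilp32", "loongarch64": "lp64"}


def loongarch_abi(arch: str, env: str) -> str:
    """Derive the LoongArch ABI name from the arch and the GNU-style suffix."""
    return LOONGARCH_DATA_MODEL[arch] + LOONGARCH_FLOAT_SUFFIX[env]
```

For example, loongarch_abi("loongarch64", "gnuf32") yields "lp64f".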
PowerPC
PowerPC (or Power ISA) has had a long list of ABIs over the years, being a fairly old architecture. Some never really saw practical use (e.g. the Windows NT ABI). It's a bit hard to categorize the ones that are actually relevant, but I think they can roughly be put like this:
- SVR4
- EABI
- Apple ABI
- ELF v1
- ELF v2
- AIX
(Notably, Zig's current use of the bespoke powerpc-linux-gnueabi(hf) triples is quite unfortunate because it implies an association with the PowerPC EABI, when that is not actually the case.)
In addition to these broad-strokes ABIs, there are variations based around the definition of long double. musl has done the simple thing and just declared that only long double = double is supported. Unfortunately, the rest of the world runs on one of two other definitions - either IEEE binary128 or the 128-bit "double-double" format that IBM came up with. The latter is unfortunately still very common, with binary128 only seeing limited use on newer powerpc64le distros (see #20579 for details). That last point is a real problem for Zig; zig build-obj and friends default to the "double-double" format with no way to switch to binary128 for the distros where this is the default.
But wait - there's more. Some of the above ABIs also have vector variants for efficient AltiVec usage. And of course there are also soft float variations of some of them.
And it gets much worse. You can also use a plethora of options such as -malign-natural, -malign-power, -maix-struct-return, and -msvr4-struct-return to explicitly override various aspects of the aforementioned ABIs.
There's a near-incomprehensible number of possible combinations here.
MIPS
MIPS sits somewhere between RISC-V and PowerPC, with a lot of ABIs and configurability within those ABIs:
- EABI
- O32
- N32
- O64
- N64
(MIPS support in Zig currently has the same issue as PowerPC where gnueabi/gnueabihf are used to distinguish soft float and hard float, despite neither being EABI-based.)
In addition to these, there are soft float variants and single/double-precision variants. There are also the FPXX and FP64A variants. Like PowerPC, options abound for overriding various aspects of the chosen ABI.
Miscellaneous
Some extra notes that apply to various architectures on top of what I've already written above:
- ILP32 ABI variants (i.e. 32-bit pointers on a 64-bit machine) exist for at least Arm64, Itanium, and x86-64 (in addition to the aforementioned architectures). Note how they're called different things: x32 for x86-64, gnu_ilp32 (or aarch64_32 as arch) for Arm64, and N32 for MIPS.
- Quite a few architectures (RISC-V, Arm, SuperH, and some others I forget) have an FDPIC ABI that is meant for systems without an MMU. I believe it functions as an addition to the base ABI, but I haven't dug too deeply into it.
- It's probably obvious at this point, but just to make it explicit: GNU triples simply cannot represent most of the ABI nuance I've described so far.
- If we ever add more libcs than the ones we have now (ziglibc?), status quo will start to get ugly fast, even putting aside everything else I've brought up here.
Proposal
Hopefully I managed to convincingly get the point across that the current target triple format is not scalable enough for the task it's meant to achieve. Now I'll describe my idea for fixing this situation and future-proofing std.Target.
Most importantly, the triple should be replaced with a quadruple. Concretely, I'm proposing that it should now be of the form:
<arch>[.<cpu>[+~feats]]-<os>[.<ver>][-<api>[.<ver>][-<abi>[+~opts]]]
(Where api is basically libc.)
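You can now specify neither API nor ABI, only API, or both API and ABI. Specifying an ABI without an API is not possible, because the ABI component is nested inside the optional API component.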
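The format above can be sketched as a small parser. This is illustrative Python rather than anything from Zig's implementation; the Quadruple type and parse_quadruple function are hypothetical names. It assumes that no component contains a hyphen and that the first "." in a component separates the name from its version (or, for the first component, the arch from the CPU string). The CPU and ABI strings are kept raw, including any + and ~ modifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Quadruple:
    arch: str
    os: str
    cpu: Optional[str] = None          # raw "<cpu>[+~feats]" string, if given
    os_version: Optional[str] = None
    api: Optional[str] = None
    api_version: Optional[str] = None
    abi: Optional[str] = None          # raw "<abi>[+~opts]" string, if given


def split_first_dot(part: str) -> Tuple[str, Optional[str]]:
    """Split "name.rest" at the first dot; rest is None when absent."""
    name, _, rest = part.partition(".")
    return name, (rest or None)


def parse_quadruple(text: str) -> Quadruple:
    parts = text.split("-")
    # arch and os are required; api and abi are optional, in that order.
    if not 2 <= len(parts) <= 4 or any(p == "" for p in parts):
        raise ValueError(f"malformed target: {text!r}")
    arch, cpu = split_first_dot(parts[0])
    os_name, os_version = split_first_dot(parts[1])
    result = Quadruple(arch=arch, os=os_name, cpu=cpu, os_version=os_version)
    if len(parts) >= 3:
        result.api, result.api_version = split_first_dot(parts[2])
    if len(parts) == 4:
        result.abi = parts[3]
    return result
```

Because the components are positional, a three-part string is always read as arch, OS, and API; there is no way to spell an ABI without also spelling an API.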
Some API tags that I anticipate us recognizing would be:
- system: A special tag used for platforms where there is only and can only be one libc (e.g. libSystem on macOS).
- none: A special tag for use if you don't want a libc at all (e.g. for the freestanding OS tag).
- mingw: Uses UCRT via MinGW-w64.
- msvcrt: Uses UCRT and VCRuntime natively (i.e. requires MSVC / Windows SDK tooling).
- gnu and musl: glibc and musl respectively.
- wasi: wasi-libc (modified musl).
The ABI value is more complicated. In order to represent all the nuance necessary, it really needs to be treated in the same way that CPU model + features are. That is, you specify a base ABI and optionally add or subtract options, with the same + and ~ syntax used for CPU features. The available ABIs and options are determined by the selected architecture, just like CPU model and features.
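As a rough illustration of the base-plus-options idea, the following Python sketch splits an ABI component into its base and its added and removed options. The function name is hypothetical, and no validation against a per-architecture list of known ABIs or options is attempted.

```python
import re
from typing import List, Tuple


def parse_abi_spec(spec: str) -> Tuple[str, List[str], List[str]]:
    """Split "base+opt~opt" into (base, added options, removed options)."""
    # The capture group keeps the "+" and "~" separators in the result,
    # so tokens alternate: base, sign, name, sign, name, ...
    tokens = re.split(r"([+~])", spec)
    base = tokens[0]
    if not base:
        raise ValueError(f"missing base ABI in {spec!r}")
    added: List[str] = []
    removed: List[str] = []
    for sign, name in zip(tokens[1::2], tokens[2::2]):
        if not name:
            raise ValueError(f"empty option name in {spec!r}")
        (added if sign == "+" else removed).append(name)
    return base, added, removed
```

With this, "svr4+ldbl64+secplt" splits into the base svr4 with two added options, and "elfv2~ldbl64" into the base elfv2 with one removed option.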
If API and/or ABI are omitted, resolution works mostly as it does today. However, importantly, target resolution is augmented to pick sensible defaults for the ABI based on architecture, OS, and API choices. For example, if you specify powerpc64-linux-musl, elfv2 will be selected as the base ABI, as opposed to elfv1 for powerpc64-linux-gnu. This is because musl by definition requires elfv2. In addition to this, the base ABI also has sensible default options. In the aforementioned example, the complete resulting ABI is actually elfv2+ldbl64 because musl also mandates long double = double. If for some reason you want to override that, you could use elfv2~ldbl64 as the ABI component, but this is expected to be a rare need.
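The default resolution described above might look roughly like this. It is a Python sketch under stated assumptions: DEFAULT_ABI and resolve_abi are hypothetical names, only the two PowerPC rows from the example are filled in, the empty default option set for elfv1 is an assumption, and default options are assumed to apply only when the chosen base matches the default base.

```python
from typing import Iterable, Optional

# Hypothetical defaults keyed by (arch, os, api) -> (base ABI, default options).
# Only the PowerPC example from the text is filled in.
DEFAULT_ABI = {
    ("powerpc64", "linux", "musl"): ("elfv2", frozenset({"ldbl64"})),
    ("powerpc64", "linux", "gnu"): ("elfv1", frozenset()),  # options assumed empty
}


def resolve_abi(
    arch: str,
    os: str,
    api: str,
    base: Optional[str] = None,
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> str:
    """Fill in the default base ABI and options, then apply explicit overrides."""
    default_base, default_opts = DEFAULT_ABI[(arch, os, api)]
    chosen = base or default_base
    # Assumption: defaults are tied to the default base ABI only.
    opts = set(default_opts) if chosen == default_base else set()
    opts |= set(add)
    opts -= set(remove)
    return chosen + "".join("+" + opt for opt in sorted(opts))
```

Here resolve_abi("powerpc64", "linux", "musl") gives elfv2+ldbl64, while passing base="elfv2" and remove=["ldbl64"] corresponds to writing elfv2~ldbl64 and gives plain elfv2.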
Examples
Here's roughly how each triple in zig targets | jq -r .libc[] | sort would look post-proposal:
- aarch64_be-linux-gnu -> aarch64_be-linux-gnu-lp64
- aarch64_be-linux-musl -> aarch64_be-linux-musl-lp64
- aarch64-linux-gnu -> aarch64-linux-gnu-lp64
- aarch64-linux-musl -> aarch64-linux-musl-lp64
- aarch64-macos-none -> aarch64-macos-system-lp64
- aarch64-windows-gnu -> aarch64-windows-mingw-lp64 (or -msvcrt-lp64+win)
- armeb-linux-gnueabi -> armeb-linux-gnu-eabi+sf
- armeb-linux-gnueabihf -> armeb-linux-gnu-eabi
- armeb-linux-musleabi -> armeb-linux-musl-eabi+sf
- armeb-linux-musleabihf -> armeb-linux-musl-eabi
- arm-linux-gnueabi -> arm-linux-gnu-eabi+sf
- arm-linux-gnueabihf -> arm-linux-gnu-eabi
- arm-linux-musleabi -> arm-linux-musl-eabi+sf
- arm-linux-musleabihf -> arm-linux-musl-eabi
- arm-windows-gnu -> arm-windows-mingw-eabi (or -msvcrt-eabi+win)
- csky-linux-gnueabi -> csky-linux-gnu-abiv2+sf
- csky-linux-gnueabihf -> csky-linux-gnu-abiv2
- loongarch64-linux-gnu -> loongarch64-linux-gnu-lp64d
- loongarch64-linux-musl -> loongarch64-linux-musl-lp64d
- m68k-linux-gnu -> m68k-linux-gnu-gnu (not to be confused with -sysv which is older and different!)
- m68k-linux-musl -> m68k-linux-musl-gnu (likewise)
- mips64el-linux-gnuabi64 -> mips64el-linux-gnu-n64
- mips64el-linux-gnuabin32 -> mips64el-linux-gnu-n32
- mips64el-linux-musl -> mips64el-linux-musl-n64
- mips64-linux-gnuabi64 -> mips64-linux-gnu-n64
- mips64-linux-gnuabin32 -> mips64-linux-gnu-n32
- mips64-linux-musl -> mips64-linux-musl-n64
- mipsel-linux-gnueabi -> mipsel-linux-gnu-o32+sf
- mipsel-linux-gnueabihf -> mipsel-linux-gnu-o32
- mipsel-linux-musl -> mipsel-linux-musl-o32
- mips-linux-gnueabi -> mips-linux-gnu-o32+sf (see my notes on EABI above)
- mips-linux-gnueabihf -> mips-linux-gnu-o32 (likewise)
- mips-linux-musl -> mips-linux-musl-o32
- powerpc64le-linux-gnu -> powerpc64le-linux-gnu-elfv2
- powerpc64le-linux-musl -> powerpc64le-linux-musl-elfv2+ldbl64
- powerpc64-linux-gnu -> powerpc64-linux-gnu-elfv1
- powerpc64-linux-musl -> powerpc64-linux-musl-elfv2+ldbl64
- powerpc-linux-gnueabi -> powerpc-linux-gnu-svr4+sf (see my notes on EABI above)
- powerpc-linux-gnueabihf -> powerpc-linux-gnu-svr4 (likewise)
- powerpc-linux-musl -> powerpc-linux-musl-svr4+ldbl64+secplt
- riscv32-linux-gnuilp32 -> riscv32-linux-gnu-ilp32d (yes, this one was confusingly named)
- riscv32-linux-musl -> riscv32-linux-musl-ilp32d
- riscv64-linux-gnu -> riscv64-linux-gnu-lp64d
- riscv64-linux-musl -> riscv64-linux-musl-lp64d
- s390x-linux-gnu -> s390x-linux-gnu-elf
- s390x-linux-musl -> s390x-linux-musl-elf
- sparc64-linux-gnu -> sparc64-linux-gnu-sysv
- sparc-linux-gnu -> sparc-linux-gnu-sysv
- thumb-linux-gnueabi -> thumb-linux-gnu-eabi+sf
- thumb-linux-gnueabihf -> thumb-linux-gnu-eabi
- thumb-linux-musleabi -> thumb-linux-musl-eabi+sf
- thumb-linux-musleabihf -> thumb-linux-musl-eabi
- wasm32-freestanding-musl -> wasm32-freestanding-none-watc (WebAssembly Tool Conventions)
- wasm32-wasi-musl -> wasm32-wasi-wasi-watc (likewise)
- x86_64-linux-gnu -> x86_64-linux-gnu-sysv
- x86_64-linux-gnux32 -> x86_64-linux-gnu-x32
- x86_64-linux-musl -> x86_64-linux-musl-sysv
- x86_64-macos-none -> x86_64-macos-system-sysv
- x86_64-windows-gnu -> x86_64-windows-mingw-sysv (or -msvcrt-win64)
- x86-linux-gnu -> x86-linux-gnu-sysv
- x86-linux-musl -> x86-linux-musl-sysv
- x86-windows-gnu -> x86-windows-mingw-sysv (or -msvcrt-win32)
Note that many triples are still missing here; this is just intended to give a rough idea of how things will look. Also, some of these names would certainly be subject to change and/or bikeshedding during implementation.
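One thing the list above makes visible is that the same legacy suffix maps to different ABIs depending on the architecture, so any migration has to key on both. The Python sketch below shows that with a handful of rows copied from the list; the table and function names are hypothetical, and this is not a proposed migration mechanism.

```python
# (arch, legacy third component) -> (api, abi). Rows copied from the list above.
LEGACY_TO_QUADRUPLE = {
    ("arm", "gnueabi"): ("gnu", "eabi+sf"),
    ("arm", "gnueabihf"): ("gnu", "eabi"),
    ("mips", "gnueabi"): ("gnu", "o32+sf"),
    ("mips", "gnueabihf"): ("gnu", "o32"),
    ("powerpc", "gnueabi"): ("gnu", "svr4+sf"),
    ("mips64", "gnuabin32"): ("gnu", "n32"),
    ("riscv32", "gnuilp32"): ("gnu", "ilp32d"),
    ("x86_64", "gnux32"): ("gnu", "x32"),
}


def migrate(triple: str) -> str:
    """Rewrite a legacy arch-os-env triple as an arch-os-api-abi quadruple."""
    arch, os_name, env = triple.split("-")
    api, abi = LEGACY_TO_QUADRUPLE[(arch, env)]
    return f"{arch}-{os_name}-{api}-{abi}"
```

Note how gnueabi becomes eabi+sf on arm, o32+sf on mips, and svr4+sf on powerpc.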
Anticipated Concerns
I just want to address upfront some (reasonable!) concerns that I'm almost certain will be on people's minds after reading this:
- libc and ABI choice are linked: This is true, as any given libc only supports certain ABIs. But note that I could make the exact same argument for architecture and OS, OS and combined libc + ABI, etc. This argument would be flawed for (at least) the same reason it is here: Combinatorial explosion. Additionally, note that some architecture/OS combinations also impose restrictions on the ABI. The reality is that every component of a target triple imposes semantic restrictions on the other components, even prior to this proposal. Finally, some ABI options actually are completely independent of libc.
- This is more complicated than before: As I've hopefully demonstrated, this is only so because the status quo is unscalable and, in a case like long double on PowerPC, downright unworkable. It's easy to keep things simple if you don't account for all cases. (Quoting zig zen: Edge cases matter.) Also, writing code for real hardware is necessarily more complex than e.g. a bytecode VM. I think it's actually entirely reasonable to ask that people understand the basics of their target environment, especially in cross-compilation scenarios. The behavior of picking sensible ABI defaults based on other components should help here. For native builds, things stay simple.
- Deviating from GNU triples this much will add a learning curve: I simply think the benefits outweigh the costs here. Andrew has stated before that Zig wants to be able to target a much wider variety of platforms than, say, LLVM - including old ones. I think this proposal (or something equivalent, at least) is a clear prerequisite for that goal. Also, just because GNU triples have become ubiquitous, it does not follow that they are good. They are in fact not good for a long list of reasons. (Quoting zig zen again: Avoid local maximums.)