RFC/Proposal: Turning Zig target triples into quadruples #20690

Open
@alexrp

Introduction

At the moment, a Zig target triple (without versions) generally has an obvious 1:1 mapping to a GNU target triple, with only a few exceptions. In this issue, I propose that we completely break with the GNU style of target triple and, in particular, make the choice of libc and ABI two distinct components of the triple. This will enable Zig target triples (then quadruples) to communicate information that they can't today and handle a much wider range of ABI options.

Background & Motivation

Zig target triples, AFAICT, have the goal of completely replacing the -march, -mcpu, -mtune, -mabi, and -mmacosx-version-min options with a single, unified -target option to cover them all. -mcpu hasn't been integrated in the triple yet, but that work has been accepted in #4584 (and this proposal builds on that one). The only option that remains after that is -mabi which is notably missing from zig build-obj and friends, presumably on the assumption that the third component of the triple ought to cover it.

I've spent the past few weeks doing what I think is a fairly exhaustive survey of the ISA and ABI landscape. I put the cutoff point roughly around the mid-1970s; anything prior to that is for all intents and purposes super dead and unlikely to change any of my conclusions here. Having read through ISA manuals and ABI documentation for basically every architecture that I could find a manual for, my conclusion is that there is far more nuance to ABI choice than the current style of target triple allows. This isn't theoretical either; I'll demonstrate some real cases where the current approach falls short in ways that actually matter, and I'll show why the current approach can't scale in the long term.

I'll go over just a few architectures here; there are others that would also illustrate the point well (e.g. SuperH and m68k), but I hope the following are sufficient.

RISC-V

RISC-V is the hot new ISA on the block, so it's probably the most pertinent example here. It currently defines the following ABIs:

  • ilp32 (full soft float)
  • ilp32f (soft f64 and f128; hard f32)
  • ilp32d (soft f128; hard f32 and f64)
  • ilp32e (full soft float; reduced register set)
  • lp64 (full soft float)
  • lp64f (soft f64 and f128; hard f32)
  • lp64d (soft f128; hard f32 and f64)
  • lp64q (full hard float)

zig build-obj and friends simply have no way to select between these ABIs at the moment. We would have to add ABI tags for all of them, and we can already see that adding those for each ABI (plus glibc variants, plus musl variants) is going to get way out of hand. Additionally, I believe there are more ABIs to come; I've heard talk of ilp32ef, for example, and RV128 ABIs will presumably materialize at some point.
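To put rough numbers on "way out of hand", here is a small counting sketch in Python. It compares how many environment tags are needed when every RISC-V ABI above is fused with a libc name against how many names are needed when libc and ABI are separate components. The fused spellings it generates (such as gnu_ilp32f) are invented purely for counting; they are not existing or proposed tags.

```python
from itertools import product

# The eight RISC-V ABIs listed above.
RISCV_ABIS = ["ilp32", "ilp32f", "ilp32d", "ilp32e",
              "lp64", "lp64f", "lp64d", "lp64q"]
# The two libc families mentioned in the text.
LIBCS = ["gnu", "musl"]

# Status-quo style: one fused tag per (libc, ABI) pair.
# These spellings are hypothetical and exist only for counting.
fused_tags = [f"{libc}_{abi}" for libc, abi in product(LIBCS, RISCV_ABIS)]

# Separate components: each libc and each ABI is named exactly once.
separate_names = LIBCS + RISCV_ABIS

print(len(fused_tags))      # 16
print(len(separate_names))  # 10
```

With fused tags, each additional libc costs another eight tags for RISC-V alone; with separate components it costs one name.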

Also worth noting here is that our current strategy of adopting gnueabi and gnueabihf (and musl variants) to differentiate hard float vs soft float starts to break down once unusual float ABIs (32-bit and 128-bit) enter the picture.

(Incidentally, these ABIs also show why the current std.Target.FloatAbi definition is not nuanced enough.)
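As a rough illustration of that last point, the sketch below models the RISC-V ABIs listed above by the set of float widths they handle in hardware, then collapses them to a two-valued hard/soft label. The RiscvAbi type, its field names, and the coarse_float_abi helper are hypothetical and not part of Zig's standard library; the two-valued label is only an assumed stand-in for a coarse float ABI model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiscvAbi:
    name: str
    hard_float_bits: frozenset  # float widths the ABI treats as hard float
    reduced_registers: bool = False

# Transcribed from the list of RISC-V ABIs above.
RISCV_ABIS = [
    RiscvAbi("ilp32", frozenset()),
    RiscvAbi("ilp32f", frozenset({32})),
    RiscvAbi("ilp32d", frozenset({32, 64})),
    RiscvAbi("ilp32e", frozenset(), reduced_registers=True),
    RiscvAbi("lp64", frozenset()),
    RiscvAbi("lp64f", frozenset({32})),
    RiscvAbi("lp64d", frozenset({32, 64})),
    RiscvAbi("lp64q", frozenset({32, 64, 128})),
]

def coarse_float_abi(abi):
    """Collapse an ABI to a two-valued hard/soft label (assumed coarse model)."""
    return "hard" if abi.hard_float_bits else "soft"

distinct_fine = {abi.hard_float_bits for abi in RISCV_ABIS}
distinct_coarse = {coarse_float_abi(abi) for abi in RISCV_ABIS}
print(len(distinct_fine), len(distinct_coarse))  # 4 2
```

Four distinct float configurations collapse into two labels, so ilp32f and ilp32d (for example) become indistinguishable under the coarse model.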

LoongArch

LoongArch has roughly the same situation as RISC-V, minus the Q and E extensions:

  • ilp32s (full soft float)
  • ilp32f (soft f64; hard f32)
  • ilp32d (full hard float)
  • lp64s (full soft float)
  • lp64f (soft f64; hard f32)
  • lp64d (full hard float)

An interesting thing to note here is that LoongArch is so far the only architecture I'm aware of to have done the sane thing and made the ABI actually, ya know, part of the ABI component of the target triple. So there's -gnu for ilp32d/lp64d, -gnuf32 for ilp32f/lp64f, and -gnusf for ilp32s/lp64s. Good job! (It used to be -gnuf64 instead of -gnu, but they simplified it because it's expected to be the common case.)
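The LoongArch convention is regular enough to write down as a lookup. The sketch below is in Python and uses a three-component, Zig-style spelling without a vendor field. The loongarch_abi function is hypothetical, and treating the arch name alone as the thing that selects ilp32 versus lp64 is an assumption made for the sketch.

```python
# Float-ABI suffix selected by each environment tag, per the text above.
ENV_TO_FLOAT_SUFFIX = {"gnu": "d", "gnuf32": "f", "gnusf": "s"}

# Assumption for this sketch: the arch name alone picks the data model prefix.
ARCH_TO_PREFIX = {"loongarch32": "ilp32", "loongarch64": "lp64"}

def loongarch_abi(triple):
    """Return the LoongArch ABI name implied by an '<arch>-<os>-<env>' triple."""
    arch, _os_tag, env = triple.split("-")
    return ARCH_TO_PREFIX[arch] + ENV_TO_FLOAT_SUFFIX[env]

print(loongarch_abi("loongarch64-linux-gnu"))     # lp64d
print(loongarch_abi("loongarch64-linux-gnuf32"))  # lp64f
print(loongarch_abi("loongarch32-linux-gnusf"))   # ilp32s
```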

PowerPC

PowerPC (or Power ISA) has had a long list of ABIs over the years, being a fairly old architecture. Some never really saw practical use (e.g. the Windows NT ABI). It's a bit hard to categorize the ones that are actually relevant, but I think they can roughly be put like this:

  • SVR4
  • EABI
  • Apple ABI
  • ELF v1
  • ELF v2
  • AIX

(Notably, Zig's current use of the bespoke powerpc-linux-gnueabi(hf) triples is quite unfortunate because it implies an association with the PowerPC EABI, when that is not actually the case.)

In addition to these broad-strokes ABIs, there are variations based around the definition of long double. musl has done the simple thing and just declared that only long double = double is supported. Unfortunately, the rest of the world runs on one of two other definitions: either IEEE binary128 or the 128-bit "double-double" format that IBM came up with. The latter is still very common, with binary128 only seeing limited use on newer powerpc64le distros (see #20579 for details). That last point is a real problem for Zig; zig build-obj and friends default to the "double-double" format with no way to switch to binary128 for the distros where binary128 is the default.

But wait - there's more. Some of the above ABIs also have vector variants for efficient AltiVec usage. And of course there are also soft float variations of some of them.

And it gets much worse. You can also use a plethora of options such as -malign-natural, -malign-power, -maix-struct-return, and -msvr4-struct-return to explicitly override various aspects of the aforementioned ABIs.

There's a near-incomprehensible number of possible combinations here.

MIPS

MIPS sits somewhere between RISC-V and PowerPC, with a lot of ABIs and configurability within those ABIs:

  • EABI
  • O32
  • N32
  • O64
  • N64

(MIPS support in Zig currently has the same issue as PowerPC where gnueabi/gnueabihf are used to distinguish soft float and hard float, despite neither being EABI-based.)

In addition to these, there are soft float variants and single/double-precision variants. There are also the FPXX and FP64A variants. Like PowerPC, options abound for overriding various aspects of the chosen ABI.

Miscellaneous

Some extra notes that apply to various architectures on top of what I've already written above:

  • ILP32 ABI variants (i.e. 32-bit pointers on a 64-bit machine) exist for at least Arm64, Itanium, and x86-64 (in addition to the aforementioned architectures). Note how they're called different things: x32 for x86-64, gnu_ilp32 (or aarch64_32 as arch) for Arm64, and N32 for MIPS.
  • Quite a few architectures (RISC-V, Arm, SuperH, and some others I forget) have an FDPIC ABI that is meant for systems without an MMU. I believe it functions as an addition to the base ABI, but I haven't dug too deeply into it.
  • It's probably obvious at this point, but just to make it explicit: GNU triples simply cannot represent most of the ABI nuance I've described so far.
  • If we ever add more libcs than the ones we have now (ziglibc?), status quo will start to get ugly fast, even putting aside everything else I've brought up here.

Proposal

Hopefully I managed to convincingly get the point across that the current target triple format is not scalable enough for the task it's meant to achieve. Now I'll describe my idea for fixing this situation and future-proofing std.Target.

Most importantly, the triple should be replaced with a quadruple. Concretely, I'm proposing that it should now be of the form:

<arch>[.<cpu>[+~feats]]-<os>[.<ver>][-<api>[.<ver>][-<abi>[+~opts]]]

(Where api is basically libc.)

You can specify neither API nor ABI, only the API, or both API and ABI. Specifying an ABI without an API is not possible.
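A minimal parsing sketch for the form above, in Python. The function names and the returned dictionary layout are hypothetical. It assumes that no component contains a "-", that a version is everything after the first "." in the os and api components, and that "+" and "~" only appear as modifier prefixes. It does no validation of names.

```python
import re

_MODIFIER = re.compile(r"([+~])([^+~]+)")

def split_modifiers(text):
    """Split 'base+a~b' into ('base', ['a'], ['b'])."""
    match = re.fullmatch(r"([^+~]+)((?:[+~][^+~]+)*)", text)
    if match is None:
        raise ValueError(f"malformed component: {text!r}")
    added, removed = [], []
    for sign, name in _MODIFIER.findall(match.group(2)):
        (added if sign == "+" else removed).append(name)
    return match.group(1), added, removed

def split_version(text):
    """Split 'name.1.2' into ('name', '1.2'); the version is None if absent."""
    name, _, version = text.partition(".")
    return name, version or None

def parse_quadruple(text):
    """Parse <arch>[.<cpu>[+~feats]]-<os>[.<ver>][-<api>[.<ver>][-<abi>[+~opts]]]."""
    parts = text.split("-")
    if not 2 <= len(parts) <= 4 or "" in parts:
        raise ValueError(f"expected 2 to 4 non-empty components: {text!r}")
    out = {"cpu": None, "cpu_added": [], "cpu_removed": [],
           "api": None, "api_version": None,
           "abi": None, "abi_added": [], "abi_removed": []}
    out["arch"], _, cpu_spec = parts[0].partition(".")
    if cpu_spec:
        out["cpu"], out["cpu_added"], out["cpu_removed"] = split_modifiers(cpu_spec)
    out["os"], out["os_version"] = split_version(parts[1])
    if len(parts) >= 3:  # API present
        out["api"], out["api_version"] = split_version(parts[2])
    if len(parts) == 4:  # ABI present (only possible when API is present)
        out["abi"], out["abi_added"], out["abi_removed"] = split_modifiers(parts[3])
    return out

q = parse_quadruple("powerpc64-linux-musl-elfv2+ldbl64")
print(q["api"], q["abi"], q["abi_added"])  # musl elfv2 ['ldbl64']
```

Because components are positional, the ABI can only occupy the fourth slot, which is why an ABI without an API cannot be expressed.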

Some API tags that I anticipate us recognizing would be:

  • system: A special tag used for platforms where there is only and can only be one libc (e.g. libSystem on macOS).
  • none: A special tag for use if you don't want a libc at all (e.g. for the freestanding OS tag).
  • mingw: Uses UCRT via MinGW-w64.
  • msvcrt: Uses UCRT and VCRuntime natively (i.e. requires MSVC / Windows SDK tooling).
  • gnu and musl: glibc and musl respectively.
  • wasi: wasi-libc (modified musl).

The ABI value is more complicated. In order to represent all the nuance necessary, it really needs to be treated in the same way that CPU model + features are. That is, you specify a base ABI and optionally add or subtract options, with the same + and ~ syntax used for CPU features. The available ABIs and options are determined by the selected architecture, just like CPU model and features.
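A small sketch of what "determined by the selected architecture" could look like in practice, in Python. The ABI_TABLE contents are a toy subset built from names used in this proposal's examples, and check_abi is a hypothetical helper, not an existing API.

```python
import re

# Toy per-architecture table; names are taken from the examples in this
# proposal and are illustrative only. A real table would be far larger.
ABI_TABLE = {
    "arm": {"bases": {"eabi"}, "options": {"sf"}},
    "mips": {"bases": {"o32"}, "options": {"sf"}},
    "powerpc64": {"bases": {"elfv1", "elfv2"}, "options": {"ldbl64"}},
}

def check_abi(arch, abi_spec):
    """Validate '<base>[+opt|~opt]...' against the table for this arch.

    Returns (base, added, removed). Raises ValueError for a malformed spec
    or for a base/option the architecture does not define, and KeyError for
    an architecture missing from the toy table.
    """
    table = ABI_TABLE[arch]
    match = re.fullmatch(r"([^+~]+)((?:[+~][^+~]+)*)", abi_spec)
    if match is None:
        raise ValueError(f"malformed ABI component: {abi_spec!r}")
    base, tail = match.group(1), match.group(2)
    if base not in table["bases"]:
        raise ValueError(f"{arch} has no base ABI named {base!r}")
    added, removed = [], []
    for sign, name in re.findall(r"([+~])([^+~]+)", tail):
        if name not in table["options"]:
            raise ValueError(f"{arch} has no ABI option named {name!r}")
        (added if sign == "+" else removed).append(name)
    return base, added, removed

print(check_abi("arm", "eabi+sf"))             # ('eabi', ['sf'], [])
print(check_abi("powerpc64", "elfv2~ldbl64"))  # ('elfv2', [], ['ldbl64'])
```

This mirrors how a CPU feature only makes sense for the architecture that defines it: eabi+ldbl64 on arm is rejected here in the same way an unknown CPU feature would be.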

If API and/or ABI are omitted, resolution works mostly as it does today. However, importantly, target resolution is augmented to pick sensible defaults for the ABI based on architecture, OS, and API choices. For example, if you specify powerpc64-linux-musl, elfv2 will be selected as the base ABI, as opposed to elfv1 for powerpc64-linux-gnu. This is because musl by definition requires elfv2. In addition to this, the base ABI also has sensible default options. In the aforementioned example, the complete resulting ABI is actually elfv2+ldbl64 because musl also mandates long double = double. If for some reason you want to override that, you could use elfv2~ldbl64 as the ABI component, but this is expected to be a rare need.
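The default selection described above can be sketched as a lookup keyed on architecture, OS, and API, followed by applying any explicit modifiers. This is Python, not Zig; DEFAULT_ABI and resolve_abi are hypothetical names, the table holds only the PowerPC entries that appear in this proposal, and the rule that an explicitly different base ABI discards the looked-up default options is an assumption of the sketch. It also assumes the ABI text is well-formed.

```python
import re

# (arch, os, api) -> (default base ABI, default options).
# Entries follow the powerpc64 examples in this proposal.
DEFAULT_ABI = {
    ("powerpc64", "linux", "gnu"): ("elfv1", ()),
    ("powerpc64", "linux", "musl"): ("elfv2", ("ldbl64",)),
    ("powerpc64le", "linux", "gnu"): ("elfv2", ()),
    ("powerpc64le", "linux", "musl"): ("elfv2", ("ldbl64",)),
}

def resolve_abi(arch, os_tag, api, abi_spec=None):
    """Return the complete ABI string, filling in defaults where omitted."""
    base, defaults = DEFAULT_ABI[(arch, os_tag, api)]
    options = set(defaults)
    if abi_spec is not None:
        explicit_base = re.match(r"[^+~]+", abi_spec).group(0)
        if explicit_base != base:
            # Assumption: a different base ABI starts from no options.
            base, options = explicit_base, set()
        for sign, name in re.findall(r"([+~])([^+~]+)", abi_spec):
            if sign == "+":
                options.add(name)
            else:
                options.discard(name)
    return base + "".join(f"+{name}" for name in sorted(options))

print(resolve_abi("powerpc64", "linux", "musl"))                  # elfv2+ldbl64
print(resolve_abi("powerpc64", "linux", "gnu"))                   # elfv1
print(resolve_abi("powerpc64", "linux", "musl", "elfv2~ldbl64"))  # elfv2
```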

Examples

Here's roughly how each triple in zig targets | jq -r .libc[] | sort would look post-proposal:

  • aarch64_be-linux-gnu -> aarch64_be-linux-gnu-lp64
  • aarch64_be-linux-musl -> aarch64_be-linux-musl-lp64
  • aarch64-linux-gnu -> aarch64-linux-gnu-lp64
  • aarch64-linux-musl -> aarch64-linux-musl-lp64
  • aarch64-macos-none -> aarch64-macos-system-lp64
  • aarch64-windows-gnu -> aarch64-windows-mingw-lp64 (or -msvcrt-lp64+win)
  • armeb-linux-gnueabi -> armeb-linux-gnu-eabi+sf
  • armeb-linux-gnueabihf -> armeb-linux-gnu-eabi
  • armeb-linux-musleabi -> armeb-linux-musl-eabi+sf
  • armeb-linux-musleabihf -> armeb-linux-musl-eabi
  • arm-linux-gnueabi -> arm-linux-gnu-eabi+sf
  • arm-linux-gnueabihf -> arm-linux-gnu-eabi
  • arm-linux-musleabi -> arm-linux-musl-eabi+sf
  • arm-linux-musleabihf -> arm-linux-musl-eabi
  • arm-windows-gnu -> arm-windows-mingw-eabi (or -msvcrt-eabi+win)
  • csky-linux-gnueabi -> csky-linux-gnu-abiv2+sf
  • csky-linux-gnueabihf -> csky-linux-gnu-abiv2
  • loongarch64-linux-gnu -> loongarch64-linux-gnu-lp64d
  • loongarch64-linux-musl -> loongarch64-linux-musl-lp64d
  • m68k-linux-gnu -> m68k-linux-gnu-gnu (not to be confused with -sysv which is older and different!)
  • m68k-linux-musl -> m68k-linux-musl-gnu (likewise)
  • mips64el-linux-gnuabi64 -> mips64el-linux-gnu-n64
  • mips64el-linux-gnuabin32 -> mips64el-linux-gnu-n32
  • mips64el-linux-musl -> mips64el-linux-musl-n64
  • mips64-linux-gnuabi64 -> mips64-linux-gnu-n64
  • mips64-linux-gnuabin32 -> mips64-linux-gnu-n32
  • mips64-linux-musl -> mips64-linux-musl-n64
  • mipsel-linux-gnueabi -> mipsel-linux-gnu-o32+sf
  • mipsel-linux-gnueabihf -> mipsel-linux-gnu-o32
  • mipsel-linux-musl -> mipsel-linux-musl-o32
  • mips-linux-gnueabi -> mips-linux-gnu-o32+sf (see my notes on EABI above)
  • mips-linux-gnueabihf -> mips-linux-gnu-o32 (likewise)
  • mips-linux-musl -> mips-linux-musl-o32
  • powerpc64le-linux-gnu -> powerpc64le-linux-gnu-elfv2
  • powerpc64le-linux-musl -> powerpc64le-linux-musl-elfv2+ldbl64
  • powerpc64-linux-gnu -> powerpc64-linux-gnu-elfv1
  • powerpc64-linux-musl -> powerpc64-linux-musl-elfv2+ldbl64
  • powerpc-linux-gnueabi -> powerpc-linux-gnu-svr4+sf (see my notes on EABI above)
  • powerpc-linux-gnueabihf -> powerpc-linux-gnu-svr4 (likewise)
  • powerpc-linux-musl -> powerpc-linux-musl-svr4+ldbl64+secplt
  • riscv32-linux-gnuilp32 -> riscv32-linux-gnu-ilp32d (yes, this one was confusingly named)
  • riscv32-linux-musl -> riscv32-linux-musl-ilp32d
  • riscv64-linux-gnu -> riscv64-linux-gnu-lp64d
  • riscv64-linux-musl -> riscv64-linux-musl-lp64d
  • s390x-linux-gnu -> s390x-linux-gnu-elf
  • s390x-linux-musl -> s390x-linux-musl-elf
  • sparc64-linux-gnu -> sparc64-linux-gnu-sysv
  • sparc-linux-gnu -> sparc-linux-gnu-sysv
  • thumb-linux-gnueabi -> thumb-linux-gnu-eabi+sf
  • thumb-linux-gnueabihf -> thumb-linux-gnu-eabi
  • thumb-linux-musleabi -> thumb-linux-musl-eabi+sf
  • thumb-linux-musleabihf -> thumb-linux-musl-eabi
  • wasm32-freestanding-musl -> wasm32-freestanding-none-watc (WebAssembly Tool Conventions)
  • wasm32-wasi-musl -> wasm32-wasi-wasi-watc (likewise)
  • x86_64-linux-gnu -> x86_64-linux-gnu-sysv
  • x86_64-linux-gnux32 -> x86_64-linux-gnu-x32
  • x86_64-linux-musl -> x86_64-linux-musl-sysv
  • x86_64-macos-none -> x86_64-macos-system-sysv
  • x86_64-windows-gnu -> x86_64-windows-mingw-sysv (or -msvcrt-win64)
  • x86-linux-gnu -> x86-linux-gnu-sysv
  • x86-linux-musl -> x86-linux-musl-sysv
  • x86-windows-gnu -> x86-windows-mingw-sysv (or -msvcrt-win32)

Note that many triples are still missing here; this is just intended to give a rough idea of how things will look. Also, some of these names would certainly be subject to change and/or bikeshedding during implementation.
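For the rows above whose third component ends in eabi or eabihf, the translation is mechanical enough to sketch in Python. BASE_ABI and translate_eabi_triple are hypothetical names, the table is transcribed from those rows only, and the function does not check that its input was a valid triple to begin with.

```python
# Base ABI per architecture, transcribed from the *eabi/*eabihf rows above.
BASE_ABI = {
    "arm": "eabi", "armeb": "eabi", "thumb": "eabi",
    "csky": "abiv2",
    "mips": "o32", "mipsel": "o32",
    "powerpc": "svr4",
}

def translate_eabi_triple(triple):
    """Map '<arch>-<os>-<libc>eabi[hf]' to '<arch>-<os>-<api>-<abi>[+sf]'."""
    arch, os_tag, env = triple.split("-")
    for api in ("gnu", "musl"):
        if env in (api + "eabi", api + "eabihf"):
            break
    else:
        raise ValueError(f"not an eabi-style environment: {env!r}")
    abi = BASE_ABI[arch]
    if not env.endswith("hf"):
        abi += "+sf"  # the non-hf spelling means soft float
    return f"{arch}-{os_tag}-{api}-{abi}"

print(translate_eabi_triple("arm-linux-gnueabi"))        # arm-linux-gnu-eabi+sf
print(translate_eabi_triple("thumb-linux-musleabihf"))   # thumb-linux-musl-eabi
print(translate_eabi_triple("powerpc-linux-gnueabihf"))  # powerpc-linux-gnu-svr4
```

Note how the same gnueabi spelling lands on eabi, abiv2, o32, or svr4 depending on the architecture, which is the ambiguity the separate ABI component removes.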

Anticipated Concerns

I just want to address upfront some (reasonable!) concerns that I'm almost certain will be on people's minds after reading this:

  • libc and ABI choice are linked: This is true, as any given libc only supports certain ABIs. But note that I could make the exact same argument for architecture and OS, OS and combined libc + ABI, etc. This argument would be flawed for (at least) the same reason it is here: Combinatorial explosion. Additionally, note that some architecture/OS combinations also impose restrictions on the ABI. The reality is that every component of a target triple imposes semantic restrictions on the other components, even prior to this proposal. Finally, some ABI options actually are completely independent of libc.
  • This is more complicated than before: As I've hopefully demonstrated, this is only so because the status quo is unscalable and, in a case like long double on PowerPC, downright unworkable. It's easy to keep things simple if you don't account for all cases. (Quoting zig zen: Edge cases matter.) Also, writing code for real hardware is necessarily more complex than e.g. a bytecode VM. I think it's actually entirely reasonable to ask that people understand the basics of their target environment, especially in cross-compilation scenarios. The behavior of picking sensible ABI defaults based on other components should help here. For native builds, things stay simple.
  • Deviating from GNU triples this much will add a learning curve: I simply think the benefits outweigh the costs here. Andrew has stated before that Zig wants to be able to target a much wider variety of platforms than, say, LLVM - including old ones. I think this proposal (or something equivalent, at least) is a clear prerequisite for that goal. Also, just because GNU triples have become ubiquitous, it does not follow that they are good. They are in fact not good for a long list of reasons. (Quoting zig zen again: Avoid local maximums.)

Labels

  • accepted: This proposal is planned.
  • breaking: Implementing this issue could cause existing code to no longer compile or have different behavior.
  • enhancement: Solving this issue will likely involve adding new logic or components to the codebase.
  • proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.
  • standard library: This issue involves writing Zig code for the standard library.
