Description
Currently we have this situation:
- stage1: Inline assembly is a comptime-known string that can be built with expressions such as ++.
- stage2: Inline assembly must be a string literal. This restriction is in preparation for this proposal, and here it is.
Here's one example of what inline assembly looks like today, for x86_64:
    argc_argv_ptr = asm volatile (
        \\ xor %%rbp, %%rbp
        : [argc] "={rsp}" (-> [*]usize),
    );
This proposal is to introduce the concept of dialects. As a first pass, the set of dialects could mirror the std.Target.Cpu.Arch enum one-to-one. But it's likely that some dialects would be shared by multiple architectures. For example, x86 and x86_64 would probably share the x86 dialect. So we will have a separate enum for dialects.
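As a rough sketch of what that separate enum could look like: the names Dialect and fromArch are made up for illustration, the set of tags is deliberately incomplete, and this assumes a recent Zig where the 32-bit x86 architecture tag is spelled .x86.

```zig
const std = @import("std");

/// Hypothetical dialect enum; nothing like this exists in std today.
pub const Dialect = enum {
    x86,
    wasm,

    /// Maps a target architecture to the dialect used to parse its inline
    /// assembly, or null if no dialect has been defined for it in this sketch.
    pub fn fromArch(arch: std.Target.Cpu.Arch) ?Dialect {
        return switch (arch) {
            // Several architectures can share one dialect.
            .x86, .x86_64 => .x86,
            .wasm32, .wasm64 => .wasm,
            else => null,
        };
    }
};
```

With this shape, Dialect.fromArch(.x86) and Dialect.fromArch(.x86_64) both yield .x86, which is the kind of sharing described above.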
A dialect is specified as an identifier token (it must be an identifier) directly after the asm keyword, before the volatile keyword if any, and it tells the parser how to parse the assembly syntax:
    const argc_argv_ptr: [*]usize = asm x86 volatile {
        xor rbp, rbp // zig-style comments for all dialects
        break rsp // we can make up our own syntax too for integration with zig language
    };
I made some other changes here for fun, but those are outside the scope of this proposal. The point of this proposal is that we change the ( ) to braces, and that inside the braces there is not a string literal but syntax that is more closely integrated with the Zig language.
The tokenizer is shared between Zig syntax and all dialects. One tokenizer to rule them all.
The dialect tells the parser how to parse what is inside the braces. You can imagine how x86 is parsed in a drastically different manner than WebAssembly or SPIR-V.
Rather than the burden of parsing inline assembly falling on the backend, it falls on the frontend, where it is properly cached and it is easier to report errors. This also provides a way to unify inline assembly across multiple backends; for example right now we send inline assembly straight to LLVM with the LLVM backend, but we have our own bespoke parser in the x86_64 backend. This is a design flaw because we need to have consistent inline assembly syntax between the two backends; we need to parse it in a prior phase of the pipeline and then lower it to x86_64 MIR, or LLVM inline assembly.