parse inline assembly syntax according to a set of dialects; integrate inline assembly more closely with the zig language #10761

Open
@andrewrk

Currently we have this situation:

  • stage1: Inline assembly is a comptime-known string that can be built with expressions such as ++.
  • stage2: Inline assembly must be a string literal. This restriction was made in preparation for this proposal.

Here's one example of what inline assembly looks like today, for x86_64:

argc_argv_ptr = asm volatile (
    \\ xor %%rbp, %%rbp
    : [argc] "={rsp}" (-> [*]usize),
);

This proposal is to introduce the concept of dialects. As a first pass, the set of dialects would be exactly the std.Target.Cpu.Arch enum. But it's likely that some dialects would be shared by multiple architectures. For example, x86 and x86_64 would probably share the x86 dialect. So we will have a separate enum for dialects.
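To make the "separate enum" idea concrete, here is a minimal sketch. The `Arch` enum below is a small local stand-in for `std.Target.Cpu.Arch`, and `Dialect` and `dialectForArch` are hypothetical names made up for illustration; none of this is settled design.

```zig
// Hypothetical local stand-in for a few tags of std.Target.Cpu.Arch,
// so the sketch does not depend on the exact tag names in std.
const Arch = enum { x86, x86_64, aarch64, wasm32, wasm64 };

// Hypothetical dialect enum: one tag per assembly syntax,
// not one per architecture.
const Dialect = enum { x86, aarch64, wasm };

// Several architectures can map to the same dialect.
fn dialectForArch(arch: Arch) Dialect {
    return switch (arch) {
        .x86, .x86_64 => .x86,
        .aarch64 => .aarch64,
        .wasm32, .wasm64 => .wasm,
    };
}
```

With this shape, adding an architecture that reuses an existing syntax only means adding a switch prong, not a new parser.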

A dialect is specified as an identifier token (it must be an identifier) directly after the asm keyword, before the volatile keyword if any, and it tells how to parse the assembly syntax:

const argc_argv_ptr: [*]usize = asm x86 volatile {
    xor rbp, rbp  // zig-style comments for all dialects
    break rsp // we can make up our own syntax too for integration with zig language
};

I made some other changes here for fun, but those are outside the scope of this proposal. The point of this proposal is that we change the ( ) to braces, and that what goes inside is not a string literal but syntax that is more closely integrated with the zig language.

The tokenizer is shared between Zig syntax and all dialects. One tokenizer to rule them all.

The dialect tells the parser how to parse what is inside the braces. You can imagine how x86 is parsed in a drastically different manner than WebAssembly or SPIR-V.

Rather than the burden of parsing inline assembly falling on the backend, it falls on the frontend, where it is properly cached and it is easier to report errors. This also provides a way to unify inline assembly across multiple backends; for example right now we send inline assembly straight to LLVM with the LLVM backend, but we have our own bespoke parser in the x86_64 backend. This is a design flaw because we need to have consistent inline assembly syntax between the two backends; we need to parse it in a prior phase of the pipeline and then lower it to x86_64 MIR, or LLVM inline assembly.
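As a rough sketch of "parse once in the frontend, then lower per backend": the types and function names below (`Inst`, `MirInst`, `lowerToLlvmAsm`, `lowerToMir`) are all hypothetical and do not correspond to anything in the compiler today. The only point is that both backends consume the same parsed representation, so the accepted syntax cannot drift between them.

```zig
const std = @import("std");

// Hypothetical frontend representation of one parsed instruction.
const Reg = enum { rbp, rsp };
const Inst = union(enum) {
    xor: struct { dst: Reg, src: Reg },
};

// Hypothetical stand-in for an x86_64 MIR instruction.
const MirTag = enum { xor_rr };
const MirInst = struct { tag: MirTag, dst: Reg, src: Reg };

// Lower to an AT&T-style template string (source operand first),
// similar in spirit to what is handed to LLVM today.
fn lowerToLlvmAsm(buf: []u8, inst: Inst) ![]u8 {
    return switch (inst) {
        .xor => |x| std.fmt.bufPrint(buf, "xor %%{s}, %%{s}", .{
            @tagName(x.src),
            @tagName(x.dst),
        }),
    };
}

// Lower the same parsed instruction to the hypothetical MIR form.
fn lowerToMir(inst: Inst) MirInst {
    return switch (inst) {
        .xor => |x| .{ .tag = .xor_rr, .dst = x.dst, .src = x.src },
    };
}
```

Errors such as an unknown mnemonic or register would be reported while building `Inst`, before either lowering function runs.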

Labels

  • breaking: Implementing this issue could cause existing code to no longer compile or have different behavior.
  • proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.