Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several questions about using Cranelift as a backend for a programming language #5141

Closed
yorickpeterse opened this issue Oct 27, 2022 · 11 comments

Comments

@yorickpeterse
Copy link

I'm looking into using Cranelift as a backend for a programming language that I'm working on. Other options I have been looking into are compiling to C or using LLVM, but both come with their own set of trade-offs I'm not a fan of. For example, LLVM is generally quite slow, while C compilers tend to not give you enough control (i.e. they may reorder things unless you carefully insert compiler barriers). Cranelift is promising because of the amount of control it gives, and because it's written in Rust.

Unfortunately, the Cranelift documentation is limited, to a point where I'm overwhelmed/confused about where to even start. In this issue I'll be asking several questions to hopefully clear this up. If there's a better place for asking long-form questions like this, please let me know 😃

To set the stage: Inko is a language similar to Go/Erlang in that it has lightweight processes, and these can suspend at more or less any place (technically these spots are known, e.g. when performing IO, but to the user it may as well be "anywhere"). This means I need to be able to maintain and swap the stacks, like stackful coroutines. I also need to be able to grow the stack, and maybe shrink it (though figuring out when to do that is really tricky). Bare-metal performance is not a main goal, rather I want something that compiles fast and performs better than a well optimised interpreter (without a JIT), something I think one should be able to achieve with Cranelift.

Which brings me to the questions I have, in no particular order:

  1. Cranelift offers no way of injecting assembly directly into the IR. Instead it seems you have to define a C/assembly function then bind to that. Is Cranelift able to jump to a naked function defined in C and exposed using #[no_mangle] and extern "C"? The idea here is to make these stack fiddling routines naked functions so they don't introduce extra assembly overhead, and to give you full control. By jumping to them directly they essentially act as assembly macros, avoiding the function call overhead.

  2. Defining function names in Cranelift is a bit confusing. UserFuncName basically takes two arbitrary u32 values, and it's not clear what you'd use those for. Then there's Module.declare_function which takes a &str, and looking at some existing Cranelift projects this seems to be what they use. So which one should I use if I want to expose my function names (potentially mangled)?

  3. EBBs take variables as arguments, but it's not clear how you'd determine what variables to pass down to sub blocks. For example, if I have a graph where blocks A and B both jump to C, and C depends on a result produced by these blocks, how would I know what variable to pass? My own IR is already a graph but not in SSA, thus blocks can just use variables defined in some earlier block. My graph does ensure that in the above case the blocks A and B write to any variable needed by C where its value can differ per branch.

  4. How would one associate debugger information with their generated code? For example, I would like for a debugger to know about my source code, line numbers, etc. The Cranelift API doesn't seem to offer anything to do things like "map these instructions to lines X, Y, Z in the source code". Do I have to fiddle with the likes of e.g. gimli (and whatever the equivalent is for Windows)?

  5. Does Cranelift have any routines for detecting integer overflows? Instructions such as iadd seem to wrap around on overflow, so detecting it manually is doable if needed; I just don't want to reinvent this if already provided by Cranelift.

  6. Cranelift seems to have this notion of heaps for functions, and the ability to restrict functions to a certain portion of the heap. For my language there are no such restrictions (i.e. processes don't use fixed size heaps. How would I express this in Cranelift?

  7. Related to that, I'm seeing various hints about Cranelift performing heap bounds checking in various places. It's not clear to me if this would be needed in my case, and if not if there's a way of opting out (i.e. I don't want redundant heap bounds checking in my generated code). Is there more information on this somewhere?

  8. Related to swapping the stacks: if I insert an assembly routine in a function prologue (i.e. the function just starts with a call/jump to such a routine), is there a way to guarantee Cranelift won't move/inject anything before the call/jump? The assembly that might be needed for the system's calling convention should be fine as I can just add some padding to my stacks to account for that, but I don't want Cranelift to think my routine has no side effects and what not and just move it elsewhere.

Thanks for answering any of these questions, and again let me know if it's better to post this elsewhere! 😃

@bjorn3
Copy link
Contributor

bjorn3 commented Oct 27, 2022

  1. Cranelift offers no way of injecting assembly directly into the IR. Instead it seems you have to define a C/assembly function then bind to that. Is Cranelift able to jump to a naked function defined in C and exposed using #[no_mangle] and extern "C"? The idea here is to make these stack fiddling routines naked functions so they don't introduce extra assembly overhead, and to give you full control. By jumping to them directly they essentially act as assembly macros, avoiding the function call overhead.

Yes, you can jump to functions defined in C.

  1. Defining function names in Cranelift is a bit confusing. UserFuncName basically takes two arbitrary u32 values, and it's not clear what you'd use those for. Then there's Module.declare_function which takes a &str, and looking at some existing Cranelift projects this seems to be what they use. So which one should I use if I want to expose my function names (potentially mangled)?

You should probably be using Module::declare_function. UserFuncName is for when you are avoiding cranelift-jit and cranelift-object, but instead write your own code to do whatever you want with individual functions compiled by Cranelift.

  1. EBBs take variables as arguments, but it's not clear how you'd determine what variables to pass down to sub blocks. For example, if I have a graph where blocks A and B both jump to C, and C depends on a result produced by these blocks, how would I know what variable to pass? My own IR is already a graph but not in SSA, thus blocks can just use variables defined in some earlier block. My graph does ensure that in the above case the blocks A and B write to any variable needed by C where its value can differ per branch.

You can use the cranelift-frontend crate to generate SSA form on the fly.

  1. How would one associate debugger information with their generated code? For example, I would like for a debugger to know about my source code, line numbers, etc. The Cranelift API doesn't seem to offer anything to do things like "map these instructions to lines X, Y, Z in the source code". Do I have to fiddle with the likes of e.g. gimli (and whatever the equivalent is for Windows)?

For line info if you use cranelift-frontend you can use set_srcloc with a unique value for every source location. You can then map the SourceLoc you passed in back to the source locations after compilation and use gimli to produce the line tables. See for example https://github.com/bjorn3/rustc_codegen_cranelift/tree/master/src/debuginfo. As for getting debuginfo for local variables there is technically support for it, but not very good.

  1. Does Cranelift have any routines for detecting integer overflows? Instructions such as iadd seem to wrap around on overflow, so detecting it manually is doable if needed; I just don't want to reinvent this if already provided by Cranelift.

You probably have to detect it manually for now. There are instructions like iadd_cout, but they are not supported for all integer sizes on all backends. I would like to see it implemented universally myself.

  1. Cranelift seems to have this notion of heaps for functions, and the ability to restrict functions to a certain portion of the heap. For my language there are no such restrictions (i.e. processes don't use fixed size heaps. How would I express this in Cranelift?

These heaps are mostly for use in the translation from wasm to clif ir. You don't have to use them. The load and store instructions can access the entire address space just fine.

  1. Related to swapping the stacks: if I insert an assembly routine in a function prologue (i.e. the function just starts with a call/jump to such a routine), is there a way to guarantee Cranelift won't move/inject anything before the call/jump? The assembly that might be needed for the system's calling convention should be fine as I can just add some padding to my stacks to account for that, but I don't want Cranelift to think my routine has no side effects and what not and just move it elsewhere.

No, Cranelift will always put a prologue first before doing anything. One option I guess would be to insert raw bytes representing the call before the function. You can use context.compile(), get the compiled code and relocations, insert your custom prologue and then call module.define_function_bytes instead of module.define_function.

and again let me know if it's better to post this elsewhere! 😃

https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift might be a better place.

@cfallin
Copy link
Member

cfallin commented Oct 27, 2022

Hi @yorickpeterse -- thanks for the questions and interest! Apologies for the state of our documentation: we need to do an update pass, and probably develop more tutorial-like material, but we haven't had the time to do that. In any case, we're happy to answer questions here (or on our Zulip in the Cranelift stream), so please keep them coming if you have any more.

(On preview, I see that bjorn3 has answered a bunch so I'll just add to some of my thoughts here)

  1. Cranelift offers no way of injecting assembly directly into the IR. Instead it seems you have to define a C/assembly function then bind to that. Is Cranelift able to jump to a naked function defined in C and exposed using #[no_mangle] and extern "C"? The idea here is to make these stack fiddling routines naked functions so they don't introduce extra assembly overhead, and to give you full control. By jumping to them directly they essentially act as assembly macros, avoiding the function call overhead.

We currently don't have a way of injecting assembly at all -- #1041 goes over some of the reasons why. Basically, while it seems conceptually simple ("just emit the assembly I give you!"), it actually is enormously complex for all the reasons listed there (see especially this comment).

When you say "jump to a function", not directly; we can call one though, which is actually probably what you'd want. (Otherwise, how does this snippet know where to jump back to?) Perhaps you were thinking the jump would somehow inline the snippet (we don't have this)?

Popping up a level (in the X-Y problem sense): the goal here is stack-switching and/or segmented stacks, goroutine-style, no? If so, I'll say that we've had good luck in Wasmtime (built on top of Cranelift) doing fiber-style stack switching for async function calls, and that didn't need anything special in Cranelift. (At a yield point in the runtime when called from Cranelift-generated code, call the "switch back to host stack" helper; it pushes registers, switches SP, and returns on the other stack.) We also have explicit support in Cranelft for stack bounds checks. Combining the two, I could see a small modification to the Cranelift prologue generator that would say "if limit reached, call this runtime function" and that function would essentially allocate a new stack, switch to it, place a little trampoline on the stack to return back to the previous segment on return, and then jump back to the return address in the function that hit the limit. I'd be happy to help design this feature if you're interested.

  1. Does Cranelift have any routines for detecting integer overflows? Instructions such as iadd seem to wrap around on overflow, so detecting it manually is doable if needed; I just don't want to reinvent this if already provided by Cranelift.

In general, no; we just added some opcodes to help us with lowering Wasm heap checks, but for maximal flexibility I'd suggest inserting checks with explicit IR. We have a set of trap instructions you can use for this: icmp to compare, trapnz on the result.

  1. Related to swapping the stacks: if I insert an assembly routine in a function prologue (i.e. the function just starts with a call/jump to such a routine), is there a way to guarantee Cranelift won't move/inject anything before the call/jump? The assembly that might be needed for the system's calling convention should be fine as I can just add some padding to my stacks to account for that, but I don't want Cranelift to think my routine has no side effects and what not and just move it elsewhere.

This gets back to the inline-assembly question (see above); prologues (and epilogues) in particular are even more special because the compiler generates them to set up its stack frame in a careful way, so even if you could insert instructions somehow, we'd prefer that not to be the case (it would create a brittle coupling between our stackframe and prologue design and your inserted code). We actually had a version of this problem earlier with Cranelift's embedding in SpiderMonkey -- we had a mode to generate code without pro/epilogues and the embedder built them and glued things together, but it was not maintainable.

In general I like the philosophy of adding useful primitives/building blocks so that they come into the world of the compiler IR/settings and we can understand the interactions with the rest of our codegen. So, as above, I think we can design something for the "new stack segment if hit limit" case.

@yorickpeterse
Copy link
Author

@bjorn3 Thanks for the answers so far!

Yes, you can jump to functions defined in C.

In the documentation of cranelift_codegen the closest I could find is https://docs.rs/cranelift-codegen/latest/cranelift_codegen/ir/trait.InstBuilder.html#method.jump, but this jumps to a block and not a random memory address. Should I perhaps look in a different crate for this?

You can use the cranelift-frontend crate to generate SSA form on the fly.

Looking at the example on https://docs.rs/cranelift-frontend/latest/cranelift_frontend/#example, it seems to use jumps but leaves out the arguments. Am I correct in thinking that if you use use_var() in a sub block, Cranelift automatically passes its input as an argument when jumping to said block? That is, if you have a graph like this:

A    assign a to 42
|
+--------->----------> C use(a)
|
B    assign a to 50

Does Cranelift construct the graph such that it passes a in blocks A and B as an argument to block C, such that a in C is whatever value was produced in the branch leading up to it? Or do I have to explicitly set a in A/B as an argument for the jump(s) to C?

No, Cranelift will always put a prologue first before doing anything. One option I guess would be to insert raw bytes representing the call before the function. You can use context.compile(), get the compiled code and relocations, insert your custom prologue and then call module.define_function_bytes instead of module.define_function.

I suspect this would probably get a bit tricky, and maybe break across versions. If I just insert a function call as the first instruction after the prologue, is there anything I can do to prevent Cranelift from moving it around, i.e. is there something like a compiler barrier? If so I think that should be good enough.

https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift might be a better place.

I thought about the Zulip, but it's too easy to lose track of things in a chat channel unfortunately. But if desired I can ask things over there instead 😃

@bjorn3
Copy link
Contributor

bjorn3 commented Oct 27, 2022

Looking at the example on https://docs.rs/cranelift-frontend/latest/cranelift_frontend/#example, it seems to use jumps but leaves out the arguments. Am I correct in thinking that if you use use_var() in a sub block, Cranelift automatically passes its input as an argument when jumping to said block?

Correct. Cranelift-frontend automatically appends basic block arguments as necessary to handle SSA form.

@yorickpeterse
Copy link
Author

@cfallin

We currently don't have a way of injecting assembly at all -- #1041 goes over some of the reasons why. Basically, while it seems conceptually simple ("just emit the assembly I give you!"), it actually is enormously complex for all the reasons listed there (see especially this comment).

I had indeed seen the issue, and I figured it wouldn't be as easy as just dumping assembly in the middle of the IR 😄

When you say "jump to a function", not directly; we can call one though, which is actually probably what you'd want. (Otherwise, how does this snippet know where to jump back to?) Perhaps you were thinking the jump would somehow inline the snippet (we don't have this)?

Ah yeah good point, I forgot about actually having to jump back; heh.

Either way, the function call cost isn't really something to worry about so calling a helper function should be fine.

Popping up a level (in the X-Y problem sense): the goal here is stack-switching and/or segmented stacks, goroutine-style, no?

Pretty much. I was planning on taking the approach outlined in https://graphitemaster.github.io/fibers/ where I'd just define a bunch of routines in assembly (instead of using setjmp/longjmp), then call those in the appropriate places.

Combining the two, I could see a small modification to the Cranelift prologue generator that would say "if limit reached, call this runtime function" and that function would essentially allocate a new stack, switch to it, place a little trampoline on the stack to return back to the previous segment on return, and then jump back to the return address in the function that hit the limit. I'd be happy to help design this feature if you're interested.

I can see this being useful, though I currently don't know enough about Cranelift or how exactly I would implement the routine in a cross-platform way. I think it would probably be best to first implement this in my language, then maybe come back to Cranelift to see how/if it makes sense for it to provide a generalised solution.

@bjorn3
Copy link
Contributor

bjorn3 commented Oct 27, 2022

For segmented stacks following the same abi as https://gcc.gnu.org/wiki/SplitStacks may work. LLVM also implements this and Rust itself used to use it back when it still used green threads.

@yorickpeterse
Copy link
Author

@bjorn3 @cfallin Thanks for the info! I'll move this to Zulip and close the issue 😃

@jameysharp
Copy link
Contributor

As other folks have said, welcome! Although Cranelift is co-developed with Wasmtime which gives Wasmtime a little bit of a privileged position, we explicitly want to support projects like yours. So we're excited to hear from you!

I especially want to emphasize that we're open to making changes to Cranelift to support new use cases. We have to balance that against making sure we have the resources for ongoing maintenance of every change we merge. But if you find that it's impossible to do something you need using Cranelift, just keep in mind that fixing Cranelift is an option.

Looking at the example on https://docs.rs/cranelift-frontend/latest/cranelift_frontend/#example, it seems to use jumps but leaves out the arguments. Am I correct in thinking that if you use use_var() in a sub block, Cranelift automatically passes its input as an argument when jumping to said block?

Correct. Cranelift-frontend automatically appends basic block arguments as necessary to handle SSA form.

To clarify, "as necessary" is doing double-duty here. Cranelift allows you to use a value in one block that was defined in another block, as long as the definition dominates the use. So cranelift-frontend will avoid adding a block parameter if there's exactly one definition of the variable that reaches the use. You shouldn't have to think about that as long as you use cranelift-frontend but it's good to be aware of if you're reading the generated Cranelift IR.

If I just insert a function call as the first instruction after the prologue, is there anything I can do to prevent Cranelift from moving it around[...]?

I believe, and @cfallin can correct me if I'm wrong, that currently 1) Cranelift won't move side-effecting instructions, and 2) it treats all function calls as side-effecting. It may move pure instructions past effectful ones, but I think it'll never move a pure instruction any earlier in the function. So I think the current state is that if a function call is the first instruction, it will still be the first instruction after all optimizations, and only the prologue will precede it in the final machine code.

Whether you can rely on that behavior in the future is another question, of course.

I don't entirely understand what you're trying to do, but I'm curious whether the get_stack_pointer instruction added in #4573 helps your use case at all, and we can always talk about adding more instructions if we have a good motivating use case for them. Since there's ongoing work in the WebAssembly standards world on proposals for stack-switching, it could even be that Wasmtime is going to need something similar to whatever you're looking for. If there are two projects both looking for the same features from Cranelift, then that's especially strong motivation to come up with a good design.

@yorickpeterse
Copy link
Author

@jameysharp Thanks for the reply!

I don't entirely understand what you're trying to do, but I'm curious whether the get_stack_pointer instruction added in #4573 helps your use case at all, and we can always talk about adding more instructions if we have a good motivating use case for them.

Essentially I'd need routines for the following:

  1. Allocating memory that can be used as stack memory. This is out of the scope of Cranelift I think, and OS/calling convention specific (i.e. Windows of course does things its own way compared to everybody else). This I'd implement in my Rust based runtime library, using the appropriate OS functions.
  2. A routine that takes a pointer to such a chunk of memory, and turns it into the current OS thread's stack memory. I think this involves mostly changing the stack pointer to point to this chunk of memory.
  3. A routine for writing all registers to a buffer, using the steps described in https://graphitemaster.github.io/fibers/#getting-the-context. For this I'd use an assembly function, or maybe a naked Rust function with inline assembly; I haven't decided yet. This being provided in a reusable manner is something I could see being useful, but I'm not sure it would fit within the scope of Cranelift.
  4. A routine that takes the current stack pointer and stores it somewhere, so that we can restore this stack later on.

This is basically what https://graphitemaster.github.io/fibers/#user-space-context-switching discusses, but in Rust/Cranelift instead of C.

@yorickpeterse
Copy link
Author

A routine for writing all registers to a buffer, using the steps described in https://graphitemaster.github.io/fibers/#getting-the-context.

On second though, if Cranelift were to provide this it would be able to only save the registers actually in use at that point, rather that pessimistically having to assume any of them may be used, and thus having to save all of them.

The inverse would also be true: if we have known wake-up points, Cranelift would know which registers are in use afterwards, and only restore those. This may be less useful though, as based on the register allocator and the code that follows it might just be easier to restore all of them.

@bjorn3
Copy link
Contributor

bjorn3 commented Oct 27, 2022

On second though, if Cranelift were to provide this it would be able to only save the registers actually in use at that point, rather that pessimistically having to assume any of them may be used, and thus having to save all of them.

That linked code only saves callee saved registers which have to be saved anyway for ABI reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants