Support for setjmp / longjmp #2625

Open

Description

Motivation

rust-lang/libc#1216 proposes adding setjmp and longjmp to libc. These functions are required to interface with some C libraries, but sadly they are currently impossible to use correctly. Users deserve a solution that works and that warns about or prevents common pitfalls.

Issues with current solution

The first issue is that libc cannot add the LLVM returns_twice attribute to setjmp:

This attribute indicates that this function can return twice. The C setjmp is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.

Without this attribute, LLVM assumes that setjmp can return only once, which can lead to miscompilations (playground):

unsafe fn foo() -> i32 {
    let mut buf: jmp_buf = [0; 8];
    let mut x = 42;
    if setjmp(&mut buf) != 0 {  // Step 0: setjmp returns 0
        // Step 3: when setjmp returns 1 x has always been
        // modified to be  == 13 so this should always return 13:
        return x;
    }
    x = 13; // Step 1: x is modified
    longjmp(&mut buf, 1); // Step 2: jumps to Step 0 returning 1
    x // this will never be reached
}

// In debug builds foo returns 13 correctly, but 
// in release builds foo returns 42
assert_eq!(unsafe { foo() }, 13); // FAILs in release

Because setjmp is not marked returns_twice, LLVM assumes that return x; can only be reached before x = 13 executes, so it folds the returned value to 42. In reality, setjmp returns 0 the first time and returns 1 when longjmp jumps back into it, at which point x has already been set to 13.
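To make the two views concrete, here is a minimal model in plain Rust that does not call setjmp or longjmp at all. The second return of setjmp is simulated with a loop and a flag, and both function names are made up for this sketch. It only illustrates the reasoning described above; it is not what LLVM literally emits.

```rust
/// Models what the author of `foo` intends: the code after the modeled
/// `setjmp` runs twice, and the second pass observes the updated `x`.
fn foo_two_returns() -> i32 {
    let mut x = 42;
    let mut setjmp_result = 0; // first "return" of the modeled setjmp
    loop {
        if setjmp_result != 0 {
            return x; // second pass: x is 13 by now
        }
        x = 13; // Step 1: x is modified
        setjmp_result = 1; // Step 2: models longjmp(&mut buf, 1)
    }
}

/// Models the single-return assumption: the only value of `x` that can
/// reach the branch is the one stored before the call, so the read of
/// `x` is replaced by the constant and the later store is dead.
fn foo_single_return_assumption() -> i32 {
    let x_at_setjmp = 42;
    let mut setjmp_result = 0;
    loop {
        if setjmp_result != 0 {
            return x_at_setjmp; // folded to 42
        }
        // `x = 13` was removed as a dead store in this model.
        setjmp_result = 1;
    }
}
```

The first function corresponds to the debug-build result reported above, the second to the release-build result.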

Using a volatile load instead works around this issue (e.g. playground). More generally, if stack variables are modified after a setjmp, every read and write of them up to the last possible longjmp will probably need to be volatile.
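As a rough sketch of what "volatile reads and writes" means in Rust, the snippet below uses std::ptr::read_volatile and std::ptr::write_volatile on a stack variable. The helper names are hypothetical, and the setjmp/longjmp calls are deliberately left out (they appear only as comments), so this shows the access pattern only, not a complete workaround.

```rust
use std::ptr;

/// Hypothetical helper: store to `slot` with a volatile write, which the
/// compiler may not elide or reorder relative to other volatile accesses.
fn volatile_store(slot: &mut i32, value: i32) {
    // SAFETY: `slot` is a valid, aligned, exclusive reference.
    unsafe { ptr::write_volatile(slot, value) }
}

/// Hypothetical helper: load from `slot` with a volatile read, so the
/// value is re-read from memory instead of being constant-folded.
fn volatile_load(slot: &i32) -> i32 {
    // SAFETY: `slot` is a valid, aligned reference.
    unsafe { ptr::read_volatile(slot) }
}

fn demo() -> i32 {
    let mut x = 42;
    // (a real program would call setjmp here and branch on its result)
    volatile_store(&mut x, 13);
    // (a real longjmp would transfer control back to the setjmp check)
    volatile_load(&x)
}
```

Calling demo() yields 13. Whether volatile accesses are sufficient in every setjmp scenario is not something this sketch establishes.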

One way to improve this could be to add a stable attribute #[returns_twice] that users can apply to extern "C" { ... } functions to mark that they can return multiple times, so that Rust handles these functions specially (e.g. by emitting the corresponding LLVM attribute). An alternative would be for Rust to provide these functions as part of the language.


Modulo LLVM-level misoptimizations, C only allows setjmp in specific "contexts" [0], and these features interact badly with languages with destructors. Rust does not guarantee that destructors will run (e.g. mem::forget is safe), so skipping destructors using longjmp is not unsound [1]. However, unsafe code will need to take into account that code outside it can use longjmp to skip destructors when creating safe abstractions (e.g. see Observational equivalency and unsafe code).
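The claim that Rust does not guarantee destructors will run can be checked in entirely safe code. In this small sketch the Guard type and the drops_observed function are invented for illustration; a counter records whether Drop ran.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard type that counts how often its destructor runs.
struct Guard<'a> {
    dropped: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

/// Returns how many times the destructor ran for a single guard.
fn drops_observed(forget: bool) -> u32 {
    let counter = Cell::new(0);
    let guard = Guard { dropped: &counter };
    if forget {
        // Safe code: the destructor is skipped, much as a longjmp over
        // this frame would skip it.
        mem::forget(guard);
    } else {
        drop(guard);
    }
    counter.get()
}
```

drops_observed(false) returns 1 and drops_observed(true) returns 0, which is why unsafe abstractions cannot rely on a destructor running for soundness.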

More worrying is how longjmp subverts the borrow checker to, e.g., produce undefined behavior of the form use-after-move without triggering a type error (playground):

fn bar(_a: A) { println!("a moved") }
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } != 0 {  // Step 0: setjmp returns 0
        bar(a);  // Step 3: a is moved _again_ (UB: use-after-move)
        return;
    }
    bar(a); // Step 1: a is moved here
    unsafe { longjmp(&mut buf, 1) };  // Step 2: jumps to Step 0 returning 1
}

This prints "a moved" twice, which means that the variable a was moved from twice: the second move is a use-after-move that nevertheless type-checked. Obviously, longjmp is not the only way to achieve this in Rust (e.g. it is trivial to do so with the pointer methods as well), but longjmp combined with Drop types makes this happen with no effort (playground):

struct A; impl Drop for A { ... }
fn bar(_a: A) { 
   // _a is dropped here
}
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } != 0 {
        // use-after-move: a has been moved (and dropped) below 
        // but a is used (and dropped) in this branch again 
        // => double-drop => UB 
        return; 
    }
    bar(a); // moves a 
    unsafe { longjmp(&mut buf, 1) };
}

That is, using setjmp+longjmp to create double-drops (undefined behavior) is trivial.
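For comparison, the pointer-based route to a use-after-move mentioned earlier might look like the sketch below. The tuple struct A(i32) and the consume function are stand-ins invented for this example. The duplicate is harmless here only because this A has no destructor and owns no resources; with a Drop type the same pattern would lead to the double-drop described above.

```rust
use std::ptr;

/// Stand-in type: not Copy, no Drop impl, owns no resources.
struct A(i32);

/// Takes ownership of its argument, like `bar` in the examples above.
fn consume(a: A) -> i32 {
    a.0
}

fn use_after_move() -> (i32, i32) {
    let a = A(7);
    // SAFETY (for this sketch only): `A` is plain data without a
    // destructor, so a bitwise duplicate cannot cause a double-free.
    let dup = unsafe { ptr::read(&a) };
    let first = consume(a); // `a` is moved here
    let second = consume(dup); // the "same" value is moved again
    (first, second)
}
```

The difference from the longjmp version is that the duplication is explicit at the ptr::read call site, whereas with longjmp nothing at the second use hints that the value was already moved.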

Finally, there are problems with creating wrappers around these functions (playground):

fn foo(buf: &mut jmp_buf) {
    let mut a: i32 = 42;
    if unsafe { setjmp(buf) } != 0 {
        dbg!(a);  // use-after-free
        panic!("done");
    }
    a = 13;
}

fn main() {
    let mut buf: jmp_buf = [0; 8];
    foo(&mut buf);
    let b: i32 = 666;
    dbg!(b);
    unsafe { longjmp(&mut buf, 1); }
}

This prints b = 666 and a = 0. The problem is that setjmp saves the stack pointer inside foo, but foo then returns, so the later longjmp jumps into a stack frame that is no longer live, and dbg!(a) reads a after it has been freed (use-after-free).

There are probably many other problems with these two functions, but that does not mean that they are impossible to use correctly. Still, it would be good to have a solution here that at least warns about potentially incorrect usages, since reasoning about these functions and the surrounding unsafe code is often very tricky.

It would also be good to have a way to soundly model these in miri and detect when they are used incorrectly.

At a minimum, we should be able to write down documentation for these functions in Rust: where exactly they can be used, what the unsafe code surrounding them needs to uphold to be correct, etc.


cc @nikomatsakis (wrote blog post about observational equivalence and unsafe code), @rkruppe, @RalfJung, @ubsan - the unsafe code guidelines should probably say whether extern "C" functions are allowed to modify the stack pointer etc. like setjmp/longjmp do.


  • [0] C11 7.13.1.1p4 states:

An invocation of the setjmp macro shall appear only in one of the following contexts:

  • the entire controlling expression of a selection or iteration statement;
  • one operand of a relational or equality operator with the other operand an integer constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement;
  • the operand of a unary ! operator with the resulting expression being the entire controlling expression of a selection or iteration statement; or
  • the entire expression of an expression statement (possibly cast to void).

While C11 7.13.1.1p5 states:

If the invocation appears in any other context, the behavior is undefined.

That is, let x = setjmp(...); would be UB in C. In Rust, having the result of a function be usable only in some contexts would be weird.

  • [1] In C++, skipping destructors with longjmp is undefined behavior, e.g., [support.runtime] states:

If any automatic objects would be destroyed by a thrown exception transferring control to another (destination) point in the program, then a call to longjmp(jbuf, val) at the throw point that transfers control to the same (destination) point has undefined behavior.

Labels

  • A-ffi: FFI related proposals.
  • A-machine: Proposals relating to Rust's abstract machine.
  • T-lang: Relevant to the language team, which will review and decide on the RFC.
