Opened on Jan 20, 2019
Motivation
rust-lang/libc#1216 proposes adding setjmp and longjmp to libc. These functions are required to interface with some C libraries, but sadly they are currently impossible to use correctly. Users deserve a solution that works and that warns about or prevents common pitfalls.
Issues with current solution
The first issue is that libc cannot add the LLVM returns_twice attribute to setjmp:
This attribute indicates that this function can return twice. The C setjmp is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.
Basically, without this attribute LLVM assumes that setjmp can return only once, potentially leading to miscompilations (playground):
unsafe fn foo() -> i32 {
    let mut buf: jmp_buf = [0; 8];
    let mut x = 42;
    if setjmp(&mut buf) != 0 { // Step 0: setjmp returns 0
        // Step 3: when setjmp returns 1 x has always been
        // modified to be == 13 so this should always return 13:
        return x;
    }
    x = 13; // Step 1: x is modified
    longjmp(&mut buf, 1); // Step 2: jumps to Step 0 returning 1
    x // this will never be reached
}
// In debug builds foo returns 13 correctly, but
// in release builds foo returns 42
assert_eq!(unsafe { foo() }, 13); // FAILs in release
Because setjmp is not returns_twice, LLVM assumes that the return x; will only be reached before x = 13, so foo will always return 42. However, setjmp returns 0 the first time, and returns 1 when control jumps back into it from the longjmp.
Using a volatile load instead works around this issue (e.g. playground). Basically, if stack variables are modified after a setjmp, all reads and writes of those variables up to every possible longjmp will probably need to be volatile.
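The volatile accesses themselves can be sketched in isolation. The block below deliberately does not call setjmp or longjmp (the comments only mark where those calls would sit in the example above), and the function name modify_through_volatile is made up for illustration. It shows nothing more than the shape of the workaround using std::ptr::write_volatile and std::ptr::read_volatile.

```rust
use std::ptr;

// Sketch only: no setjmp/longjmp is called here. The comments mark where
// those calls would sit in the `foo` example above.
fn modify_through_volatile() -> i32 {
    let mut x: i32 = 42;
    let px: *mut i32 = &mut x;

    // (setjmp would be called here)

    // Volatile write: the compiler must emit this store.
    unsafe { ptr::write_volatile(px, 13) };

    // (longjmp would transfer control back to the setjmp call here)

    // Volatile read: the value is reloaded from memory instead of being
    // replaced by a constant the optimizer believes it already knows.
    unsafe { ptr::read_volatile(px) }
}

fn main() {
    println!("x = {}", modify_through_volatile());
}
```

Whether volatile accesses are sufficient once a real longjmp is involved is exactly the open question of this issue; the sketch only illustrates the access pattern.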
One way to improve this could be to add a stable attribute #[returns_twice] that users can use to mark that extern "C" { ... } functions can return multiple times, and for Rust to handle these functions specially (e.g. by emitting the corresponding LLVM attribute). An alternative would be for Rust to provide these functions as part of the language.
Apart from LLVM-level misoptimizations, C only allows setjmp in specific "contexts" [0], and these functions interact badly with languages that have destructors. Rust does not guarantee that destructors will run (e.g. mem::forget is safe), so skipping destructors using longjmp is not unsound [1]. However, when creating safe abstractions, unsafe code will need to take into account that code outside it can use longjmp to skip destructors (e.g. see Observational equivalency and unsafe code).
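The claim that safe Rust can already skip destructors can be checked with a small sketch. The Guard type and the destructor_runs function are hypothetical names introduced only for this illustration; the only library calls used are std::mem::forget and std::cell::Cell.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical guard type that records whether its destructor ran.
struct Guard<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Returns (dropped after a normal scope exit, dropped after mem::forget).
fn destructor_runs() -> (bool, bool) {
    let normal = Cell::new(false);
    {
        let _g = Guard { dropped: &normal };
    } // `_g` goes out of scope here, so its destructor runs

    let forgotten = Cell::new(false);
    let g = Guard { dropped: &forgotten };
    mem::forget(g); // safe function: the destructor is never run

    (normal.get(), forgotten.get())
}

fn main() {
    println!("{:?}", destructor_runs()); // prints "(true, false)"
}
```

Any safe abstraction whose soundness depends on Guard's destructor running is already broken by mem::forget; a longjmp over the frame would presumably have the same effect on it.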
More worrying is how longjmp subverts the borrow checker to, e.g., produce undefined behavior of the form use-after-move without triggering a type error (playground):
fn bar(_a: A) { println!("a moved") }
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } != 0 { // Step 0: setjmp returns 0
        bar(a); // Step 3: a is moved _again_ (UB: use-after-move)
        return;
    }
    bar(a); // Step 1: a is moved here
    unsafe { longjmp(&mut buf, 1) }; // Step 2: jumps to Step 0 returning 1
}
This prints "a moved" twice, which means that the variable a was moved from twice, so the second time a use-after-move happened even though the code type-checked. Obviously, longjmp is not the only way to achieve this in Rust (e.g. it is trivial to do so with the pointer methods as well), but longjmp combined with Drop types makes this happen with no effort (playground):
struct A; impl Drop for A { ... }
fn bar(_a: A) {
    // _a is dropped here
}
fn foo() {
    let mut buf: jmp_buf = [0; 8];
    let a = A;
    if unsafe { setjmp(&mut buf) } != 0 {
        // use-after-move: a has been moved (and dropped) below
        // but a is used (and dropped) in this branch again
        // => double-drop => UB
        return;
    }
    bar(a); // moves a
    unsafe { longjmp(&mut buf, 1) };
}
That is, using setjmp+longjmp to create double-drops (undefined behavior) is trivial.
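For comparison, the pointer-method route mentioned above can be sketched without setjmp or longjmp. The type A(u32) and the functions consume and move_twice are stand-ins invented for this illustration. The type deliberately has no destructor and owns no resources, so duplicating it with std::ptr::read is tolerable here; the same pattern with a Drop type would set up the double-drop described above.

```rust
use std::ptr;

// Hypothetical non-Copy type with no destructor, standing in for `A`.
struct A(u32);

// Takes ownership of its argument and returns the wrapped value.
fn consume(a: A) -> u32 {
    a.0
}

// Moves the "same" value twice and returns the sum of what was consumed.
fn move_twice() -> u32 {
    let a = A(1);
    // Bitwise copy that the borrow checker does not track as a move.
    // Acceptable only because `A` owns nothing; with a `Drop` type this
    // would lead to a double-drop.
    let duplicate = unsafe { ptr::read(&a) };
    consume(a) + consume(duplicate)
}

fn main() {
    println!("moved {} times", move_twice());
}
```

The difference is that this version needs an explicit unsafe pointer read at the point of duplication, whereas with longjmp the second use of the moved variable happens with no duplication visible in the source.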
Finally, there are problems with creating wrappers around these functions (playground):
fn foo(buf: &mut jmp_buf) {
    let mut a: i32 = 42;
    if unsafe { setjmp(buf) } != 0 {
        dbg!(a); // use-after-free
        panic!("done");
    }
    a = 13;
}
fn main() {
    let mut buf: jmp_buf = [0; 8];
    foo(&mut buf);
    let b: i32 = 666;
    dbg!(b);
    unsafe { longjmp(&mut buf, 1); }
}
This prints b = 666 and a = 0. The problem here is that this code saves the stack pointer inside foo, but then foo returns, and afterwards the longjmp jumps to a stack frame that is no longer live, so dbg!(a) reads a after it has been freed (use-after-free).
There are probably many other problems with these two functions, but that does not mean that they are impossible to use correctly. Still, it would be good to have a solution here that at least warns about potentially incorrect usages, since reasoning about these functions and the surrounding unsafe code is often very tricky.
It would also be good to have a way to soundly model these in miri and detect when they are used incorrectly.
At a minimum we should be able to write down documentation for these functions in Rust: where exactly they can be used, what the unsafe code surrounding them needs to uphold to be correct, etc.
cc @nikomatsakis (wrote blog post about observational equivalence and unsafe code), @rkruppe , @RalfJung @ubsan - the unsafe code guidelines should probably say whether extern "C" functions are allowed to modify the stack pointer etc. like setjmp/longjmp do.
- [0] C11 7.13.1.1p4 states the following about setjmp:
An invocation of the setjmp macro shall appear only in one of the following contexts:
- the entire controlling expression of a selection or iteration statement;
- one operand of a relational or equality operator with the other operand an integer constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement;
- the operand of a unary ! operator with the resulting expression being the entire controlling expression of a selection or iteration statement; or
- the entire expression of an expression statement (possibly cast to void).
While C11 7.13.1.1p5 states:
If the invocation appears in any other context, the behavior is undefined.
That is, let x = setjmp(...); would be UB in C. In Rust, having the result of a function be usable only in some contexts would be weird.
- [1] In C++, skipping destructors with longjmp is undefined behavior, e.g., [support.runtime] states:
If any automatic objects would be destroyed by a thrown exception transferring control to another (destination) point in the program, then a call to longjmp(jbuf, val) at the throw point that transfers control to the same (destination) point has undefined behavior.