[eRFC] Include call graph information in LLVM IR #59412
Description
Summary
Add an experimental compiler feature / flag to add call graph information, in
the form of LLVM metadata, to the LLVM IR (.ll
) files produced by the
compiler.
Motivation
(This section ended up being a bit long winded. The TL;DR is improving existing
stack analysis usage tools.)
Stack usage analysis is a hard requirement for certifying safety critical
(embedded) applications. This analysis is usually implemented as a static
analysis tool that computes the worst case stack usage of an application. The
information provided by this tool is used in the certification process to
demonstrate that the application won't run into a stack overflow at runtime.
Several such tools exist for C/C++ programs, mainly in commercial
and closed source forms. A few months ago the Rust compiler gained a feature
that's a pre-requisite for implementing such tools in the Rust world: -Z emit-stack-sizes
. This flag makes stack usage information about every Rust
function available in the binary artifacts (object files) produced by the
compiler.
And just recently a tool that uses this flag and call graph analysis to perform
whole program stack usage analysis has been developed: cargo-call-stack
(full disclaimer: I'm the author of said tool). The tool does OK when dealing
with programs that only uses direct function calls but it's lacking (either
over-pessimistic or flat out incorrect) when analyzing programs that contain
indirect function calls, that is function pointer calls and/or dynamic
dispatch.
Call graph analysis in the presence of indirect function calls is notoriously
hard, but Rust strong typing makes the problem tractable -- in fact, dynamic
dispatch is easier to reason about than function pointer calls. However, this
last part only holds true when Rust type information is available to the tool,
which is not the case today.
To elaborate: it's' important that the call graph is extracted from
post-LLVM-optimization output as that greatly reduces the chance of inserting
invalid edges. For example, what appears to be a function call at the (Rust)
source level (e.g. let x = foo();
) may not actually exist in the final binary
due to inlining or dead code elimination. For this reason, Rust stack usage
analysis tools are limited to two sources of call graph information: the machine
code in the final executable and post-optimization LLVM IR (rustc
's
--emit=llvm-ir
). The former contains no type information and the latter
contains LLVM type information, not Rust type information.
cargo-call-stack
currently uses the type information available in the
LLVM IR of a crate to reason about indirect function calls. However, LLVM type
information is not as complete as Rust type information because the conversion
is lossy. Consider the following Rust source code and corresponding LLVM IR.
#[no_mangle] // shorter name
static F: AtomicPtr<fn() -> u32> = AtomicPtr::new(foo as *mut _);
fn main() {
if let Some(f) = unsafe { F.load(Ordering::Relaxed).as_ref() } {
f(); // function pointer call
}
// volatile load to preserve the return value of `bar`
unsafe {
core::ptr::read_volatile(&baz());
}
}
#[no_mangle]
fn foo() -> u32 {
F.store(bar as *mut _, Ordering::Relaxed);
0
}
#[no_mangle]
fn bar() -> u32 {
1
}
#[inline(never)]
#[no_mangle]
fn baz() -> i32 {
F.load(Ordering::Relaxed) as usize as i32
}
define void @main() unnamed_addr #3 !dbg !1240 {
start:
%_14 = alloca i32, align 4
%0 = load atomic i32, i32* bitcast (<{ i8*, [0 x i8] }>* @F to i32*) monotonic, align 4
%1 = icmp eq i32 %0, 0, !dbg !1251
br i1 %1, label %bb5, label %bb3, !dbg !1251
bb3: ; preds = %start
%2 = inttoptr i32 %0 to i32 ()**, !dbg !1252
%3 = load i32 ()*, i32 ()** %2, align 4, !dbg !1254, !nonnull !46
%4 = tail call i32 %3() #9, !dbg !1254
br label %bb5, !dbg !1255
; ..
}
; Function Attrs: norecurse nounwind
define internal i32 @foo() unnamed_addr #0 !dbg !1189 {
; ..
}
; Function Attrs: norecurse nounwind readnone
define internal i32 @bar() unnamed_addr #1 !dbg !1215 {
; ..
}
; Function Attrs: noinline norecurse nounwind
define internal fastcc i32 @baz() unnamed_addr #2 !dbg !1217 {
; ..
}
Note how in the LLVM IR output foo
, bar
and baz
all have the same function
signature: i32 ()
, which is the LLVM version of fn() -> i32
. There are no
unsigned integer types in LLVM IR so both Rust types, i32
and u32
, get
converted to i32
in the LLVM IR.
Line %4 = ..
in the LLVM IR is the function pointer call. This too,
incorrectly, indicates that a function pointer with signature i32 ()
(fn() -> i32
) is being invoked.
This lossy conversion leads cargo-call-stack
to incorrectly add an edge
between the node that represents the function pointer call and baz
. See below:
If the tool had access to call graph information from the compiler it would have
produced the following accurate call graph.
This eRFC proposes adding a feature to aid call graph and stack usage analysis.
(For a more in depth explanation of how cargo-call-stack
works please refer to
this blog post: https://blog.japaric.io/stack-analysis/)
Design
We propose that call graph information is added to the LLVM IR that rustc
produces in the form of LLVM metadata when the unstable -Z call-metadata
rustc
flag is used.
Function pointer calls
Functions that are converted into function pointers in Rust source (e.g. let x: fn() -> i32 = foo
) will get extra LLVM metadata in their definitions (IR:
define
). The metadata will have the form !{!"fn", !"fn() -> i32"}
, where the
second node represents the signature of the function. Likewise, function pointer
calls will get similar LLVM metadata at call site (IR: call
/ invoke
).
Revisiting the previous example, the IR would change as shown below:
define void @main() unnamed_addr #3 !dbg !1240 {
; ..
%4 = tail call i32 %3() #9, !dbg !1254, !rust !0
; .. ^^^^^^^^ (ADDED)
}
; Function Attrs: norecurse nounwind
define internal i32 @foo() unnamed_addr #0 !dbg !1189 !rust !0 {
; .. ^^^^^^^^ (ADDED)
}
; Function Attrs: norecurse nounwind readnone
define internal i32 @bar() unnamed_addr #1 !dbg !1215 !rust !0 {
; .. ^^^^^^^^ (ADDED)
}
; Function Attrs: noinline norecurse nounwind
define internal fastcc i32 @baz() unnamed_addr #2 !dbg !1217 {
; ..
}
; ..
; (ADDED) at the bottom of the file
!0 = !{!"fn", "fn() -> i32"}
; ..
Note how main
and baz
didn't get the extra !rust
metadata because they are
never converted into function pointers. Whereas both foo
and bar
got the
same metadata because they have the same signature and are converted into
function pointers in the source code (lines static F
and F.store
).
When tools parse this LLVM IR they will know that line %4 = ..
can invoke
foo
or bar
(!rust !0
), but not baz
or main
because the latter two
don't have the same "fn" metadata.
This -Z
flag only promises two things with respect to "fn" metadata:
-
Only functions that are converted (coerced) into function pointers in the
source code will get "fn" metadata -- note that this does not necessarily mean that
function will be called via a function pointer call -
That the string node that comes after the
!"fn"
node will be unique for
each function type -- the flag does not make any promise about the contents
or syntax of this string node. (Having a stringified version of the function
signature in the LLVM IR would be nice to have but it's not required to
produce an accurate call graph.)
Adding this kind of metadata doesn't affect LLVM optimization passes and more
importantly our previous experiments show that this custom metadata is not
removed by LLVM passes.
Trait objects
There's one more of bit of information we can encode in the metadata to make the
analysis less pessimistic: information about trait objects.
Consider the following Rust source code and corresponding LLVM IR.
static TO: Mutex<&'static (dyn Foo + Sync)> = Mutex::new(&Bar);
static X: AtomicI32 = AtomicI32::new(0);
fn main() {
(*TO.lock()).foo();
if X.load(Ordering::Relaxed).quux() {
// side effect to keep `quux`'s return value
unsafe { asm!("" : : : "memory" : "volatile") }
}
}
trait Foo {
fn foo(&self) -> bool {
false
}
}
struct Bar;
impl Foo for Bar {}
struct Baz;
impl Foo for Baz {
fn foo(&self) -> bool {
true
}
}
trait Quux {
fn quux(&self) -> bool;
}
impl Quux for i32 {
#[inline(never)]
fn quux(&self) -> bool {
*TO.lock() = &Baz;
unsafe { core::ptr::read_volatile(self) > 0 }
}
}
; Function Attrs: noinline noreturn nounwind
define void @main() unnamed_addr #2 !dbg !1418 {
; ..
%8 = tail call zeroext i1 %7({}* nonnull align 1 %4) #8, !dbg !1437, !rust !0
; .. ^^^^^^^^
}
; app::Foo::foo
; Function Attrs: norecurse nounwind readnone
define internal zeroext i1 @_ZN3app3Foo3foo17h5a849e28d8bf9a2eE(
%Bar* noalias nocapture nonnull readonly align 1
) unnamed_addr #0 !dbg !1224 !rust !1 {
; .. ^^^^^^^^
}
; <app::Baz as app::Foo>::foo
; Function Attrs: norecurse nounwind readnone
define internal zeroext i1
@"_ZN37_$LT$app..Baz$u20$as$u20$app..Foo$GT$3foo17h9e4a36340940b841E"(
%Baz* noalias nocapture nonnull readonly align 1
) unnamed_addr #0 !dbg !1236 !rust !2 {
; .. ^^^^^^^^
}
; <i32 as app::Quux>::quux
; Function Attrs: noinline norecurse nounwind
define internal fastcc zeroext i1
@"_ZN33_$LT$i32$u20$as$u20$app..Quux$GT$4quux17haf5232e76b46052fE"(
i32* noalias readonly align 4 dereferenceable(4)
) unnamed_addr #1 !dbg !1245 !rust !3 {
; .. ^^^^^^^^
}
; ..
!0 = "fn(*mut ()) -> bool"
!1 = "fn(&Bar) -> bool"
!2 = "fn(&Baz) -> bool"
!3 = "fn(&i32) -> bool"
; ..
In this case we have dynamic dispatch, which shows up in the LLVM IR at line
%8
as a function pointer call where the signature of the function pointer is
i1 ({}*)
, which is more or less equivalent to Rust's fn(*mut ()) -> bool
--
the {}*
denotes an "erased" type.
With just the function signature metadata tools could at best assume that the
dynamic dispatch could invoke Bar.foo()
, Baz.foo()
or i32.quux()
resulting
in the following, incorrect call graph.
Thus, we also propose that the -Z call-metadata
flag adds trait-method
information to trait method implementations (IR: define
) of traits that are
converted into trait objects, and to dynamic dispatch sites (IR: call _ %_({}* _, ..)
) using the following metadata syntax: !{!"dyn", !"Foo", !"foo"}
, where
the second node represents the trait and the third node represents the method
being dispatched / defined.
Building upon the previous example, here's how the "dyn" metadata would be
emitted by the compiler:
; Function Attrs: noinline noreturn nounwind
define void @main() unnamed_addr #2 !dbg !1418 {
; ..
%8 = tail call zeroext i1 %7({}* nonnull align 1 %4) #8, !dbg !1437, !rust !0
; ..
}
; app::Foo::foo
; Function Attrs: norecurse nounwind readnone
define internal zeroext i1 @_ZN3app3Foo3foo17h5a849e28d8bf9a2eE(
%Bar* noalias nocapture nonnull readonly align 1
) unnamed_addr #0 !dbg !1224 !rust !0 {
; .. ^^^^^^^^ (CHANGED)
}
; <app::Baz as app::Foo>::foo
; Function Attrs: norecurse nounwind readnone
define internal zeroext i1
@"_ZN37_$LT$app..Baz$u20$as$u20$app..Foo$GT$3foo17h9e4a36340940b841E"(
%Baz* noalias nocapture nonnull readonly align 1
) unnamed_addr #0 !dbg !1236 !rust !0 {
; .. ^^^^^^^^ (CHANGED)
}
; <i32 as app::Quux>::quux
; Function Attrs: noinline norecurse nounwind
define internal fastcc zeroext i1
@"_ZN33_$LT$i32$u20$as$u20$app..Quux$GT$4quux17haf5232e76b46052fE"(
i32* noalias readonly align 4 dereferenceable(4)
) unnamed_addr #1 !dbg !1245 {
; .. ^^^^^^^^ (REMOVED)
}
; ..
!0 = !{!"dyn", !"Foo", !"foo"}" ; CHANGED
; ..
Note that <i32 as Quux>::quux
loses its !rust
metadata because there's no
dyn Quux
in the source code.
With trait-method information tools would be able to limit the candidates of
dynamic dispatch to the actual implementations of the trait being dispatched.
Thus the call graph produced by the tools would become:
Like "fn" metadata, "dyn" metadata only promises two things:
-
Only trait method implementations (including default implementations) of
traits that appear as trait objects (e.g.&dyn Foo
,Box<dyn Bar>
) in the
source code will get this kind of metadata -
That the string nodes that come after the
!"dyn"
node will be unique for
each trait and method -- the flag does not make any promise about the
contents or syntax of these string nodes.
Destructors
Calling the destructor of a trait object (e.g. Box<dyn Foo>
) can result in the
destructor of any Foo
implementer being invoked. This information will also be
encoded in the LLVM IR using "drop" metadata of the form: !{!"drop", !"Foo"}
where the second node represents the trait.
Here's an example of this kind of metadata:
trait Foo {
fn foo(&self) {}
}
struct Bar;
impl Foo for Bar {}
struct Baz;
impl Foo for Baz {}
static MAYBE: AtomicBool = AtomicBool::new(false);
fn main() {
let mut x: Box<dyn Foo> = Box::new(Bar);
if MAYBE.load(Ordering::Relaxed) {
x = Box::new(Baz);
}
drop(x);
}
Unoptimized LLVM IR:
; core::ptr::real_drop_in_place
define internal void @_(%Baz* nonnull align 1) unnamed_addr #4 !rust !199 {
; ..
}
; core::ptr::real_drop_in_place
define internal void @_(%Bar* nonnull align 1) unnamed_addr #4 !rust !199 {
; ..
}
; hello::main
define internal void @() {
; ..
; `drop(x)`
invoke void @_ZN4core3ptr18real_drop_in_place17h258eb03c50ca2fcaE(..)
; ..
}
; core::ptr::real_drop_in_place
define internal void @_ZN4core3ptr18real_drop_in_place17h258eb03c50ca2fcaE(..) {
; ..
; calls destructor on the concrete type behind the trait object
invoke void %8({}* align 1 %3)
to label %bb3 unwind label %cleanup, !dbg !209, !rust !199
; ..
}
!199 = !{!"drop", !"Foo"}
Here dropping x
can result in Bar
's or Baz
's destructor being invoked (see !199
).
Multiple metadata
Some function definitions may get more than one different metadata kind or
different instances of the same kind. In that case we'll use a metadata tuple
(e.g. !{!1, !2}
) to group the different instances. An example:
trait Foo {
fn foo(&self) -> bool;
}
trait Bar {
fn bar(&self) -> i32 {
0
}
}
struct Baz;
impl Foo for Baz {
fn foo(&self) -> bool {
false
}
}
impl Bar for Baz {}
fn main() {
let x: &Foo = &Baz;
let y: &Bar = &Baz;
let z: fn(&Baz) -> bool = Baz::foo;
}
Unoptimized LLVM IR:
; core::ptr::real_drop_in_place
define internal void @_(%Baz* nonnull align 1) unnamed_addr #2 !rust !107 {
; ..
}
; <hello::Baz as hello::Foo>::foo
define internal zeroext i1 @_(%Baz* noalias nonnull readonly align 1) unnamed_addr #2 !rust !157 {
; ..
}
!105 = !{!"drop", !"Foo"}
!106 = !{!"drop", !"Bar"}
!107 = !{!105, !106}
; ..
!155 = !{!"dyn", !"Foo", !"foo"}
!156 = !{!"fn", !"fn(&Baz) -> bool"}
!157 = !{!155, !156}
Summary
In summary, these are the proposed changes:
-
Add an unstable
-Z call-metadata
flag -
Using this flag adds extra LLVM metadata to the LLVM IR produced by
rustc
(--emit=llvm-ir
). Three kinds of metadata will be added:-
!{!"fn", !"fn() -> i32"}
metadata will be added to the definitions of
functions (IR:define
) that are coerced into function pointers in the
source code and to function pointer calls (IR:call _ %_(..)
). The second
node is a string that uniquely identifies the signature (type) of the
function. -
!{!"dyn", !"Trait", !"method"}
metadata will be added to the trait method
implementations (IR:define
) of traits that appear as trait objects in
the source code and to dynamic dispatch sites (IR:call _ %_({}* _, ..)
).
The second node is a string that uniquely identifies the trait and the third
node is a string that uniquely identifies the trait method. -
!{!"drop", "Trait"}
metadata will be added to destructors (IR:define
)
of types that implement traits that appear as trait objects in the source
code and to the invocations of trait object destructors. The second node is
a string that uniquely identifies the implemented trait / trait object.
-
Alternatives
An alternative would be to make type information available in the final binary
artifact, that is in the executable, rather than in the LLVM IR. This would make
the feature harder to implement and less portable. Making the type information
available in, say, the ELF format would require designing a (binary) format
to encode the information in a linker section plus non-trivial implementation
work. Making this feature available in other formats (Mach-O, WASM, etc.) would
only multiply the amount of required work, likely leading to this feature being
implemented for some formats but not others.
Drawbacks
LLVM IR is tied to the LLVM backend; this makes the proposed feature hard, or
maybe even impossible, to port to other backends like cranelift. I don't think
this is much of an issue as this is an experimental feature; backend portability
can, and should, be revisited when we consider stabilizing this feature (if
ever).
Since this is a (hopefully small) experimental compiler feature (along the lines
of -Z emit-stack-sizes
) and not a language (semantics or syntax)
change I'm posting this in rust-lang/rust for FCP consideration. If this
warrants a formal RFC I'd be happy to repost this in rust-lang/rfcs.
cc @rust-lang/compiler @oli-obk
Activity