Description
in #109982 rustc switched to `-Z plt=yes` on non-x86-64 platforms for a bunch of good reasons, and stuck with `-Z plt=no` by default on x86-64, also for good reasons! unfortunately, defaulting to `-Z plt=no` is a slight pessimization in programs heavily dependent on calls into statically linked libraries.
PLT calls on x86 end up compiled to `e8 <addr>` calls, which at link time can be rewritten to direct calls to the callee, presumably along with deletion of the GOT entry. when we skip the PLT on x86-64, it seems that linkers are unwilling to do a link-time optimization of `ff 15 <GOT addr>` into `90 e8 <fn addr>` when the callee is local to the object, so an indirect call to the object-local function persists*.
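to make the byte patterns above concrete, here is a minimal sketch of the three encodings: the direct `e8 rel32` call, the `ff 15 disp32` call through a GOT slot, and the same-length `90 e8 rel32` rewrite. the function names and the addresses used with them are made up for illustration; this is not how any particular linker implements the rewrite, it only shows that the rewrite fits in the same 6 bytes whenever the callee is within signed 32-bit displacement range of the call site.

```rust
use std::convert::TryFrom;

/// Encode a direct near call (`e8 rel32`) located at `call_site`.
/// Returns None when the callee is outside signed 32-bit displacement range.
/// (hypothetical helper, for illustration only)
fn encode_direct_call(call_site: u64, callee: u64) -> Option<[u8; 5]> {
    // rel32 is measured from the end of the 5-byte instruction.
    let next = call_site.wrapping_add(5);
    let disp = callee.wrapping_sub(next) as i64;
    let rel = i32::try_from(disp).ok()?;
    let mut bytes = [0xe8, 0, 0, 0, 0];
    bytes[1..].copy_from_slice(&rel.to_le_bytes());
    Some(bytes)
}

/// Encode an indirect call through a GOT slot (`ff 15 disp32`, i.e.
/// `call [rip + disp32]`) located at `call_site`.
fn encode_got_call(call_site: u64, got_slot: u64) -> Option<[u8; 6]> {
    // disp32 is measured from the end of the 6-byte instruction.
    let next = call_site.wrapping_add(6);
    let disp = got_slot.wrapping_sub(next) as i64;
    let rel = i32::try_from(disp).ok()?;
    let mut bytes = [0xff, 0x15, 0, 0, 0, 0];
    bytes[2..].copy_from_slice(&rel.to_le_bytes());
    Some(bytes)
}

/// Rewrite the 6-byte GOT call at `call_site` into `90 e8 rel32`
/// (a one-byte nop followed by a direct call to `callee`).
fn relax_got_call(call_site: u64, callee: u64) -> Option<[u8; 6]> {
    // The nop sits at `call_site`, so the direct call starts one byte later.
    let direct = encode_direct_call(call_site.wrapping_add(1), callee)?;
    let mut bytes = [0x90, 0, 0, 0, 0, 0];
    bytes[1..].copy_from_slice(&direct);
    Some(bytes)
}
```

both forms occupy 6 bytes at the call site, so the rewrite needs no code motion; the difference is that the `ff 15` form has to load the target from memory before it can branch.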
i expect `-Z plt=no` to be better than `-Z plt=yes` on x86-64 for all cases where the called functions are dynamically linked. i also expect `-Z plt=no` to be worse than `-Z plt=yes` on x86-64 for all cases where the called functions are statically linked and <4 GiB from their call sites. it'd be nice if we could skip `non_lazy_bind` if we know the called function is to be statically linked. if your compiled artifact is >4 GiB .. i've heard of such things but have no idea what's best :)
"if we know the called function is to be statically linked" is the more annoying problem, though, because `rustc-link-lib` tells rustc only what libraries get what kind of linkage. especially on Unix-y platforms we don't know which of those libraries will provide a given symbol. the extern block can have a `#[link(kind="static")]` attribute, which i've used in this minimized example of the problem i'm talking about, and which almost seems like enough information to choose when to do this optimization at codegen-time. unfortunately, if the source file says `#[link(name="util", kind="static")] extern "C" { pub fn foo(); }`, and then you compile that source like `rustc -l dylib=util ...`, the command-line parameter simply overrides the link attribute and you end up with a dynamic link to `foo` with the (in context) reasonable `ff 15 [GOT_entry]` call.
because of the `#[link]`/`-l KIND=NAME` interaction i'm really not sure what to do here. i was going to initially suggest plumbing `#[link(kind="static")]` through to inform if `nonlazybind` is appropriate, but i had expected that conflicting link directives would at least produce an error. silently ending up with the command line argument is pretty unfortunate. does it seem reasonable to plumb the `#[link]` attribute as a hint, advise `#[link(kind="static")]` for statically linked functions, and make conflicting `#[link]` and `-l` arguments produce an error?
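as a rough model of the question above, the sketch below contrasts the override behaviour described here with the proposed "conflict is an error" rule. everything in it (`LinkKind`, `effective_kind`, `effective_kind_strict`, `can_call_directly`) is a hypothetical name invented for this illustration; none of it is rustc's actual code or data model.

```rust
/// Hypothetical, simplified model of a library's link kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LinkKind {
    Static,
    Dylib,
}

/// Behaviour as described in this issue: a `-l KIND=NAME` argument, when
/// present, silently wins over the kind given in `#[link(...)]`.
fn effective_kind(attr: Option<LinkKind>, cli: Option<LinkKind>) -> Option<LinkKind> {
    cli.or(attr)
}

/// Proposed behaviour: the same resolution, except that an attribute and a
/// command-line argument that disagree produce an error.
fn effective_kind_strict(
    attr: Option<LinkKind>,
    cli: Option<LinkKind>,
) -> Result<Option<LinkKind>, String> {
    match (attr, cli) {
        (Some(a), Some(c)) if a != c => Err(format!(
            "conflicting link kinds: #[link] says {:?}, -l says {:?}",
            a, c
        )),
        _ => Ok(cli.or(attr)),
    }
}

/// Hypothetical codegen hint: only emit a direct call (skip the GOT-indirect
/// form) when the library is known to be statically linked.
fn can_call_directly(kind: Option<LinkKind>) -> bool {
    kind == Some(LinkKind::Static)
}
```

under the first function, `#[link(kind="static")]` combined with `-l dylib=util` resolves to a dynamic link, so a hint taken from the attribute alone would be wrong; under the strict variant the same combination is rejected, which is what would make the attribute safe to use as a hint.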
memorysafety/rav1d#1417 is a more substantive case which motivates this issue, where hot code is a collection of assembly routines that are statically linked. i've written a longer analysis about the case in that issue, but it's just more supporting information around the observation above.
worse, for code that is hot around an indirect call to a constant target, branch prediction quite effectively hides the cost of this indirect call. if the hot code is more like a large region of warm code, the branch predictor's entries for these calls can end up evicted, and these indirect calls to a constant local function become quite costly.
worse (pt2), LLVM reasonably tries to improve the indirect call situation by hoisting the target loads for repeated calls to the same function, which can cause register pressure and additional spills, and generally makes this kind of unfortunate situation even worse.