RFC: Allow changing the default allocator #1183

Merged
merged 1 commit into from
Jul 29, 2015
235 changes: 235 additions & 0 deletions text/0000-swap-out-jemalloc.md
@@ -0,0 +1,235 @@
- Feature Name: `allocator`
- Start Date: 2015-06-27
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

Add support to the compiler to override the default allocator, allowing a
different allocator to be used by default in Rust programs. Additionally, also
switch the default allocator for dynamic libraries and static libraries to using
the system malloc instead of jemalloc.

# Motivation

Note that this issue was [discussed quite a bit][babysteps] in the past, and
the meat of this RFC draws from Niko's post.

[babysteps]: http://smallcultfollowing.com/babysteps/blog/2014/11/14/allocators-in-rust/

Currently all Rust programs use jemalloc as their allocator by default because
it is commonly much faster than the default system allocator. This is not
desirable, however, when embedding Rust code into other runtimes. Using jemalloc
implies that Rust will be using one allocator while the host application (e.g.
Ruby or Firefox) will be using a separate allocator. Having two allocators in
one process generally hurts performance and is not recommended, so the Rust
toolchain needs to provide a method to configure the allocator.

In addition to switching to an entirely separate allocator, some Rust programs
may want to simply instrument allocations or shim in additional functionality
(such as memory tracking statistics). This is currently quite difficult to do,
and would be accommodated by a custom allocation scheme.

# Detailed design

The high level design can be found [in this gist][gist], but this RFC intends to
expound on the idea to make it more concrete in terms of what the compiler
implementation will look like. A [sample implementation][impl] of this section
is available.

[gist]: https://gist.github.com/alexcrichton/41c6aad500e56f49abda
[impl]: https://github.com/alexcrichton/rust/tree/less-jemalloc

### High level design

The design of this RFC from 10,000 feet (referred to below), which was
Member

nit: use SI, imperial is deprecated everywhere except a certain country 😉.

Contributor

Agreed. 3.048 km should be used.

Member

Aviation still uses feet for altitude in most of the world, so let the idiom be! 😛

[previously outlined][gist] looks like:

1. Define a set of symbols which correspond to the APIs specified in
   `alloc::heap`. The `liballoc` library will call these symbols directly.
   Note that this means that each of the symbols takes information like the
   size of allocations and such.
2. Create two shim libraries which implement these allocation-related functions.
   Each shim is shipped with the compiler in the form of a static library. One
   shim will redirect to the system allocator, the other shim will bundle a
   jemalloc build along with Rust shims to redirect to jemalloc.
3. Intermediate artifacts (rlibs) do not resolve this dependency, they're just
   left dangling.
4. When producing a "final artifact", rustc by default links in one of two
   shims:
   * If we're producing a staticlib or a dylib, link the system shim.
   * If we're producing an exe and all dependencies are rlibs, link the
     jemalloc shim.

The final link step will be optional, and one could link in any compliant
allocator at that time if so desired.

### New Attributes

Two new **unstable** attributes will be added to the compiler:

* `#![needs_allocator]` indicates that a library requires the "allocation
  symbols" to link successfully. This attribute will be attached to `liballoc`
  and no other library should need to be tagged as such. Additionally, most
  crates don't need to worry about this attribute as they'll transitively link
  to liballoc.
* `#![allocator]` indicates that a crate is an allocator crate. This is
  currently also used for tagging FFI functions as an "allocation function"
  to leverage more LLVM optimizations as well.

All crates implementing the Rust allocation API must be tagged with
`#![allocator]` to get properly recognized and handled.

### New Crates

Two new **unstable** crates will be added to the standard distribution:

* `alloc_system` is a crate that will be tagged with `#![allocator]` and will
  redirect allocation requests to the system allocator.
* `alloc_jemalloc` is another allocator crate that will bundle a static copy of
Member

#![allocator] instead of allocator would be less confusing (I wasn't sure if it was implied that it would not have the tag)

  jemalloc to redirect allocations to.

Both crates will be available to link to manually, but they will not be
available in stable Rust to start out.

### Allocation functions

Each crate tagged `#![allocator]` is expected to provide the full suite of
allocation functions used by Rust, defined as:

```rust
extern {
Member

Must it be C ABI?

I’d rather have something #[lang]-ish here as well.

Member Author

The C ABI is not required, but leaves the door open to allowing external implementations of an allocator in the future (e.g. implementing one in C instead of Rust).

I discussed #[lang] above which may be of interest as well.

    fn __rust_allocate(size: usize, align: usize) -> *mut u8;
Member

Why are we using magic symbol names instead of annotation-tagged functions a la #[lang_item="foo"] or #[plugin_registrar]?

Member Author

Implementation-wise, this is what everything will boil down to (pre-defined symbols), and this is currently the path of least resistance forward. This is all unstable, however, so we'll definitely be able to change it in the future to perhaps using lang items or more official attributes. The current downsides of attributes are:

  • During a compilation, there may actually be two loaded allocators in the crate store (but we won't link one of them), so the compiler would detect duplicate lang items and yield an error. Extra logic would have to be added to "not worry about" the allocator lang items.
  • None of the signatures are currently typechecked, and having an official attribute makes it feel like it should be typechecked.

Basically I'd love to move to using attributes and such, but I don't see much immediate benefit over just defining some symbols in the short-term. I also don't mind adding some words to this effect in the RFC, though, and we could perhaps spec the "ideal implementation" here where the actual implementation just has some TODOs.

My ideal situation would be to have an attribute-per-function which defines the symbol, visibility, and typechecks the signature. We'd then also have a check that an #![allocator] crate contains the necessary functions (tagged with attributes). That's a good deal of attribute-surface-area to start stabilizing right off the bat though.

    fn __rust_deallocate(ptr: *mut u8, old_size: usize, align: usize);
    fn __rust_reallocate(ptr: *mut u8, old_size: usize, size: usize,
                         align: usize) -> *mut u8;
    fn __rust_reallocate_inplace(ptr: *mut u8, old_size: usize, size: usize,
                                 align: usize) -> usize;
    fn __rust_usable_size(size: usize, align: usize) -> usize;
}
```

The exact API of all these symbols is considered **unstable** (hence the
leading `__`). This otherwise currently maps to what `liballoc` expects today.
The compiler will not currently typecheck `#![allocator]` crates to ensure
these symbols are defined and have the correct signature.

Also note that to define the above API in a Rust crate it would look something
like:

```rust
#[no_mangle]
pub extern fn __rust_allocate(size: usize, align: usize) -> *mut u8 {
    /* ... */
}
```
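As a rough illustration of what a system-backed implementation of these five entry points might look like, here is a minimal sketch. It is not the RFC's implementation. It leans on `std::alloc`, an API that postdates this RFC, and the `shim_*` names are hypothetical: the `__rust_` prefix and `#[no_mangle]` are deliberately dropped so the sketch cannot clash with symbols the toolchain already provides. Returning null for zero-sized or invalid requests, and reporting the unchanged size from the in-place variant, are assumptions of the sketch.

```rust
use std::alloc::{alloc, dealloc, realloc, Layout};
use std::ptr;

/// Allocates `size` bytes aligned to `align`. Returns null for a zero-sized
/// or invalid request (an assumption of this sketch, not part of the RFC).
pub unsafe fn shim_allocate(size: usize, align: usize) -> *mut u8 {
    match Layout::from_size_align(size, align) {
        // SAFETY: the layout is valid and has a non-zero size.
        Ok(layout) if size != 0 => unsafe { alloc(layout) },
        _ => ptr::null_mut(),
    }
}

/// Frees a block previously returned by `shim_allocate`/`shim_reallocate`.
/// The caller must pass the same size and alignment the block was given.
pub unsafe fn shim_deallocate(ptr: *mut u8, old_size: usize, align: usize) {
    if let Ok(layout) = Layout::from_size_align(old_size, align) {
        // SAFETY: the caller guarantees `ptr` was allocated with `layout`.
        unsafe { dealloc(ptr, layout) }
    }
}

/// Resizes a block, possibly moving it. Returns null on failure.
pub unsafe fn shim_reallocate(
    ptr: *mut u8,
    old_size: usize,
    size: usize,
    align: usize,
) -> *mut u8 {
    match Layout::from_size_align(old_size, align) {
        // SAFETY: the caller guarantees `ptr` was allocated with `old`.
        Ok(old) if size != 0 => unsafe { realloc(ptr, old, size) },
        _ => ptr::null_mut(),
    }
}

/// This sketch never resizes in place, so it reports the old size unchanged.
pub unsafe fn shim_reallocate_inplace(
    _ptr: *mut u8,
    old_size: usize,
    _size: usize,
    _align: usize,
) -> usize {
    old_size
}

/// Without allocator-specific knowledge, the usable size is the request.
pub fn shim_usable_size(size: usize, _align: usize) -> usize {
    size
}
```

Note that every function receives the size and alignment explicitly, which is what lets the deallocation and reallocation paths rebuild the original layout without any per-block header.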

### Limitations of `#![allocator]`

Allocator crates (those tagged with `#![allocator]`) are not allowed to
transitively depend on a crate which is tagged with `#![needs_allocator]`. This
would introduce a circular dependency which is difficult to link and is highly
likely to otherwise just lead to infinite recursion.
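
To make the restriction concrete, the following sketch models a crate graph and checks whether an allocator crate transitively reaches a crate tagged `#![needs_allocator]`. The `CrateInfo` type, its fields, and the function name are hypothetical and only illustrate the rule; this is not how the compiler represents crates.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-crate metadata; the field names are illustrative only.
pub struct CrateInfo {
    pub needs_allocator: bool,
    pub deps: Vec<&'static str>,
}

/// Returns true if any crate that `root` transitively depends on is tagged
/// `#![needs_allocator]`. Crates missing from the graph are treated as leaves.
pub fn depends_on_needs_allocator(
    graph: &HashMap<&'static str, CrateInfo>,
    root: &'static str,
) -> bool {
    let mut seen: HashSet<&'static str> = HashSet::new();
    // Start from the direct dependencies of `root`, not `root` itself.
    let mut stack: Vec<&'static str> = graph
        .get(root)
        .map(|info| info.deps.clone())
        .unwrap_or_default();

    while let Some(name) = stack.pop() {
        if !seen.insert(name) {
            continue; // already visited
        }
        if let Some(info) = graph.get(name) {
            if info.needs_allocator {
                return true;
            }
            stack.extend(info.deps.iter().copied());
        }
    }
    false
}
```

An allocator crate for which this returns `true` would be rejected under the rule above.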

The compiler will also not immediately verify that crates tagged with
`#![allocator]` do indeed define an appropriate allocation API, and vice versa
if a crate defines an allocation API the compiler will not verify that it is
tagged with `#![allocator]`. This means that the only meaning `#![allocator]`
has to the compiler is to signal that the default allocator should not be
linked.

### Default allocator specifications

Target specifications will be extended with two keys: `lib_allocation_crate`
Member

👍

and `exe_allocation_crate`, describing the default allocator crate for these
two kinds of artifacts for each target. The compiler will by default have all
targets redirect to `alloc_system` for both scenarios, but `alloc_jemalloc` will
be used for binaries on OSX, Bitrig, DragonFly, FreeBSD, Linux, OpenBSD, and GNU
Windows. MSVC will notably **not** use jemalloc by default for binaries (we
don't currently build jemalloc on MSVC).
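
The defaults described above can be summarized in a small sketch. Only the two key names, `lib_allocation_crate` and `exe_allocation_crate`, come from the RFC. The `AllocationDefaults` struct, the `allocation_defaults` function, and the `os`/`env` strings are assumptions made for illustration and are not the compiler's actual target-spec vocabulary.

```rust
/// Hypothetical slice of a target specification; only the two key names are
/// taken from the RFC text.
#[derive(Debug, PartialEq, Eq)]
pub struct AllocationDefaults {
    pub lib_allocation_crate: &'static str,
    pub exe_allocation_crate: &'static str,
}

/// Maps an illustrative (os, env) pair to its default allocator crates.
pub fn allocation_defaults(os: &str, env: &str) -> AllocationDefaults {
    // Binaries use jemalloc on the platforms listed in the RFC; everything
    // else, including MSVC Windows, falls back to the system allocator.
    let jemalloc_for_exe = match (os, env) {
        ("macos", _)
        | ("bitrig", _)
        | ("dragonfly", _)
        | ("freebsd", _)
        | ("linux", _)
        | ("openbsd", _) => true,
        ("windows", "gnu") => true,
        _ => false,
    };

    AllocationDefaults {
        // Libraries default to the system allocator on every target.
        lib_allocation_crate: "alloc_system",
        exe_allocation_crate: if jemalloc_for_exe {
            "alloc_jemalloc"
        } else {
            "alloc_system"
        },
    }
}
```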

### Injecting an allocator

As described above, the compiler will inject an allocator if necessary into the
current compilation. The compiler, however, cannot blindly do so as it can
easily lead to link errors (or worse, two allocators), so it will have some
heuristics for only injecting an allocator when necessary. The steps taken by
the compiler for any particular compilation will be:

* If no crate in the dependency graph is tagged with `#![needs_allocator]`, then
  the compiler does not inject an allocator.
* If only an rlib is being produced, no allocator is injected.
* If any crate tagged with `#![allocator]` has been explicitly linked to (e.g.
  via an `extern crate` statement directly or transitively) then no allocator is
  injected.
* If two allocators have been linked to explicitly an error is generated.
* If only a binary is being produced, then the target's `exe_allocation_crate`
  value is injected, otherwise the `lib_allocation_crate` is injected.

The compiler will also record that the injected crate is injected, so later
compilations know that rlibs don't actually require the injected crate at
runtime (allowing it to be overridden).
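
The injection steps listed above can be read as a decision procedure. The sketch below is one possible reading, assuming the steps are evaluated in the order given; the `Compilation` struct, its fields, and `choose_injection` are hypothetical names, not compiler internals.

```rust
/// Hypothetical summary of one compilation, as seen by the injection logic.
#[derive(Clone, Debug)]
pub struct Compilation {
    /// Some crate in the dependency graph is tagged `#![needs_allocator]`.
    pub any_needs_allocator: bool,
    /// The only output requested is an rlib.
    pub only_rlib: bool,
    /// The only output requested is a binary.
    pub only_binary: bool,
    /// Allocator crates linked explicitly, directly or transitively.
    pub explicit_allocators: Vec<&'static str>,
    /// Values taken from the target specification.
    pub exe_allocation_crate: &'static str,
    pub lib_allocation_crate: &'static str,
}

/// Returns the crate to inject, `None` when nothing is injected, or an error
/// when more than one allocator has been linked explicitly.
pub fn choose_injection(c: &Compilation) -> Result<Option<&'static str>, String> {
    if !c.any_needs_allocator || c.only_rlib {
        return Ok(None);
    }
    match c.explicit_allocators.len() {
        0 => {}
        1 => return Ok(None), // an explicit allocator wins, inject nothing
        _ => {
            return Err(format!(
                "multiple allocators linked: {}",
                c.explicit_allocators.join(", ")
            ))
        }
    }
    if c.only_binary {
        Ok(Some(c.exe_allocation_crate))
    } else {
        Ok(Some(c.lib_allocation_crate))
    }
}
```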

### Allocators in practice

Most libraries written in Rust wouldn't interact with the scheme proposed in
this RFC at all as they wouldn't explicitly link with an allocator and generally
are compiled as rlibs. If a Rust dynamic library is used as a dependency, then
its original choice of allocator is propagated throughout the crate graph, but
this rarely happens (except for the compiler itself, which will continue to use
jemalloc).

Authors of crates which are embedded into other runtimes will start using the
system allocator by default with no extra annotation needed. If they wish to
funnel Rust allocations to the same source as the host application's allocations
then a crate can be written and linked in.

Finally, providers of allocators will simply provide a crate to do so, and then
Member

Can you add text to this section (either in this paragraph or in a separate one) spelling out how a client who wants to provide a wrapper around Rust's default allocator (or otherwise instrument it) would do so?

This use case was alluded to, at the end of the motivation section, but I am not 100% clear on how arduous the process will be, in particular whether one will be confident that the allocator one is injecting is truly a wrapper around the allocator that Rust would have selected otherwise (that is, without the injection)

Member

(If the answer is "it is indeed a bit arduous to write such a wrapper robustly, e.g. involving cfg switches to select properly between alloc_system and alloc_jemalloc in the alloc crate one is injecting", that is acceptable. I just want to know up front if that is the expectation.)

Member

(It's also possible that the answer involves somehow observing the values of lib_allocation_crate and exe_allocation_crate during the compilation of the crate I want to inject, and just assuming they will stay the same at the time of the final link where I am being injected? Still wondering out loud; probably should just wait for @alexcrichton to answer...)

Member Author

Unfortunately this RFC doesn't currently easily allow this sort of instrumentation to happen. If we wanted to support this right out of the gate, this RFC would necessitate four crates:

  • Two crates for implementing the allocation API, but not tagged with #![allocator]. There'd be one crate for jemalloc and one for the system.
  • Two crates for linking to the previous crates, but are tagged with #![allocator] and redirect the formal allocation API into the desired crate.

In a nutshell, if you want to write an allocator which can be instrumented, or shimmed then you need to write a crate which is not tagged #![allocator] but probably still exposes the allocation API via normal Rust functions. The provider of the allocator would then write their own shims that redirect to the allocator desired after the instrumentation has happened.

Does that make sense? If so I'll add some words.

Member

hmm I missed this response back when it was written.

I guess I would have liked for some more concrete details in the RFC regarding use cases like this, i.e. spelling out what the steps are for the expected uses of this RFC, and then also including little sketches like the one in your comment for unexpected use cases.

Anyway I plan to have a shot at playing around with the PR rust-lang/rust#27400 since I am finding myself needing to do some allocation debugging. Perhaps it will inspire me to write an amendment for the RFC with such notes.

applications and/or libraries can make explicit use of the allocator by
depending on it as usual.
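
The layering discussed in the review thread above (an untagged crate exposing the allocator as ordinary Rust functions, plus a tagged shim that instruments and forwards) could be sketched roughly as follows. Everything here is hypothetical: the two layers are collapsed into one file, the `raw_*` and `counting_*` names are invented, and the raw layer is backed by `std::alloc`, which postdates this RFC. In a real allocator crate the counting functions would be the ones exported under the `__rust_*` symbol names.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Layer 1: stands in for a crate that is NOT tagged `#![allocator]` and
// exposes the allocator through plain Rust functions.

pub unsafe fn raw_allocate(size: usize, align: usize) -> *mut u8 {
    match Layout::from_size_align(size, align) {
        // SAFETY: the layout is valid and has a non-zero size.
        Ok(layout) if size != 0 => unsafe { alloc(layout) },
        _ => ptr::null_mut(),
    }
}

pub unsafe fn raw_deallocate(ptr: *mut u8, old_size: usize, align: usize) {
    if let Ok(layout) = Layout::from_size_align(old_size, align) {
        // SAFETY: the caller guarantees `ptr` was allocated with `layout`.
        unsafe { dealloc(ptr, layout) }
    }
}

// Layer 2: stands in for the tagged shim crate, which records statistics and
// then forwards to layer 1.

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

pub unsafe fn counting_allocate(size: usize, align: usize) -> *mut u8 {
    let ptr = unsafe { raw_allocate(size, align) };
    if !ptr.is_null() {
        LIVE_BYTES.fetch_add(size, Ordering::Relaxed);
    }
    ptr
}

pub unsafe fn counting_deallocate(ptr: *mut u8, old_size: usize, align: usize) {
    unsafe { raw_deallocate(ptr, old_size, align) };
    LIVE_BYTES.fetch_sub(old_size, Ordering::Relaxed);
}

/// Bytes currently allocated through the counting layer.
pub fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
```

Because the allocation API passes sizes on both allocation and deallocation, the counting layer needs no per-block bookkeeping of its own.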

# Drawbacks

A significant amount of API surface area is being added to the compiler and
standard distribution as part of this RFC, but it is possible for it to all
enter as `#[unstable]`, so we can take our time stabilizing it and perhaps only
stabilize a subset over time.

The limitation of an allocator crate not being able to link to the standard
library (or libcollections) may be a somewhat significant hit to the ergonomics
of defining an allocator, but allocators are traditionally a very niche class of
library and end up defining their own data structures regardless.

Libraries on crates.io may accidentally link to an allocator and not actually
use any specific API from it (other than the standard allocation symbols),
forcing transitive dependants to silently use that allocator.

This RFC does not specify the ability to swap out the allocator via the command
line, which is certainly possible and sometimes more convenient than modifying
the source itself.

It's possible to define an allocator API (e.g. define the symbols) but then
forget the `#![allocator]` annotation, causing the compiler to wind up linking
two allocators, which may cause link errors that are difficult to debug.

# Alternatives

The compiler's knowledge about allocators could be simplified quite a bit to the
point where a compiler flag is used to just turn injection on/off, and then it's
Member

I think this would be a way to go, similar to how we have #![no_std] and #![no_main]. #![no_allocator] would blend in quite well, after which you’d just have to define your own allocation language items. A custom allocator then could be chosen by using something similar to

#[allocator]
extern crate my_awesome_allocator;

which is also pretty similar to how you’d use a custom standard library.

Member Author

The difference between no_main and no_std, however, is that choosing an allocator is a global decision, not a local one. There's a number of bad use cases you can get into when dealing with the compiler otherwise, for example if you're linking to a Rust dynamic library, then it had to have an allocator defined when it was linked, so you have no choice but to use that, yet you can still happily link to your own.

I put this alternative here mostly as a gut feeling rather than having anything concrete in mind. On the surface, though, this RFC proposes basically 0 overhead on consumer crates of allocators (they do nothing or otherwise just have one extern crate statement). Some extra error messages may pop up here and there, but very little is actually changing about how an allocator is used.

Do you have some specific aspects of this RFC you feel are too ambitious?

Member

> The difference between no_main and no_std, however, is that choosing an allocator is a global decision, not a local one.

The compiler, on the other hand has the full power to propagate the top-most choice down the dependency chain (except, of course, when staticlibs or dylibs are encountered; I actually maintain a viewpoint that neither of these should have allocator built-in and producer of the final executable should link all the appropriate allocator libraries instead), no?

> Do you have some specific aspects of this RFC you feel are too ambitious?

Rather than ambitious, to me at the first sight the alternative looks like a more elegant solution, that’s it.

Member Author

I did not have much concrete in mind when I wrote this alternative, and I'm not quite sure what you're thinking of, so could you go into more detail about how you would envision "dumbing down" the compiler's knowledge of allocators?

> I actually maintain a viewpoint that neither of these should have allocator built-in and producer of the final executable should link all the appropriate allocator libraries instead

Unfortunately a dynamic library will not link on Windows unless all symbols are resolved (unlike Linux, where you can have unresolved symbols in a dynamic library).

the responsibility of the application to define the necessary symbols if the
flag is turned off. The current implementation of this RFC, however, is not seen
as overly invasive and the benefits of "everything's just a crate" seem worth
it for the mild amount of complexity in the compiler.

Many of the names (such as `alloc_system`) have a number of alternatives, and
the naming of attributes and functions could perhaps follow a stronger
convention.

# Unresolved questions

Does this enable jemalloc to be built without a prefix on Linux? This would
enable us to direct LLVM allocations to jemalloc, which would be quite nice!

Should BSD-like systems use Rust's jemalloc by default? Many of them have
jemalloc as the system allocator and even the special APIs we use from jemalloc.