Documentation for local allocations
stedolan committed Apr 25, 2022
1 parent b05519f commit 81dd85e
Showing 3 changed files with 874 additions and 0 deletions.
155 changes: 155 additions & 0 deletions jane/doc/local-intro.md
@@ -0,0 +1,155 @@
# Introduction to Local Allocations


Instead of allocating values normally on the GC heap, local
allocations allow you to stack-allocate values using the new `local_`
keyword:

    let local_ x = { foo; bar } in
    ...

or equivalently, by putting the keyword on the expression itself:

    let x = local_ { foo; bar } in
    ...

To enable this feature, you need to pass the `-extension local` flag
to the compiler. Without this flag, `local_` is not recognized as a
keyword, and no local allocations will be performed.

These values live on a separate stack, and are popped off at the end
of the _region_. Generally, the region ends when the surrounding
function returns; see [the reference](local-reference.md) for more
details.

This helps performance in a couple of ways: first, the same few hot
cachelines are constantly reused, so the cache footprint is lower than
usual. More importantly, local allocations will never trigger a GC,
and so they're safe to use in low-latency code that must currently be
zero-alloc.

However, for this to be safe, local allocations must genuinely be
local. Since the memory they occupy is reused quickly, we must ensure
that no dangling references to them escape. This is checked by the
typechecker, and you'll see new error messages if local values leak:

    # let local_ thing = { foo; bar } in
      some_global := thing;;
                     ^^^^^
    Error: This value escapes its region


Most of the types of allocation that OCaml does can be locally
allocated: tuples, records, variants, closures, boxed numbers,
etc. Local allocations are also possible from C stubs, although this
requires code changes to use the new `caml_alloc_local` instead of
`caml_alloc`. A few types of allocation cannot be locally allocated,
though, including first-class modules, classes and objects, and
exceptions. The contents of mutable fields (inside `ref`s, `array`s
and mutable record fields) also cannot be locally allocated.


## Local parameters

Generally, OCaml functions can do whatever they like with their
arguments: use them, return them, capture them in closures or store
them in globals, etc. This is a problem when trying to pass around
locally-allocated values, since we need to guarantee they do not
escape.

The remedy is to allow the `local_` keyword to appear on function
parameters as well:

    let f (local_ x) = ...

A local parameter is a promise by a function not to let a particular
argument escape its region. In the body of `f`, you'll get a type error
if `x` escapes, but when calling `f` you can freely pass local values as
the argument. This promise is visible in the type of `f`:

    val f : local_ 'a -> ...

The function `f` may equally be called with locally-allocated or
GC-heap values: the `local_` annotation places obligations only on the
definition of `f`, not its uses.

Even if you're not interested in performance benefits, local
parameters are a useful new tool for structuring APIs. For instance,
consider a function that accepts a callback, to which it passes some
mutable value:

    let uses_callback ~f =
      let tbl = Foo.Table.create () in
      fill_table tbl;
      let result = f tbl in
      add_table_to_global_registry tbl;
      result

Part of the contract of `uses_callback` is that it expects `f` not to
capture its argument: unexpected results could ensue if `f` stored a
reference to this table somewhere, and it was later used and modified
after it was added to the global registry. Using `local_`
annotations allows this constraint to be made explicit and checked at
compile time, by giving `uses_callback` the signature:

    val uses_callback : f:(local_ int Foo.Table.t -> 'a) -> 'a
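
The hazard described above can be reproduced in plain OCaml, without the `local_` extension. The sketch below is only illustrative: `Stdlib.Hashtbl` stands in for `Foo.Table`, and `registry`, `leaked`, `bad_callback` and `size_seen_by_callback` are hypothetical names introduced for this example. It shows a callback that keeps a reference to its argument and later mutates the table after it has been registered.

```ocaml
(* Plain-OCaml sketch: Stdlib.Hashtbl stands in for Foo.Table, and every
   name except [uses_callback] is hypothetical. *)
let registry : (string, int) Hashtbl.t list ref = ref []

let uses_callback ~f =
  let tbl = Hashtbl.create 16 in
  Hashtbl.replace tbl "filled" 1;
  let result = f tbl in
  (* From here on the table is shared through the registry. *)
  registry := tbl :: !registry;
  result

(* A misbehaving callback: it stores a reference to its argument. *)
let leaked : (string, int) Hashtbl.t option ref = ref None

let bad_callback tbl =
  leaked := Some tbl;
  Hashtbl.length tbl

let size_seen_by_callback = uses_callback ~f:bad_callback

(* The stored reference still works and mutates the registered table. *)
let () =
  match !leaked with
  | Some tbl -> Hashtbl.replace tbl "surprise" 2
  | None -> ()
```

Stock OCaml accepts `bad_callback` without complaint. With the `local_` annotation on the parameter of `f`, as in the signature above, the expectation from the text is that the assignment to `leaked` would be reported as an escaping value at compile time.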


## Inference

The examples above use the `local_` keyword to mark local
allocations. In fact, this is not necessary, and the compiler will
use local allocations by default where possible, as long as the
`-extension local` flag is enabled.

The only effect of the keyword on e.g. a let binding is to change the
behavior for escaping values: if the bound value looks like it escapes
and therefore cannot be locally allocated, then without the keyword
the compiler will allocate this value on the GC heap as usual, while
with the keyword it will instead report an error.

Inference can even determine whether parameters are local, which is
useful for helper functions. It's less useful for toplevel functions,
though, as whether their parameters are local is generally forced by
their signature in the mli file, where no inference is performed.

Inference does not work across files: if you want e.g. to pass a local
argument to a function in another module, you'll need to explicitly
mark the local parameter in the other module's mli.




## More control

There are a number of other features that allow more precise control
over which values are locally allocated, including:

- **Local closures**:

  ```
  let local_ f a b c = ...
  ```
  defines a function `f` whose closure is itself locally allocated.

- **Local-returning functions**:

  ```
  let f a b c = local_
    ...
  ```
  defines a function `f` which returns local allocations into its
  caller's region.

- **Global fields**:

  ```
  type 'a t = { global_ g : 'a }
  ```
  defines a record type `t` whose `g` field is always known to be on
  the GC heap (and may therefore freely escape regions), even though
  the record itself may be locally allocated.

For more details, read [the reference](./local-reference.md).
78 changes: 78 additions & 0 deletions jane/doc/local-pitfalls.md
@@ -0,0 +1,78 @@
# Some Pitfalls of Local Allocations

This document outlines some common pitfalls that may come up when
trying out local allocations in a new codebase, as well as some
suggested workarounds. Over time, this list may grow (as experience
discovers new things that go wrong) or shrink (as we deploy new
compiler versions that ameliorate some issues).


## Tail calls

Many OCaml functions just happen to end in a tail call, even those
that are not intentionally tail-recursive. To preserve the
constant-space property of tail calls, the compiler applies special
rules around local allocations in tail calls (see [the
reference](./local-reference.md)).

If this causes a problem for calls that just happen to be in tail
position, the easiest workaround is to prevent them from being
treated as tail calls by moving them, replacing:

    func arg1 arg2

with

    let res = func arg1 arg2 in res

With this version, local values used in `func arg1 arg2` will be freed
after `func` returns.
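
As a minimal sketch of this rewrite in plain OCaml: `add_scaled`, `wrapper_tail` and `wrapper_bound` are hypothetical names, and no `local_` annotations are used, so this shows only the shape of the change and not the region behaviour.

```ocaml
(* Hypothetical callee standing in for [func] in the text. *)
let add_scaled scale x = (scale * x) + 1

(* The call is the last thing this function does: it is in tail position. *)
let wrapper_tail scale x = add_scaled scale x

(* The workaround from the text: bind the result, then return it.
   The value returned is unchanged. *)
let wrapper_bound scale x =
  let res = add_scaled scale x in
  res
```

Both wrappers compute the same result; the difference that matters for local allocations is only where the call sits in the source.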

## Partial applications with local parameters

To use local allocations with higher-order functions, you need to add
local annotations to the types of those functions. For instance, an `iter`
function may become:

    val iter : 'a list -> f:local_ ('a -> unit) -> unit

thus allowing locally-allocated closures `f` to be used.

However, this is unfortunately not an entirely backwards-compatible
change. The problem is that partial applications of `iter` functions
with the new type are themselves locally allocated, because they close
over the possibly-local `f`. This means in particular that partial
applications will no longer be accepted as module-level definitions:

    let print_each_foo = iter ~f:(print_foo)

The fix in these cases is to expand the partial application to a full
application by introducing extra arguments:

    let print_each_foo x = iter ~f:(print_foo) x
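
The two forms can be compared side by side in plain OCaml. This is a sketch under assumptions: `iter` here is a stand-in built on `List.iter` with no `local_` annotation, and `seen`, `record_foo`, `record_each_partial` and `record_each_full` are hypothetical names (a buffer is used instead of printing so the effect is easy to inspect).

```ocaml
(* Plain-OCaml stand-in for the [iter] in the text (no [local_] here). *)
let iter l ~f = List.iter f l

(* Hypothetical callback that records what it sees instead of printing. *)
let seen = Buffer.create 16
let record_foo n = Buffer.add_string seen (string_of_int n ^ ";")

(* Partial application: accepted by stock OCaml. Per the text, this is the
   form that stops being accepted at module level once [f] is local. *)
let record_each_partial = iter ~f:record_foo

(* Full application with an extra argument: the suggested fix. *)
let record_each_full x = iter ~f:record_foo x
```

Without local annotations the two definitions behave identically, which is why the expanded form is a safe replacement.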

## Typing of (@@) and (|>)

The typechecking of `(@@)` and `(|>)` changed slightly with the local
allocations typechecker, in order to allow them to work with both
local and nonlocal arguments. The major difference is that:

    f x @@ y
    y |> f x
    f x y

are now all typechecked in exactly the same way. Previously, the
first two were typechecked differently, as an application of an
operator to the expressions `f x` and `y`, rather than a single
application with two arguments.

This affects which expressions are in "argument position", which can
have a subtle effect on when optional arguments are given their
default values. If this affects you (which is extremely rare), you
will see type errors involving optional parameters, and you can
restore the old behavior by removing the use of `(@@)` or `(|>)` and
parenthesizing their subexpressions. That is, the old typing behavior
of `f x @@ y` is available as:

    (f x) y
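
For ordinary functions without optional parameters, all of these spellings evaluate to the same value. A small plain-OCaml sketch, where `f` and the bound names are hypothetical:

```ocaml
(* Hypothetical two-argument function with no optional parameters. *)
let f x y = (10 * x) + y

let via_apply = f 1 @@ 2        (* application operator *)
let via_pipe = 2 |> f 1         (* reverse application operator *)
let direct = f 1 2              (* plain two-argument application *)
let parenthesized = (f 1) 2     (* the explicit form shown above *)
```

The typing difference described in this section only becomes visible when optional arguments are involved; this example does not exercise that case.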