From 81dd85ee19295a8c274836c1eb8fef5f8ab924dd Mon Sep 17 00:00:00 2001 From: Stephen Dolan Date: Mon, 25 Apr 2022 12:27:48 +0100 Subject: [PATCH] Documentation for local allocations --- jane/doc/local-intro.md | 155 +++++++++ jane/doc/local-pitfalls.md | 78 +++++ jane/doc/local-reference.md | 641 ++++++++++++++++++++++++++++++++++++ 3 files changed, 874 insertions(+) create mode 100644 jane/doc/local-intro.md create mode 100644 jane/doc/local-pitfalls.md create mode 100644 jane/doc/local-reference.md diff --git a/jane/doc/local-intro.md b/jane/doc/local-intro.md new file mode 100644 index 00000000000..b2fb8d4fe2c --- /dev/null +++ b/jane/doc/local-intro.md @@ -0,0 +1,155 @@ +# Introduction to Local Allocations + + +Instead of allocating values normally on the GC heap, local +allocations allow you to stack-allocate values using the new `local_` +keyword: + + let local_ x = { foo; bar } in + ... + +or equivalently, by putting the keyword on the expression itself: + + let x = local_ { foo; bar } in + ... + +To enable this feature, you need to pass the `-extension local` flag +to the compiler. Without this flag, `local_` is not recognized as a +keyword, and no local allocations will be performed. + +These values live on a separate stack, and are popped off at the end +of the _region_. Generally, the region ends when the surrounding +function returns, although read [the reference](local-reference.md) for more +details. + +This helps performance in a couple of ways: first, the same few hot +cachelines are constantly reused, so the cache footprint is lower than +usual. More importantly, local allocations will never trigger a GC, +and so they're safe to use in low-latency code that must currently be +zero-alloc. + +However, for this to be safe, local allocations must genuinely be +local. Since the memory they occupy is reused quickly, we must ensure +that no dangling references to them escape. 
This is checked by the +typechecker, and you'll see new error messages if local values leak: + + # let local_ thing = { foo; bar } in + some_global := thing;; + ^^^^^ + Error: This value escapes its region + + +Most of the types of allocation that OCaml does can be locally +allocated: tuples, records, variants, closures, boxed numbers, +etc. Local allocations are also possible from C stubs, although this +requires code changes to use the new `caml_alloc_local` instead of +`caml_alloc`. A few types of allocation cannot be locally allocated, +though, including first-class modules, classes and objects, and +exceptions. The contents of mutable fields (inside `ref`s, `array`s +and mutable record fields) also cannot be locally allocated. + + +## Local parameters + +Generally, OCaml functions can do whatever they like with their +arguments: use them, return them, capture them in closures or store +them in globals, etc. This is a problem when trying to pass around +locally-allocated values, since we need to guarantee they do not +escape. + +The remedy is that we allow the `local_` keyword to also appear on function parameters: + + let f (local_ x) = ... + +A local parameter is a promise by a function not to let a particular +argument escape its region. In the body of f, you'll get a type error +if x escapes, but when calling f you can freely pass local values as +the argument. This promise is visible in the type of f: + + val f : local_ 'a -> ... + +The function f may equally be called with locally-allocated or +GC-heap values: the `local_` annotation places obligations only on the +definition of f, not its uses. + +Even if you're not interested in performance benefits, local +parameters are a useful new tool for structuring APIs. 
For instance, +consider a function that accepts a callback, to which it passes some +mutable value: + + let uses_callback ~f = + let tbl = Foo.Table.create () in + fill_table tbl; + let result = f tbl in + add_table_to_global_registry tbl; + result + +Part of the contract of `uses_callback` is that it expects `f` not to +capture its argument: unexpected results could ensue if `f` stored a +reference to this table somewhere, and it was later used and modified +after it was added to the global registry. Using `local_` +annotations allows this constraint to be made explicit and checked at +compile time, by giving `uses_callback` the signature: + + val uses_callback : f:(local_ int Foo.Table.t -> 'a) -> 'a + + +## Inference + +The examples above use the local_ keyword to mark local +allocations. In fact, this is not necessary, and the compiler will +use local allocations by default where possible, as long as the +`-extension local` flag is enabled. + +The only effect of the keyword on e.g. a let binding is to change the +behavior for escaping values: if the bound value looks like it escapes +and therefore cannot be locally allocated, then without the keyword +the compiler will allocate this value on the GC heap as usual, while +with the keyword it will instead report an error. + +Inference can even determine whether parameters are local, which is +useful for helper functions. It's less useful for toplevel functions, +though, as whether their parameters are local is generally forced by +their signature in the mli file, where no inference is performed. + +Inference does not work across files: if you want e.g. to pass a local +argument to a function in another module, you'll need to explicitly +mark the local parameter in the other module's mli. + + + + +## More control + +There are a number of other features that allow more precise control +over which values are locally allocated, including: + + - **Local closures**: + + ``` + let local_ f a b c = ... 
+ ``` + + defines a function `f` whose closure is itself locally allocated. + + - **Local-returning functions** + + ``` + let f a b c = local_ + ... + ``` + + defines a function `f` which returns local allocations into its + caller's region. + + - **Global fields** + + ``` + type 'a t = { global_ g : 'a } + ``` + + defines a record type `t` whose `g` field is always known to be on + the GC heap (and may therefore freely escape regions), even though + the record itself may be locally allocated. + +For more details, read [the reference](./local-reference.md). diff --git a/jane/doc/local-pitfalls.md b/jane/doc/local-pitfalls.md new file mode 100644 index 00000000000..b51bbc1f989 --- /dev/null +++ b/jane/doc/local-pitfalls.md @@ -0,0 +1,78 @@ +# Some Pitfalls of Local Allocations + +This document outlines some common pitfalls that may come up when +trying out local allocations in a new codebase, as well as some +suggested workarounds. Over time, this list may grow (as experience +discovers new things that go wrong) or shrink (as we deploy new +compiler versions that ameliorate some issues). + + +## Tail calls + +Many OCaml functions just happen to end in a tail call, even those +that are not intentionally tail-recursive. To preserve the +constant-space property of tail calls, the compiler applies special +rules around local allocations in tail calls (see [the +reference](./local-reference.md)). + +If this causes a problem for calls that just happen to be in tail +position, the easiest workaround is to prevent them from being +treated as tail calls by moving them, replacing: + + func arg1 arg2 + +with + + let res = func arg1 arg2 in res + +With this version, local values used in `func arg1 arg2` will be freed +after `func` returns. + +## Partial applications with local parameters + +To enable the use of local allocations with higher-order functions, a +necessary step is to add local annotations to function types, +particularly those of higher-order functions. 
For instance, an `iter` +function may become: + + val iter : 'a list -> f:local_ ('a -> unit) -> unit + +thus allowing locally-allocated closures `f` to be used. + +However, this is unfortunately not an entirely backwards-compatible +change. The problem is that partial applications of `iter` functions +with the new type are themselves locally allocated, because they close +over the possibly-local `f`. This means in particular that partial +applications will no longer be accepted as module-level definitions: + + let print_each_foo = iter ~f:(print_foo) + +The fix in these cases is to expand the partial application to a full +application by introducing extra arguments: + + let print_each_foo x = iter ~f:(print_foo) x + +## Typing of (@@) and (|>) + +The typechecking of (@@) and (|>) changed slightly with the local +allocations typechecker, in order to allow them to work with both +local and nonlocal arguments. The major difference is that: + + f x @@ y + y |> f x + f x y + +are now all typechecked in exactly the same way. Previously, the +first two were typechecked differently, as an application of an +operator to the expressions `f x` and `y`, rather than a single +application with two arguments. + +This affects which expressions are in "argument position", which can +have a subtle effect on when optional arguments are given their +default values. If this affects you (which is extremely rare), you +will see type errors involving optional parameters, and you can +restore the old behaviour by removing the use of `(@@)` or `(|>)` and +parenthesizing their subexpressions. That is, the old typing behaviour +of `f x @@ y` is available as: + + (f x) y diff --git a/jane/doc/local-reference.md b/jane/doc/local-reference.md new file mode 100644 index 00000000000..e8f85155331 --- /dev/null +++ b/jane/doc/local-reference.md @@ -0,0 +1,641 @@ +# Local Allocations Reference + +The goal of this document is to be a reasonably complete reference to local +allocations in OCaml. 
For a gentler introduction, see [the +introduction](local-intro.md). + +When local allocations are enabled with the `-extension local` flag, the +compiler may locally allocate some values, placing them on a stack rather than +the garbage collected heap. Instead of waiting for the next GC, the memory used +by locally allocated values is reclaimed when their _region_ (see below) ends, and +can be immediately reused. Whether the compiler locally allocates certain values +is controlled using a new keyword currently spelled `local_`, whose effects in +expressions, patterns and types are explained below. + + +## Local expressions and allocation + +The `local_` keyword may be placed on an expression to indicate that +allocations in that expression should be locally allocated: + + let abc = local_ [a; b; c] in + ... + +Here, the three cons cells of the list `[a; b; c]` will all be locally +allocated. + +Equivalently, the keyword `local_` may precede the pattern in a `let`: + + let local_ abc = [a; b; c] in + ... + +Locally allocated values may reference global (that is, GC-allocated or +constant) values, but global values may not reference local ones. In the +example above, any or all of `a`, `b` and `c` may themselves be locally +allocated. + +It is valid for an expression annotated `local_` to still yield a global value. +For instance, if there is a global `x : int list` in scope, then this is +allowed: + + let l = local_ if n > 0 then n :: x else x in + ... + +Here, if `n > 0`, then `l` will be a locally-allocated cons cell. However, if +`n <= 0`, then `l` will be `x`, which is global. In other words, the `local_` +keyword on an expression permits but does not oblige that expression to locally +allocate its result. + +Most OCaml types can be locally allocated, including records, variants, +polymorphic variants, closures, boxed numbers and strings. 
However, certain +values cannot be locally allocated, and will always be on the GC heap, +including: + + - Modules (including first-class modules) + + - Exceptions + (Technically, values of type `exn` can be locally allocated, but only global ones may be raised) + + - Classes and objects + +In addition, any value that is to be put into a mutable field (for example +inside a `ref`, an `array` or a mutable record) cannot be locally allocated. + + +## Inference + +In fact, the allocations of the examples above will be locally +allocated even without the `local_` keyword, if it is safe to do so +(and the `-extension local` flag is enabled). The presence of the +keyword on an expression only affects what happens if the value +escapes (e.g. is stored into a global hashtable) and therefore cannot +be locally allocated. With the keyword, an error will be reported, +while without the keyword the allocations will occur on the GC heap as +usual. + +Inference does not cross file boundaries. If local annotations subject to +inference appear in the type of a module (e.g. since they can appear in +function types, see below) then inference will resolve them according to what +appears in the `.mli`. If there is no `.mli` file, then inference will always +choose `global` for anything that can be accessed from another file. + +## Regions + +Every local allocation takes place inside a _region_, which is a block of code +(usually a function body, but see below). At the end of a region, all of its +local allocations are freed. + +Regions may nest, for instance when one function calls another. Local +allocations always occur in the innermost (most recent) region. + +We say that a value _escapes_ a region if it is still referenced beyond the end +of that region. The job of the typechecker is to ensure that locally allocated +values do not escape the region they were allocated in. + +"Region" is a wider concept than "scope", and locally-allocated variables can +outlive their scope. 
For example: + + let f () = + let local_ counter = + let local_ r = ref 42 in + incr r; + r + in + ... + +The locally-allocated reference `r` is allocated inside the definition of +`counter`. This value outlives the scope of `r` (it is bound to the variable +`counter` and may later be used in the code marked `...`). However, the +typechecker ensures that it does not outlive the region in which it is +allocated, which is the entire body of `f`. + +As well as function bodies, a region is also placed around: + + - Loop bodies (`while` and `for`) + - Lazy expressions (`lazy ...`) + - Module bindings (`let x = ...` at module level, including in submodules) + +Module bindings are wrapped in regions to enforce the rule (as mentioned above) +that modules never contain locally-allocated values. + +Additionally, it is possible to write functions that do *not* have +a region around their body, which is useful to write functions that +return locally-allocated values. See "Local-returning functions" below. + +### Runtime behaviour + +At runtime, local allocations do not allocate on the C stack, but on a +separately-allocated stack that follows the same layout as the OCaml +minor heap. In particular, this allows local-returning functions +without the need to copy returned values. + +The beginning of a region records the stack pointer of this local +stack, and the end of the region resets the stack pointer to this +value. + + +### Variables and regions + +To spot escaping local allocations, the type checker internally tracks whether +each variable is: + + - **Global**: must be a global value. These variables are allowed to freely + cross region boundaries, as normal OCaml values. + + - **Local**: may be a locally-allocated value. These variables are restricted + from crossing region boundaries. + +As described above, whether a given variable is global or local is inferred by +the typechecker, although the `local_` keyword may be used to specify it. 
+ +Additionally, local variables are further subdivided into two cases: + + - **Outer-region local**: may be a locally-allocated value, but only from an outer + region and not from the current one. + + - **Any-region local**: may be a locally-allocated value, even one allocated + during the current region. + +For instance: + + let f () = + let local_ outer = ref 42 in + let g () = + let local_ inner = ref 42 in + ?? + in + ... + +At the point marked `??` inside `g`, both `outer` and `inner` are +locally-allocated values. However, only `inner` is any-region local, having been +allocated in `g`'s region. The value `outer` is instead outer-region local: it +is locally allocated but from a region other than `g`'s own. + +So, if we replace `??` with `inner`, we see an error: + + Error: This local value escapes its region + +However, if we replace `??` with `outer`, the compiler will accept it: the +value `outer`, while locally allocated, was definitely not locally allocated +_during g_, and there is therefore no problem allowing it to escape `g`'s +region. + +(This is quite subtle, and there is an additional wrinkle: how does the +compiler know that it is safe to still refer to `outer` from within the closure +`g`? See "Closures" below for more details) + + +## Function types and local arguments + +Function types now accept the `local_` keyword in both argument and return +positions, leading to four distinct types of function: + + a -> b + local_ a -> b + a -> local_ b + local_ a -> local_ b + +In argument positions, `local_` indicates that the function may be passed +locally-allocated values. As always, the local_ keyword does not *require* +a locally-allocated value, and you may pass global values to such functions. In +effect, a function of type `local_ a -> b` is a function accepting `a` +and returning `b` that promises not to capture any reference to its argument. + +In return positions, `local_` indicates that the function may return +locally-allocated values. 
A function of type `local_ a -> local_ b` promises +not to capture any reference to its argument except possibly in its return +value. + +A function with a local argument can be defined by annotating the argument as +`local_`: + + let f (local_ x) = ... + +Inside the definition of `f`, the argument `x` is outer-region local: that is, +while it may be locally allocated, it is known not to have been allocated during +`f` itself, and thus may safely be returned from `f`. For example: + + # let f1 (local_ x : int list) = [1; 2; 3] + val f1 : local_ int list -> int list + + # let f2 (local_ x : int list) = x + val f2 : local_ int list -> local_ int list + + # let f3 (local_ x : int list) = (42 :: x) + ^ + Error: This value escapes its region + +In the above, `f1` returns a global `int list`, while `f2` returns a local one. +`f2` is allowed to return the local value `x` despite the ending of the +function's region, because the value `x` is known to come from outside that +region. + +In contrast, `f3` is an error. The value `42 :: x` must be locally allocated (as +it refers to a local value `x`), and it is locally allocated from within the +region of `f3`. When this region ends, the any-region local value `42 :: x` is +not allowed to escape it. + +It is possible to write functions like `f3` that return +locally-allocated values, but this requires explicit annotation, as it +would otherwise be easy to do by mistake. See "Local-returning +functions" below. + +Like local variables, inference can determine whether function arguments are +local. However, note that for arguments of exported functions to be local, the +`local_` keyword must appear in their declarations in the corresponding `.mli` +file. + + +## Closures + +Like most other values, closures can be locally allocated. In particular, this +happens when a closure closes over local values from an outer scope: since +global values cannot refer to local values, all such closures _must_ be locally +allocated. 
+ +Consider again the example from "Variables and regions" above: + + let f () = + let local_ outer = ref 42 in + let g () = + let local_ inner = ref 42 in + outer + in + ... + +Here, since `g` refers to the local value `outer`, the closure `g` must itself +be locally allocated. (As always, this is deduced by inference, and an explicit +`local_` annotation on `g` is not needed). + +This then means that `g` is not allowed to escape its region, i.e. the body of +`f`. `f` may call `g` but may not return the closure. This guarantees that `g` +will only run before `f` has ended, which is what makes it safe to refer to +`outer` from within `g`. + +Higher-order functions should usually mark their function arguments as +`local_`, to allow local closures to be passed in. For instance, consider the +following function for computing the length of a list: + + let length xs = + let local_ count = ref 0 in + List.iter xs ~f:(fun _ -> incr count); + !count + +With the standard type of `List.iter`, this results in a type error: + + List.iter xs ~f:(fun _ -> incr count); + ^^^^^ + Error: The value count is local, so cannot be used inside a closure that might escape + +The standard type of `List.iter` is as follows: + + val iter : 'a list -> f:('a -> unit) -> unit + +This type places no restrictions on the use of `f`, allowing `iter` to capture +or otherwise leak its argument `f`. It is therefore not safe to pass a local +closure to such a function, hence the error. + +Instead, `List.iter` and similar functions should be given the following type: + + val iter : 'a list -> f:local_ ('a -> unit) -> unit + +This type carries the additional promise that `iter` does not capture its `f` +argument, allowing local closures to be passed. With this type, the above +`length` function is accepted. + +Note that the function `f` here _is_ allowed to capture its argument, +and there are no restrictions on what may be done with the list +elements themselves. 
To specify that `f` may _not_ capture its +argument, the type of iter would have to be: + + val iter : 'a list -> f:local_ (local_ 'a -> unit) -> unit + +The two occurrences of `local_` are independent: the first is a promise +by `iter` not to capture `f`, while the second is a requirement by +`iter` to be given an `f` that does not itself capture. + + + +## Tail calls + +Usually, a function's region lasts for the entire body of that function, +cleaning up local allocations at the very end. This story gets more complicated +if the function ends in a tail call, however, as such functions need to clean +up their stack frame before the tail call in order to ensure that +tail-recursive loops use only constant space. + +Therefore, when a function ends in a tail call, that function's region ends: + + - after the arguments to the tail call have been evaluated + + - but before control is transferred to the callee. + +This early ending of the region introduces some restrictions, as values used in +tail calls then count as escaping the region. In particular, any-region local values +may not be passed to tail calls: + + let f1 () = + let local_ r = ref 42 in + some_func r + ^ + Error: This local value escapes its region + Hint: This argument cannot be local, because this is a tail call + +and any-region local closures may not be tail-called: + + let f2 () = + let local_ g () = 42 in + g () + ^ + Error: This local value escapes its region + Hint: This function cannot be local, because this is a tail call + +In both cases, if tail recursion is not necessary, then the issue can be +resolved by moving the call so that it is not syntactically a tail call: + + let f1 () = + let local_ r = ref 42 in + let res = some_func r in + res + + let f2 () = + let local_ g () = 42 in + let res = g () in + res + +This change means that the locally allocated values (`r` and `g`) +will not be freed until after the call has returned. 
+ +Note that values which are outer-region local rather than any-region local (that +is, local values that were passed into this region from outside) may be used in +tail calls, as the early closing of the region does not affect them: + + let f3 (local_ x) = + some_func x + +Here, even though the region of `f3` ends before the call to `some_func`, the +value `x` remains available. + + + +## Local-returning functions + +The region around the body of a function prevents local allocations inside that +function from escaping. Occasionally, it is useful to write a function that +allows local allocations to escape, which can be done by explicitly marking +such functions. + +This is useful particularly for constructor functions of abstract types. For +instance, consider this code that uses an `int ref` as a counter: + + let f () = + let counter = ref 0 in + ... + let n = !counter in + incr counter; + ... + +Here, inference will detect that `counter` does not escape and will allocate +the reference locally. However, this changes if we try to abstract out +`counter` to its own module: + + module Counter = struct + type t = int ref + + let make () = + ref 0 + + let next c = + let x = !c in + incr c; + x + end + + let f () = + let counter = Counter.make () in + ... + let n = Counter.next counter in + ... + +In this code, the counter will *not* be allocated locally. The reason is the +`Counter.make` function: the allocation of `ref 0` escapes the region of +`Counter.make`, and the compiler will therefore not allow it to be locally +allocated. This remains the case no matter how many local_ annotations we write +inside `f`: the issue is the definition of `make`, not its uses. + +To allow the counter to be locally allocated, we need to specify that +`Counter.make` may return local allocations. 
This can be done by wrapping the +entire body of `make` with the `local_` keyword: + + let make () = local_ + ref 0 + +The `local_` keyword around a function body like this specifies not only that +the allocation of the `ref` should be local, but more importantly that the +function `make` *should not have its own region*. + +Instead, local allocations during `make` are considered part of `f`'s region, +and will only be cleaned up when that region ends. Local allocations are +allocated as always in the nearest enclosing region. However if the current +function is a local-returning function, then the nearest enclosing region will +be the caller's (or that of the caller's caller, etc., if the caller is also +local-returning). + + +## Records and mutability + +For any given variable, the typechecker checks only whether that variable is +local or global, and generally does not separately track parts of the variable. +For instance, the following code yields an error, even though `x` and `y` are +both global: + + let f () = + let local_ packed = (x, y) in + let x', y' = packed in + x' + +Here, the `packed` value is treated as local, and the typechecker then +conservatively assumes that `x'` and `y'` may also be local (since they are +extracted from `packed`), and so cannot safely be returned. + +Similarly, a variable `local_ x` of type `string list` means a local +list of local strings, and none of these strings can be safely +returned from a function like `f`. + +This can be overridden for record types, by annotating some fields with +`global_`: + + type ('a, 'b) t = { global_ foo : 'a; bar: 'b } + + let f () = + let local_ packed = {foo=x; bar=y} in + let {foo; bar} = packed in + foo + +Here, the `foo` field of any value of type `_ t` is always known to be global, +and so can be returned from a function. 
When constructing such a record, the +`foo` field must therefore be a global value, so trying to fill it with a local +value will result in an escape error, even if the record being constructed is +itself local. + +In particular, by defining: + + type 'a glob = { global_ contents: 'a } [@@unboxed] + +then a variable `local_ x` of type `string glob list` is a local list +of global strings, and while the list itself cannot be returned out of +a region, the `contents` field of any of its elements can. + +### Mutability + +Mutable fields are always `global_`, including array elements. That is, while +you may create local `ref`s or arrays, their contents must always be global. + +This restriction may be lifted somewhat in future: the tricky part is that +naively permitting mutability might allow an older local mutable value to be +mutated to point to a younger one, creating a dangling reference to an escaping +value when the younger one's region ends. + + +## Curried functions + +The function type constructor in OCaml is right-associative, so that these are +equal types: + + string -> string -> string + string -> (string -> string) + +These both describe a two-argument function which is curried, and therefore may +be partially applied to the first argument, yielding a closure that accepts the +second. + +The situation is more complicated when `local_` is involved. The following two +types are *not* equivalent: + + local_ string -> string -> string + local_ string -> (string -> string) + +The former is a two-argument function which accepts as its first argument +a local string. Like all two-argument functions, it may be partially applied to +a single argument yielding a closure that accepts the second. However, since +this closure closes over the first local argument, it must necessarily be local +itself. 
Thus, if applied to a single argument, this function in fact returns +a _local_ closure, making its type equal to the following: + + local_ string -> local_ (string -> string) + +By contrast, the type `local_ string -> (string -> string)` means a function +that accepts a local string but returns a global function. Necessarily, this +global function cannot refer to the local string that was passed, so this +cannot be an ordinary two-argument function. (It could be something like `fun +s -> print s; fun x -> x`, however) + +In general, in a curried function type `... -> ... -> ...` (without +parentheses), then after the first use of `local_`, all arrow types except the +last will implicitly be given `local_` return types, enabling the expected +partial application behaviour. + +Finally, this transformation applies also to types marked with the `local_` +keyword. For instance, the following type: + + local_ (a -> b -> c -> d) -> e -> f -> g + +is read as: + + local_ (a -> local_ (b -> local_ (c -> d))) -> local_ (e -> local_ (f -> g)) + +Note the implicit `local_` both in the returned `e -> f` closure (as described +above), and also in the type of the `b -> c` argument. + + +### Currying of local closures + +Suppose we are inside the definition of a function, and there is in scope +a local value `counter` of type `int ref`. Then of the following two +seemingly-identical definitions, the first is accepted and the second is +rejected: + + + let local_ f : int -> int -> int = fun a b -> a + b + !counter in + ... + + let f : int -> int -> int = local_ fun a b -> a + b + !counter in + ... + +Both define a closure which accepts two integers and returns an integer. The +closure must be local, since it refers to the local value `counter`. 
In the +former definition, the type of the function appears under the `local_` keyword, +and as described above is interpreted as: + + int -> local_ (int -> int) + +This is the correct type for this function: if we partially apply it to +a single argument, the resulting closure will still be local, as it refers to +the original function which refers to `counter`. By contrast, in the latter +definition the type of the function is outside the `local_` keyword and is +interpreted as normal as: + + int -> (int -> int) + +This is not the correct type for this function: it states that partially +applying it to a single argument will yield a global closure, which is not the +case here. For this reason, this version is rejected. It would be accepted if +written as follows: + + let f : int -> local_ (int -> int) = local_ fun a b -> a + b + !counter in + ... + + +## Special case typing of tuple matching + +As mentioned above, the typechecker generally does not separately track +the local or global status of parts of a value, but rather tracks this +only once per variable or expression. There is one exception to this +rule, as follows. + +In OCaml, it is possible to simultaneously match on multiple values: + +``` + match x, y, z with + | p, q, r -> ... +``` + +There is in fact no special syntax for this: as parentheses are +optional in tuples, the above is actually a match on a single value, +the tuple `(x, y, z)`, against a single pattern, the pattern `(p, q, +r)`. + +Applying the usual rule that an expression is either treated as +entirely local or entirely global would mean that `p`, `q` and `r` +would all be local if any of `x`, `y` and `z` are. This is +counterintuitive, as the syntax above is usually thought of as a +multiple-value match, rather than a match on a single tuple value. For +this reason, the typechecker independently tracks whether the parts of +this tuple are local or global. 
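The claim that a multiple-value match is really a match on a single tuple can be checked in plain OCaml, independent of the `local` extension. The following is a minimal sketch; the function names `classify_bare` and `classify_paren` are hypothetical and introduced only for illustration.

```ocaml
(* Plain OCaml, no extension needed. Both functions match on ONE value,
   the tuple (x, y); the parentheses are optional syntax. *)
let classify_bare x y =
  match x, y with
  | 0, _ -> "first is zero"
  | _, 0 -> "second is zero"
  | _, _ -> "neither is zero"

let classify_paren x y =
  match (x, y) with
  | (0, _) -> "first is zero"
  | (_, 0) -> "second is zero"
  | (_, _) -> "neither is zero"

(* The two spellings behave identically on every input tried here. *)
let () =
  assert (classify_bare 0 7 = classify_paren 0 7);
  assert (classify_bare 3 0 = classify_paren 3 0);
  assert (classify_bare 3 7 = classify_paren 3 7)
```

Since the two forms are the same expression to the parser, the special case described above concerns only how local or global status is tracked for the tuple's components, not how the match is evaluated.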
+ +The same logic applies to simultaneous binding of multiple values: + +``` + let a, b, c = + ... + x, y, z +``` + +Again, there is no actual syntax for this in OCaml: that's a binding +of the single value `(x, y, z)` against the single pattern `(a, b, +c)`. Since it's usually thought of as the simultaneous binding of +several variables, the typechecker treats it as such rather than +making all of `a`,`b` and `c` local if any of `x`, `y` and `z` are. + + +## Primitive definitions + +Allocations in OCaml functions must either be local or global, as these are +compiled separately. A different option is available for `%`-primitives exported +by the stdlib, however, as these are guaranteed to be inlined at every use +site. Unlike ordinary functions, these primitives may be used to make both +local and global allocations, which is why `ref` worked for both local and +global in various examples above. + +In the interface for the stdlib (and as re-exported by Base), this feature is +enabled by use of the `[@local_opt]` annotation on `external` declarations.
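As a rough illustration of what such a declaration can look like, the sketch below places `[@local_opt]` on the `ref`-related primitives. The exact declarations and primitive names shown are an assumption, not taken from this patch, and should be checked against the stdlib source of the compiler in use. The annotation is only meaningful with a compiler that supports this extension.

```ocaml
(* Assumed shape of [@local_opt] declarations; verify against the actual
   stdlib before relying on it. The annotation marks the positions whose
   local-or-global status is decided separately at each inlined use site. *)
external ref : 'a -> ('a ref[@local_opt]) = "%makemutable"
external ( ! ) : ('a ref[@local_opt]) -> 'a = "%field0"
external incr : (int ref[@local_opt]) -> unit = "%incr"
```

Under this reading, the same `ref` primitive can serve both `let local_ r = ref 42` and an ordinary heap-allocated `ref 0`, which matches the behaviour relied on in the earlier examples.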