flambda-backend: Add documentation for mixed blocks (#2667)
* Add mixed block docs

* Reword

* Update ocaml/jane/doc/extensions/unboxed-types/index.md

Co-authored-by: Xavier Clerc <xclerc@users.noreply.github.com>

* Suggestions from review

* Fix from @TheNumbat's review

---------

Co-authored-by: Xavier Clerc <xclerc@users.noreply.github.com>
ncik-roberts and xclerc authored Jun 7, 2024
1 parent 91f1c2c commit 0b20098
Showing 1 changed file with 145 additions and 1 deletion.
146 changes: 145 additions & 1 deletion jane/doc/extensions/unboxed-types/index.md
@@ -284,4 +284,148 @@ Here's the list of primitives that currently support `[@layout_poly]`:
* `%array_safe_get`
* `%array_safe_set`
* `%array_unsafe_get`
* `%array_unsafe_set`

# Using unboxed types in structures

Unboxed types can usually be put in structures, though there are some restrictions.

These structures may contain unboxed types, subject to restrictions on field
ordering:
* Records
* Constructors

Unboxed numbers can't be put in these structures:
* Constructors with inline record fields
* Exceptions
* Extensible variant constructors
* Top-level fields of modules
* Tuples

Nothing fundamental prevents supporting the structures in the second list. They
will just take some work to implement.

Here's an example of a record with an unboxed field. We call such a record
a "mixed record".

```ocaml
type t =
  { str : string;
    i : int;
    f : float#;
  }
```

## Restrictions on field ordering

The below is written about record fields but equally applies to constructor
arguments.

Suppose a record contains an unboxed field `fld` whose layout is not
`value`[^or-combination-of-values]. Then the following restriction applies: all
fields occurring after `fld` in the record must be "flat", i.e. the GC can
skip looking at them. The only flat fields are immediates (i.e. things
represented as ints at runtime) and other unboxed numbers.

[^or-combination-of-values]: Technically, there are some non-value layouts that don't hit this restriction, like unboxed products and unboxed sums consisting only of values.

The following definition is rejected, as the boxed field `s : string` appears
after the unboxed float field `f`:

```ocaml
type t_rejected =
  { f : float#;
    s : string;
  }
(* Error: Expected all flat fields after non-value field, f,
          but found boxed field, s. *)
```

The only relaxation of the above restriction is for records that consist
solely of `float` and `float#` fields. Any ordering of `float` and `float#`
fields is permitted. The "flat float record optimization" applies to any
such record: all of the fields are stored flat, even the `float` ones
that will require boxing upon projection. The ordering restriction is relaxed
in this case to provide a better migration story for all-`float` records
to which the flat float record optimization currently applies.

```ocaml
type t_flat_float =
  { x1 : float;
    x2 : float#;
    x3 : float;
  }
```

The ordering restriction has to do with the "mixed block" runtime
representation. Read on for more detail about that.
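
As a rough illustration of the rule above, the check can be modeled as a small C function. This is only a sketch: the `field_kind` classification and the `ordering_ok` function are hypothetical names invented here, not part of the compiler, and the model ignores the non-value layouts mentioned in the footnote that escape the restriction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of record fields, for illustration only. */
typedef enum {
  FIELD_BOXED,         /* pointer value such as string: the GC must scan it */
  FIELD_BOXED_FLOAT,   /* float */
  FIELD_IMMEDIATE,     /* e.g. int: flat */
  FIELD_UNBOXED_FLOAT, /* float#: flat */
  FIELD_UNBOXED_OTHER  /* any other unboxed number, e.g. int32#: flat */
} field_kind;

static bool is_unboxed_number(field_kind k) {
  return k == FIELD_UNBOXED_FLOAT || k == FIELD_UNBOXED_OTHER;
}

/* Flat fields are immediates and unboxed numbers. */
static bool is_flat(field_kind k) {
  return k == FIELD_IMMEDIATE || is_unboxed_number(k);
}

/* True if every field is float or float# (the relaxed case). */
static bool all_float(const field_kind *fields, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (fields[i] != FIELD_BOXED_FLOAT && fields[i] != FIELD_UNBOXED_FLOAT)
      return false;
  }
  return true;
}

/* Models the ordering rule: once an unboxed number has been seen,
   every later field must be flat, unless the record is float-only. */
static bool ordering_ok(const field_kind *fields, size_t n) {
  assert(fields != NULL || n == 0);
  if (all_float(fields, n)) return true;
  bool seen_unboxed = false;
  for (size_t i = 0; i < n; i++) {
    if (seen_unboxed && !is_flat(fields[i])) return false;
    if (is_unboxed_number(fields[i])) seen_unboxed = true;
  }
  return true;
}
```

Under this model, the `t` record from earlier (`string`, `int`, `float#`) is accepted, `t_rejected` is not, and `t_flat_float` is accepted through the float-only relaxation.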

## Generic operations aren't supported

Some operations built into the OCaml runtime aren't supported for structures
containing unboxed types.

These operations aren't supported:
* polymorphic comparison and equality
* polymorphic hash
* marshaling

These operations raise an exception at runtime, similar to how polymorphic
comparison raises when called on a function.

You should use ppx-derived versions of these operations instead.

## Runtime representation: mixed blocks

As a general principle: The compiler should not change the user-specified
field ordering when deciding the runtime representation.

Abiding by this principle allows you to write C bindings and
predict hardware cache performance.

A structure containing unboxed types is represented at runtime as a "mixed
block". A mixed block always consists of fields the GC can-or-must scan followed by
fields the GC can-or-must skip[^can-or-must]. The garbage collector must be kept
informed of which fields of the block it should scan. A portion of the header
word is reserved to track the length of the prefix of the block that should be
scanned by the garbage collector.

[^can-or-must]: "Can-or-must" is a bit of a mouthful, but it captures the right nuance. Pointer values *must* be scanned, unboxed number fields *must* be skipped, and immediate values *can* be scanned or skipped.

The ordering constraint on structure fields is a reflection of the same
ordering restriction in the runtime representation.
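
The idea of a scannable prefix can be sketched with a toy model in C. Everything here is an assumption made for illustration: `toy_block`, its `scannable_prefix` member, and `toy_scan` are hypothetical and do not reflect how the OCaml runtime actually encodes the header or walks the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a mixed block. In the real runtime the prefix length
   lives in part of the header word; here it is a plain struct member. */
typedef struct {
  size_t wosize;            /* total number of fields in the block */
  size_t scannable_prefix;  /* number of leading fields the GC scans */
  uint64_t fields[8];       /* one word per field */
} toy_block;

/* Copies the fields a toy collector would look at into `visited`
   and returns how many it visited. Fields past the prefix are never read. */
static size_t toy_scan(const toy_block *b, uint64_t *visited, size_t cap) {
  assert(b->scannable_prefix <= b->wosize);
  size_t n = 0;
  for (size_t i = 0; i < b->scannable_prefix && n < cap; i++) {
    visited[n++] = b->fields[i];
  }
  return n;
}
```

For a block shaped like `{ str : string; i : int; f : float# }` with a prefix of two fields, `toy_scan` visits the first two words and never touches the raw float bits in the third. The prefix length of two is itself an assumption here: the immediate `i` could be scanned or skipped.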

## C bindings for mixed blocks

The implementation of field layout in a mixed block is not finalized. For example, we'd like int32 fields to be packed efficiently (two to a word) on 64-bit platforms. Currently that's not the case: each one takes up a word.

Users who write C bindings might want to be notified when we change this layout. To ensure that your code will need to be updated when the layout changes, use the `Assert_mixed_block_layout_v#` family of macros. For example:

```c
Assert_mixed_block_layout_v1;
```

Write the above in statement context, i.e. either at the top level of a file or
within a function.

Here's a full example. Say you're writing C bindings against this OCaml type:

```ocaml
(** foo.ml *)
type t =
  { x : int32#;
    y : int32#;
  }
```

Here is the recommended way to access fields:

```c
Assert_mixed_block_layout_v1;
#define Foo_t_x(foo) (*(int32_t*)&Field(foo, 0))
#define Foo_t_y(foo) (*(int32_t*)&Field(foo, 1))
```

We would bump the version number in either of these cases, which would prompt you to review your code:
* We change which half of the word the int32 is stored in
* We start packing int32s more efficiently
