Revised implementation of static allocation/initialization of globals #357

Merged: 53 commits, Jan 11, 2020

MatthewFluet
Member

#328 introduced static allocation/initialization of globals, but some complexities and issues with the implementation were noted during review.

This revised implementation tries to simplify the complexities and address the issues:

  • The RSSA IR loses the Operand.Static {static: Var.t Static.t, ty: Type.t} variant and gains a statics: {dst: Var.t * Type.t, obj: Object.t} vector field in Program.T. The PackedRepresentation and Ssa2ToRssa passes are simplified, because the initial RSSA program is created with an empty statics field. The rssaShrink1 pass takes care of constant-folding and copy-propagating of object initialization. New collectStatics.{Globals,{WordXVector,Real}Consts} passes introduce objects into the statics field.

  • The Machine IR gains a staticHeaps: StaticHeap.Kind -> StaticHeap.Object.t vector field in Program.T. Each "kind" of static heap is emitted to the main .c file as a statically initialized data definition that "looks" like an ML heap. There are four kinds of heaps:

    • Immutable: for immutable objects; such objects need never be traversed by the GC. (Note that global unit ref objects can be placed in the Immutable static heap, since they will never actually be mutated.)
    • Mutable: for objects with mutable non-objptr fields; such objects may be mutated, but need never be traversed by the GC. (Note that global empty mutable sequences can be placed in the Mutable static heap, since, even if they have mutable objptr fields, the elements will never actually be mutated.)
    • Root: when the mutator does not mark cards, for objects with mutable objptr fields; such objects may be mutated and need to be traversed by the GC (because they may be updated to point to objects in the runtime heap). However, if card marking is used by the mutator, then the Root static heap cannot be used, because the write barrier with a base object in the Root static heap will attempt to write to an invalid card slot index. It would be possible to make the write barrier more expensive, by dynamically checking if the base is in the Root static heap.
    • Dynamic: when the mutator marks cards, for objects with mutable objptr fields; such objects may be mutated and need to be traversed by the GC. The Dynamic static heap is copied to the initial runtime heap at runtime initialization.

    In Backend, each RSSA static is placed in an appropriate "kind" of static heap. Objects placed in the Dynamic static heap are accessed by the rest of the program via Global operands (and incur a level of indirection).

  • The Mutable and Root heaps are properly saved and loaded by MLton.World.
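The placement rules for the four heap kinds can be summarized as a small decision function. The following is only a sketch of the rules as stated above, not the compiler's actual code (which is written in Standard ML); the names `static_heap_kind`, `obj_info`, and `choose_kind` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the four static heap kinds. */
enum static_heap_kind { SH_IMMUTABLE, SH_MUTABLE, SH_ROOT, SH_DYNAMIC };

/* Hypothetical summary of the properties that drive placement. */
struct obj_info {
    bool has_mutable_field;        /* any mutable field at all */
    bool has_mutable_objptr_field; /* a mutable field that holds an objptr */
    bool is_unit_ref;              /* unit ref: never actually mutated */
    bool is_empty_sequence;        /* no elements, so nothing to mutate */
};

enum static_heap_kind choose_kind(struct obj_info o, bool mutator_marks_cards)
{
    /* A mutable objptr field is, in particular, a mutable field. */
    assert(!o.has_mutable_objptr_field || o.has_mutable_field);

    if (!o.has_mutable_field || o.is_unit_ref)
        return SH_IMMUTABLE;   /* never mutated, never traversed */
    if (!o.has_mutable_objptr_field || o.is_empty_sequence)
        return SH_MUTABLE;     /* mutated, but never traversed */
    /* Mutable objptr fields: the GC must traverse the object.  The Root
       heap is only usable when the write barrier does not mark cards. */
    return mutator_marks_cards ? SH_DYNAMIC : SH_ROOT;
}
```

For example, an object with a mutable objptr field lands in `SH_ROOT` without card marking and in `SH_DYNAMIC` with it, while a `unit ref` lands in `SH_IMMUTABLE` either way.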

Other notable aspects of the PR:

  • The SSA2 IR gains an Exp.Sequence of {args: Var.t vector vector} variant to represent direct allocation of arrays and vectors, including initialization of elements. At toSsa2, the Vector_vector primitive is translated to SsaTree2.Exp.Sequence, rather than being translated to an Array_alloc Array_update Array_toVector sequence. At Ssa2ToRssa, a SsaTree2.Exp.Sequence is translated to an Rssa.Object.Sequence (via updates to PackedRepresentation). This allows global Vector_vector objects to be collected to statics.

  • A new Array_array primitive for literal arrays was introduced. The intention is that compilation might find opportunities to optimize explicit array allocation and initialization into the Array_array primitive.

Currently, there is no support for "empty" static objects. In the previous static allocation/initialization implementation, a global Array_alloc (necessarily with a constant length operand) would be translated to a special kind of static that would be placed in the BSS segment of the executable and dynamically initialized. A future PR could restore this functionality as follows:

  • Introduce MutableEmpty, RootEmpty, and DynamicEmpty static heap kinds that simply specify a heap size, along with mutableEmptyInit, rootEmptyInit, and dynamicEmptyInit data to properly initialize the headers.

  • Don't lower Array_alloc prims in Ssa2ToRssa. After rssaShrink1, it will be possible to read off the Array_alloc's with constant size. All such Array_allocs in the initGlobals function can be lifted to RSSA statics. Meanwhile, such Array_allocs in other functions can be more cheaply implemented via direct allocation by the mutator, rather than via the GC_sequenceAllocate runtime call (which induces a GC safe point).

However, "empty" static objects are only created with the (non-default) -globalize-arrays true, and so weren't exercised by default in the previous implementation.

Previously, the initialization of an object was accomplished by a
sequence of `Move` statements following the `Object` statement.  This
obscured the initialization and led to MLton#328 duplicating
logic in `functor PackedRepresentation` and `functor Ssa2ToRssa`; see
MLton#328 (comment).

Like `RssaTree.Statement.Object`, `RssaTree.Statement.Sequence`
corresponds to a direct allocation of a sequence, including
initialization of elements.

Like `SsaTree2.Exp.Object`, `SsaTree2.Exp.Sequence` corresponds to a
direct allocation of a sequence, including initialization of elements.

At `toSsa2`, the `Vector_vector` primitive is translated to
`SsaTree2.Exp.Sequence`, rather than being translated to an
`Array_alloc` `Array_update` `Array_toVector` sequence.
The `Array_array` primitive is like the `Vector_vector` primitive,
although the latter has source syntax `#[e1,...,en]` and the former
does not (yet).  The intention is that compilation might find
opportunities to optimize explicit array allocation and initialization
into the `Array_array` primitive.

Like the `Vector_vector` primitive, at `toSsa2`, the `Array_array`
primitive is translated to `SsaTree2.Exp.Sequence`.
The `StaticHeap.Kind.t` enumeration defines three types of static
heaps: `Immutable`, `Mutable`, and `Root`.  The `Immutable` static
heap will hold immutable objects (and will be marked `const` so as to
be placed in a read-only section).  The `Mutable` static heap will
hold objects with mutable non-objptr fields; such objects need not be
traced by the garbage collector (since they can only refer to other
static heap objects).  The `Root` static heap will hold objects with
mutable objptr fields; such objects need to be traced by the garbage
collector (since they can be updated to refer to dynamic heap
objects).

The `StaticHeap.Ref.t` type corresponds to a reference to an object in
a static heap, represented by a byte offset.
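As a rough illustration of a reference that is "a static heap plus a byte offset", the sketch below resolves such a pair to an address. It assumes the three kinds named in this commit message; the names `static_heap_ref`, `heap_storage`, and `resolve_ref` are hypothetical, and the 64-byte arrays merely stand in for the emitted heaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for the three kinds named above. */
enum { KIND_IMMUTABLE, KIND_MUTABLE, KIND_ROOT, NUM_KINDS };

/* A reference to a static object: which heap, and how far into it. */
struct static_heap_ref {
    int    kind;
    size_t offset;  /* byte offset from the start of that heap */
};

/* Stand-in storage; a real program would emit one initialized
   definition per kind. */
static uint8_t heap_storage[NUM_KINDS][64];

/* Resolve a reference to an address: base of the heap plus offset. */
void *resolve_ref(struct static_heap_ref r)
{
    assert(r.kind >= 0 && r.kind < NUM_KINDS);
    assert(r.offset < sizeof heap_storage[r.kind]);
    return &heap_storage[r.kind][r.offset];
}
```

Because the offset is relative to the heap's base, such a reference stays meaningful regardless of where the linker places the heap.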
…aps`

The `Machine.StaticHeap.Object.t` datatype represents a static
object (either `Normal` or `Sequence`), including initialization.

The `Machine.Program.T#staticHeaps` is a function of type
`StaticHeap.Kind.t -> StaticHeap.Object.vector`.
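To give a feel for a statically initialized definition that "looks" like an ML heap, here is a hand-written sketch with one `Sequence` object and one `Normal` object. The metadata layout (counter, length, header before a sequence's elements; a single header before a normal object's fields), the header values, and all names are assumptions made for illustration; the compiler's actual output may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header encodings; real values are chosen by the compiler. */
#define HDR_WORD8_VECTOR ((uint64_t)0x21)
#define HDR_PAIR         ((uint64_t)0x35)

/* Assumed sequence metadata, stored immediately before the elements. */
struct seq_meta { uint64_t counter; uint64_t length; uint64_t header; };

/* An immutable static heap: const, so it can live in a read-only section. */
static const struct {
    struct { struct seq_meta meta; uint8_t data[8]; } s0;        /* Sequence */
    struct { uint64_t header; uint64_t fst; uint64_t snd; } n0;  /* Normal */
} static_heap_i = {
    .s0 = { { 0, 5, HDR_WORD8_VECTOR }, "hello" },
    .n0 = { HDR_PAIR, 1, 2 },
};

/* An object pointer refers to the first byte after the metadata. */
const uint8_t *s0_objptr(void) { return static_heap_i.s0.data; }

/* Recover a sequence's length from the metadata preceding its elements. */
uint64_t seq_length(const uint8_t *p)
{
    const struct seq_meta *m = (const struct seq_meta *)(p - sizeof *m);
    assert(m->counter == 0);
    return m->length;
}
```

The point of the layout is that code that walks ordinary heap objects can read headers and lengths of static objects in the same way.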
Extract the `Object` and `Sequence` variants of `RssaTree.Statement.t`
to an `RssaTree.Object.t` datatype.  Often, both variants are
treated similarly, so this reduces some code duplication.  More
importantly, only `RssaTree.Object.t` variants are suitable for static
allocation/initialization and a subsequent commit will introduce a
`RssaTree.Program.T#statics: RssaTree.Object.t vector` field.

`RssaTree.Program.T#statics: RssaTree.Object.t vector` will collect
objects suitable for static allocation/initialization.

Quells gcc/clang warnings.

A cast is required when a tagged word is used as an objptr.

This is consistent with the type of `GC_sequenceLength`.
The `CollectStatics.WordXVectorConsts` pass replaces `WordXVector`
constants with `Var` operands to equivalent static `Sequence` objects.

The `CollectStatics.RealConsts` pass accumulates `Real` constants into
static `Sequence` objects and replaces them with appropriate
`SequenceOffset` operands.  This is an alternative to the lifting of
`Real` constants to `Global` operands in `Backend`.  (`Real` constants
are not propagated through the Machine IR program because the native
codegens do not support real literals (x86 and amd64 do not have
convenient floating-point literal instructions; floating-point values
must be loaded from memory) and the C and LLVM codegens incorrectly
constant-fold floating-point operations when the rounding mode may
change.)
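The effect of pooling `Real` constants can be pictured as follows: the constants live in one static array, and each use becomes a load at a byte offset into that array rather than a literal in the code. This is a minimal sketch under that reading; `real_consts` and `load_real` are hypothetical names, and the values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All real constants of the program, pooled into one static sequence. */
static const double real_consts[] = { 0.1, 3.141592653589793, 1e100 };

/* Load the constant stored at a byte offset into the pool.  Because the
   value is read from memory at run time, no floating-point literal needs
   to appear in the generated code. */
double load_real(size_t byte_offset)
{
    double d;
    assert(byte_offset % sizeof d == 0);
    assert(byte_offset < sizeof real_consts);
    memcpy(&d, (const char *)real_consts + byte_offset, sizeof d);
    return d;
}
```

A use of the second constant would then be written as `load_real(1 * sizeof(double))`.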

The `CollectStatics.Globals` pass lifts `Object` and `Sequence`
objects in the `main` (`initGlobals`) function to statics.
This control determines whether or not RSSA `Static` operands are
introduced at `toRssa` by `translateGlobalStatics`.

Using an empty string constant for initialization triggers an
"internal compiler error: in output_constructor_regular_field, at
varasm.c" in gcc-7 and gcc-8.
A static object that is updated with an `Objptr` would trigger a card
marking, but the address of a static object would not map to a valid
card slot.
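The card-slot problem can be illustrated with a toy write barrier. The sketch assumes a card index is computed from the distance between the base object and the start of the runtime heap; the card size, heap size, and all names here are hypothetical and chosen only to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CARD_SIZE_LOG2 8      /* hypothetical: 256-byte cards */
#define HEAP_SIZE      4096   /* hypothetical runtime heap size */

static uint8_t heap[HEAP_SIZE];                        /* runtime heap */
static uint8_t card_map[HEAP_SIZE >> CARD_SIZE_LOG2];  /* one slot per card */
static uint8_t static_obj[16];                         /* outside the heap */

static bool in_heap(const void *p)
{
    uintptr_t a = (uintptr_t)p, s = (uintptr_t)heap;
    return a >= s && a - s < HEAP_SIZE;
}

/* Card index as a cheap barrier computes it: no range check at all. */
size_t card_index(const void *base)
{
    return (size_t)(((uintptr_t)base - (uintptr_t)heap) >> CARD_SIZE_LOG2);
}

/* A more expensive barrier that first checks where the base lives.
   Returns true if a card was marked. */
bool mark_card_checked(const void *base)
{
    if (!in_heap(base))
        return false;  /* static object: there is no card to mark */
    size_t i = card_index(base);
    assert(i < sizeof card_map);
    card_map[i] = 1;
    return true;
}
```

For a base inside `heap` the index is a valid slot; for `static_obj` the unchecked index falls outside `card_map`, which is why an unchecked barrier must never see such a base.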
A `Static` operand is not an lvalue.

Rather than using a collection of `bool ref`s implemented as mutable
static objects, use a `bool array` implemented as an object in the
mutable static heap.
When the mutator marks cards, `staticHeapR` cannot be used for objects
with mutable objptr fields, because the address of an object in
`staticHeapR` won't map to a valid card slot for the write barrier.
Such global objects with mutable objptr fields must be placed in the
dynamic heap and referenced indirectly via a `Global` operand.
However, it is still possible to collect all such objects into a
static heap, which is copied to the initial dynamic heap, rather than
initializing them via the `initGlobals` function.
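A minimal sketch of this scheme: the static heap is copied once into the dynamic heap at startup, and each global slot is set to the copied object's new address, so that later accesses go through the global. The object layout (a header word followed by one field), the sizes, and the names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical static image: two ref cells, each a header word
   followed by one field word. */
static const uint64_t static_heap_d[4] = { 0x11, 0, 0x11, 0 };

/* Byte offset of each object's first field within the image. */
static const size_t global_offsets[2] = { 8, 24 };

static void    *globals[2];       /* the indirection used by the program */
static uint64_t runtime_heap[64]; /* stand-in for the initial dynamic heap */

/* Copy the image into the dynamic heap and point each global at the
   copy.  Returns the number of bytes now in use. */
size_t init_dynamic_heap(void)
{
    assert(sizeof static_heap_d <= sizeof runtime_heap);
    memcpy(runtime_heap, static_heap_d, sizeof static_heap_d);
    for (size_t i = 0; i < 2; i++)
        globals[i] = (char *)runtime_heap + global_offsets[i];
    return sizeof static_heap_d;
}
```

After the copy, a write through `globals[0]` changes the object in the dynamic heap, where the write barrier can find a valid card slot, and leaves the static image untouched.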

See MLton#328 (comment).
This shares code for object representation between the RSSA and
Machine IRs.
A "packed" `struct` may still have padding at the end, which is
included in C's `sizeof` calculation.  However, for `memcpy`-ing and
`foreachObjptrInRange`-ing a static heap, we need a byte accurate
size.

We include a sentinel `struct {} end;` field in the static heap
structs to explicitly compute the size of a static heap as
`(pointer)&staticHeap.end - (pointer)&staticHeap`.
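The difference between `sizeof` and the sentinel-based size can be reproduced in a small example. This sketch relies on GNU C extensions (`__attribute__((packed, aligned))` and an empty `struct {}`), as accepted by gcc and clang; the object shapes and the 16-byte alignment are hypothetical choices that make the trailing padding visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A toy static heap holding two packed objects (11 and 12 bytes).
   The alignment request makes sizeof round up past the last object. */
struct static_heap {
    struct __attribute__((packed)) { uint64_t header; uint8_t  data[3]; } obj0;
    struct __attribute__((packed)) { uint64_t header; uint32_t data[1]; } obj1;
    struct {} end;  /* zero-size sentinel (GNU C extension) */
} __attribute__((packed, aligned(16)));

static struct static_heap heap_m = { { 1, { 1, 2, 3 } }, { 2, { 4 } } };

/* Byte-accurate size: distance from the start of the heap to the
   sentinel, which excludes any padding at the end of the struct. */
size_t static_heap_bytes(void)
{
    size_t n = (size_t)((const char *)&heap_m.end - (const char *)&heap_m);
    assert(n <= sizeof heap_m);
    return n;
}
```

Here `static_heap_bytes()` yields the 23 bytes actually occupied by the two objects, whereas `sizeof heap_m` is rounded up to a multiple of 16.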
We check the following allowed references between heap objects:

I(immutable) -> {I,M,R}
M(mutable)   -> {I,M,R}
R(root)      -> {I,M,R,H}
H(runtime)   -> {I,M,R,H}
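
The table above can be encoded directly as a lookup matrix. This is an illustrative sketch of such a check, not the runtime's actual invariant code; the names `region`, `allowed`, and `ref_allowed` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* I = immutable, M = mutable, R = root static heaps; H = runtime heap. */
enum region { REG_I, REG_M, REG_R, REG_H, NUM_REGIONS };

/* allowed[from][to]: may an object in `from` point to one in `to`? */
static const bool allowed[NUM_REGIONS][NUM_REGIONS] = {
    /* to:       I     M     R     H    */
    [REG_I] = { true, true, true, false },
    [REG_M] = { true, true, true, false },
    [REG_R] = { true, true, true, true  },
    [REG_H] = { true, true, true, true  },
};

bool ref_allowed(enum region from, enum region to)
{
    assert((unsigned)from < NUM_REGIONS && (unsigned)to < NUM_REGIONS);
    return allowed[from][to];
}
```

Only the immutable and mutable static heaps are barred from pointing into the runtime heap, which matches the fact that the GC never traverses them.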
@MatthewFluet MatthewFluet merged commit 7e08585 into MLton:master Jan 11, 2020
@MatthewFluet MatthewFluet deleted the static-heaps branch January 11, 2020 10:36
MatthewFluet added a commit to MatthewFluet/mpl that referenced this pull request Sep 24, 2020
MLton/mlton#357 (revising MLton/mlton#328) introduced a number of "static heaps"
into the compilation.  Essentially, many "global" objects can be fully evaluated
at compile time and represented in the compiled program as statics.  The
"static" objects look like regular ML objects (with proper headers, etc.), but
exist outside the MLton heap.

This is a "path of least resistance" commit for MaPLe.  The
`collectStatics.Globals` and `collectStatics.RealConsts` passes are disabled,
but the `collectStatics.WordXVectorConsts` pass is enabled.  In the `backend`
pass (translation of RSSA to Machine), all RSSA statics are forced to the
`Dynamic` static heap (and assigned a corresponding global objptr slot);
similarly, any remaining `WordXVector` constants are forced to the `Dynamic`
static heap.  At program startup, the `Dynamic` static heap is copied into the
root hierarchical heap.  (This is slightly more complicated than the copy of the
`Dynamic` static heap into the initial heap in MLton, because in MaPLe the
`Dynamic` static heap may need to be split across multiple chunks.)

See MPLLang#127 for more discussion.