Add static allocation support in Rssa through codegen #328

Merged: 101 commits into MLton:master, Sep 20, 2019

@jasoncarr0 jasoncarr0 commented Aug 12, 2019

The later IRs now support static data of various sorts. Some features are merged into statics: isMutable in Rssa has been weakened to pinned which is only needed to maintain the changes from BounceVars without excessive liveness analysis. The vectors field of Machine.Program.T has been changed to include general statics. In Rssa, a new expression defines a unique static for each usage in the program. In the backend, statics are moved into the program (no uniqueifying is performed, as it seems to be an artifact of propagation of constants), and references are done through an operand. The operand Static is a constant value which has type CPointer or ObjPtr, and its data can be accessed with Offset/Contents as normal. Thus, it has no address, and it cannot be written to (which indicated some conflations of meaning for isLocation).

Under the default behavior, globals are not statically allocated if they have one or more mutable fields which can hold an object pointer (after packing) or if their data is not statically allocated. In practice, there are no such globals for just about all programs.

This version has support in all codegens. As expected, code size and compile time are improved across the board, particularly for larger programs. Runtime improves sometimes, but should only affect programs which had hot code accessing globals, as it removes one level of indirection. We should also expect marginally faster garbage collections, as globals are now mostly skipped.

A couple pieces are still outstanding for discussion and review:

  • Command line options. There are many dimensions to this, and in general they should all be enabled unless a bug or performance regression arises in user code, or a GC choice makes them impossible. Right now disabling it for particular sorts of data still leaves them statically initialized (and heap allocated), but ...
  • Objects which are not statically allocated are currently created in code. It seems not to be worth it to make a statically initialized heap allocated object, as the code to initialize them is quite small. The statically allocated objects should have better compile time though.
  • There's still a good bit of complexity that I'd like to reduce as much as possible. Some options are partially coupled: generally the heap allocated statics are precisely the statics with destination globals. One awkward case, which de facto cannot occur (due to propagation), is a non-heap static with a global; it would be copied to the heap under the current code (doing something else would be an additional edge case). Empty (array) data has to have different initialization rules from others to avoid executable bloat (so it goes in bss, instead of text). I have at least managed to centralize all decision-making in ssa2-to-rssa and all initialization in c codegen's outputDecls.
  • Rssa still has WordXVector support for constants. Only one particular pass in Rssa created them, so it may be easiest to just disable it, which would simplify the backend and reduce redundancy. The complexity here is that disabling this requires duplicating or factoring the logic for deciding where to place statics (WordXVectors are only placed on the heap at the moment).
  • Statics are somewhat inconsistent with globals: they are stored with the program, and information about them is local, whereas globals are created and stored globally, but I couldn't convince myself to create a module just for a single integer counter.

This resolves #300

This affects the previous globalization changes in #288, so they should be re-examined as well. In particular effects due to extra indirections should disappear, so remaining effects will only be due to low-level perturbations and interactions with other passes (local-ref for instance).

jasoncarr0 added 30 commits July 3, 2019 13:04
This will be inconsistent with objects since VectorInitElem
has not been changed so header support is not yet present
The macro indirection is a bit out of place next to the
other initializations via vectors
No support yet in codegens. No support for accessing via a global
except for heap objects (which are copied) because that requires
extra support in c decls to assign the global

Used M as the signifier because S, T, C, and X are taken;
it could be taken to mean Memory or syMbol
Backend doesn't generate Binds for destinations that were turned into
constants. As it happens, idempotency for existing constants turned
any such binds into self-moves, which, although ill-typed
(e.g. it could in theory include a move from the constant 3 into itself),
were eliminated to Noops by the move function

isLocation conflated two meanings: the ability to point to something,
and the ability to be a destination for moves.
Statics are constant pointers, and as such are only in the latter class.
The former is not needed, and can be recovered from the type
…zed words in Static.Data

This will require codegen changes which are not present, as the
non-pointer-sized words require a different structure layout
(packed with several fields)

ssa2-to-rssa does not take advantage of these changes in this patch
With the previous changes to statics, this isn't quite
supported in any codegen yet as it requires the more complicated
struct types.
New controls are:

 * `-static-alloc-wordvector-consts {true|false}`

   Controls whether or not `WordXVector` constants are converted to
   statics (with `ImmStatic` location) at `Ssa2ToRssa`.

 * `-static-init-arrays {true|false}`

   Controls whether or not `Array_alloc` primitives are converted
   to statics (with `MutStatic` or `Heap` location) at `Ssa2ToRssa`.

 * `-static-init-objects {none|staticAllocOnly|all}`

   Controls whether or not `Object` expressions are converted to
   statics at `Ssa2ToRssa`.  If `staticAllocOnly`, then an object that
   would be converted to a static with `Heap` location is not
   converted to a static.
And eliminate `Static.Data.size: 'a Static.Data.t -> WordSize.t * int`.
New controls are:

 * `-static-alloc-arrays {true|false}`

   Controls whether or not `Array_alloc` primitives that can be
   statically initialized are forced to `Heap` location.

 * `-static-alloc-objects {true|false}`

   Controls whether or not `Object` expressions that can be statically
   initialized are forced to `Heap` location.
With b3873f1, a program can have an empty `objectInits[]` to be
allocated in the initial dynamic heap.  With an empty `objectInits[]`,
`sizeofInitialBytesLive` would return 0.  Attempting to create a heap
of zero bytes would fail (the specified SUSv3 behavior of `mmap`),
triggering a backoff computed as:

    highSize = newSize - s->sysvals.pageSize;
    newSize = align((factor-1) * (highSize / factor) + (lowSize / factor), s->sysvals.pageSize);

However, `newSize - s->sysvals.pageSize` (with `newSize` equal to 0)
wraps to nearly 2^64.  Successive backoffs eventually bring the
requested size down to one that can be satisfied, but the size is
generally much larger than required.  Moreover, the heap won't be
resized until a subsequent GC, which may not occur during the run of
the program, due to the heap being so large.  Many regression tests
run, but most that fork fail with `unhandled exception: SysErr: Cannot
allocate memory [nomem]`, because the parent has already allocated a very
large heap and the child is requesting to allocate another very large heap.

Now, `sizeofInitialBytesLive` also includes the size of the initial
thread/stack to be created, ensuring that it is always non-zero.
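
The wraparound described above can be reproduced in isolation. The following is a minimal sketch, not the runtime's code: `align_up` and `backoff_step` are hypothetical names, and `factor` is assumed to be an unsigned integer here purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to a multiple of a (a must be a power of two). */
uint64_t align_up(uint64_t n, uint64_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* Hypothetical stand-in for one backoff step, mirroring the two
 * lines quoted in the commit message. */
uint64_t backoff_step(uint64_t newSize, uint64_t lowSize,
                      uint64_t factor, uint64_t pageSize) {
    /* Unsigned subtraction: wraps around when newSize < pageSize. */
    uint64_t highSize = newSize - pageSize;
    return align_up((factor - 1) * (highSize / factor) + (lowSize / factor),
                    pageSize);
}
```

With `newSize` and `lowSize` both 0, a 4096-byte page, and an assumed `factor` of 16, the first backoff requests 15 * 2^60 bytes, which illustrates why many successive backoffs are needed before a request can be satisfied.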
Objects without a representation (e.g., `unit`) may participate in the
construction of other objects (e.g., `unit ref`).

    (* Globals: *)
    val x_0: unit = obj ()
    ...
    val global_441: (unit mut) tuple = obj (x_0 (*obj ()*))

However, objects without a representation were not registered as being
static, thus preventing subsequent objects constructed with them from
being made static.

    val location = getLocation (ty, !Control.staticAllocObjects, false)
    in
       if location <> Static.Location.Heap
          orelse staticInitHeapObjects

Member

I was struggling to find/write a source program that, with -static-alloc-internal-ptrs static -static-alloc-objects true -static-init-objects all, would trigger this code path that leads to a statically initialized but dynamically allocated object. Eventually, I realized that there should be no such objects. With -static-alloc-internal-ptrs static -static-alloc-objects true, the only objects that will be statics with a Heap location are objects with a mutable Objptr field. Moreover, only global objects are eligible to be statics. But, assuming a safe-for-space globalization pass, there should be no global objects with mutable Objptr fields (because global objects are roots for the whole program execution). I seem to remember @jasoncarr0 making some comment that was supposed to suggest this (something along the lines of "by fiat, should never happen"), but I can't find it right now.

With -globalize-small-types 9, which allows the globalization of arbitrary array and ref objects (irrespective of the "size" of their contents), we can trigger global objects with mutable Objptr fields that become statically initialized and dynamically allocated.

With this observation about safe-for-space, I'm not understanding why there should ever be a non-empty globalObjptr[] array. In a self-compile, there are 26.

Contributor Author

It can happen: if globalize-small-type 4 is on, an example would be a mutable field holding a tuple of total size more than 64 bytes. It is safe to globalize, but will turn into an Objptr in Rssa. With the current code, intInfs aren't included only because they're infrequent, so that would explain some globalObjptrs on a self-compile.

Contributor Author

Unless I've made a mistake in understanding packed-representation here.

Member

You are right that an (int * int) ref could be space-safely globalized and could result in an object with a mutable Objptr field. However, it is also likely that such a tuple would be RefFlattened; I suspect that is why we don't see many of them. As you say, with -globalize-small-types 4, we might have an (int * int, int * int) either ref be globalized.

Also, it turns out that we "grandfathered" IntInf.int as a "small type"; so an IntInf.int ref can be globalized, but is represented as an object with a mutable Objptr field.

Member

Finally, we might have an IntInf.int ref ref; the inner ref might be statically initialized (but dynamically allocated), so the outer ref could not be statically initialized.

@MatthewFluet
Member

The LocalRef optimization pass has a pre-transformation that moves any global 'a ref objects that are used in exactly one function, into the using function, so that it has an opportunity to be localized. However, if the ref is not turned into a local, then it is not moved back to a global.

@MatthewFluet
Member

> With this observation about safe-for-space, I'm not understanding why there should ever be a non-empty globalObjptr[] array. In a self-compile, there are 26.

One contribution is that IntInf constants are not made into statics.

@MatthewFluet
Member

MatthewFluet commented Sep 20, 2019

Draft merge commit message

Static allocation/initialization of objects in backend

The main benefits are that code size and compile time are improved across the board, particularly for larger programs. Runtime sometimes improves, but should only affect programs which had hot code accessing globals, as it removes one level of indirection. Garbage collections might be marginally faster, as globals are now mostly skipped.

Statically allocated and initialized objects are created in the main .c file, where they will be placed in the data segment of the executable:

const struct {Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[9];}
static_20 = {(Word64)(0x0ull), (Word64)(0x9ull), (Word64)(0x7ull), "addrinuse"};
const struct {Word64 meta_0; Word32 data_0; Word32 data_1; Pointer data_2; }
static_21 = {(Word64)(0x29ull), (Word32)(0x62ull), (Word32)(0x0ull), ((Pointer)(&static_20) + 24)};

Note that these are proper ML objects, with metadata and data. References to statically allocated objects are via pointers to the first data field (e.g., &static_20 + 24). Note also that WordXVector constants (e.g., strings) are a special case of statically allocated and initialized objects. Statically allocated and initialized objects can be either immutable or mutable, although the latter should be restricted to objects with non-Objptr mutable fields.
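
To make the "pointer to the first data field" convention concrete, here is a small sketch modeled on `static_20`. The names `Seq9`, `example_seq`, `object_ref`, and `seq_length` are hypothetical, and reading `meta_1` as the sequence length is an assumption inferred from the values shown above (0x9 for a 9-byte string), not something the PR states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Word64;
typedef uint8_t Word8;

/* Three metadata words followed by the payload, as in static_20. */
struct Seq9 { Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[9]; };

const struct Seq9 example_seq = {
    0x0, 0x9, 0x7, {'a', 'd', 'd', 'r', 'i', 'n', 'u', 's', 'e'}
};

/* The reference handed to the rest of the program: the address of the
 * first data byte, i.e. the object's address plus 24 bytes of metadata. */
const unsigned char *object_ref(const struct Seq9 *obj) {
    return (const unsigned char *)obj + offsetof(struct Seq9, data);
}

/* Assumption: the word 16 bytes before the data (meta_1) is the length. */
Word64 seq_length(const unsigned char *ref) {
    Word64 len;
    memcpy(&len, ref - 16, sizeof len);
    return len;
}
```

Because the reference points past the metadata, code that needs a metadata word reaches it with a negative offset from the reference, as `seq_length` does.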

A special case of statically allocated objects are arrays, whose contents will be dynamically initialized by the mutator. These are also created in the main .c file, but are placed in the bss segment of the executable (decreasing the size of the executable) and proper metadata is written by initialization code:

struct {Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[800000];}
static_26;
struct {Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[0];}
static_31;

static void static_Init() {
    memcpy (&static_26, &((struct {Word64 meta_0; Word64 meta_1; Word64 meta_2;}){(Word64)(0x0ull), (Word64)(0x186A0ull), (Word64)(0x11ull)}), 24);
    memcpy (&static_31, &((struct {Word64 meta_0; Word64 meta_1; Word64 meta_2;}){(Word64)(0x0ull), (Word64)(0x0ull), (Word64)(0x13ull)}), 24);
}
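
The same pattern can be written as a self-contained sketch. The names `SeqHeader`, `example_array`, and `example_static_init` are hypothetical; the header values are copied from `static_26` above, and the claim that an uninitialized object lands in bss reflects typical toolchain behavior rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Word64;
typedef uint8_t Word8;

struct SeqHeader { Word64 meta_0; Word64 meta_1; Word64 meta_2; };

/* No initializer: a typical toolchain places this object in .bss, so the
 * 800000 data bytes do not occupy space in the executable file. */
struct { struct SeqHeader hdr; Word8 data[800000]; } example_array;

/* Writes only the 24 bytes of metadata at startup; the data bytes stay
 * zero until the mutator initializes them. */
void example_static_init(void) {
    const struct SeqHeader hdr = {0x0, 0x186A0, 0x11};
    memcpy(&example_array, &hdr, sizeof hdr);
}
```

Here 0x186A0 is 100000, so the 800000-byte data area corresponds to 100000 elements of 8 bytes each, assuming `meta_1` holds the element count.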

Finally, dynamically allocated but statically initialized objects have their initialization data in the main .c file along with information to copy that data to the initial dynamic heap during initWorld:

const static struct {Word64 meta_0; Word64 data_0; }
static_9819 = {(Word64)(0x79Dull), (Word64)(0x1ull)};
const static struct {Word64 meta_0; Word64 data_0; }
static_9820 = {(Word64)(0x79Dull), (Word64)(0x1ull)};

static struct GC_objectInit objectInits[] = {
    { 11, 8, 16, ((Pointer) &static_9819) },
    { 12, 8, 16, ((Pointer) &static_9820) },
    ...
}

By default (with -static-init-objects staticAllocOnly), no such objects are created.
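
As a rough illustration of how such a table could be consumed at startup, the sketch below copies each entry into a heap buffer and records a reference to its first data field. This is not the runtime's `initWorld`; the field names in `ObjectInit` are guesses at the meaning of the four values shown above (`{ 11, 8, 16, ptr }`), and `copy_object_inits` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field names; the source shows only the values. */
struct ObjectInit {
    size_t globalIndex;          /* assumed: slot in the global objptr table */
    size_t headerBytes;          /* assumed: offset to the first data field */
    size_t size;                 /* assumed: total bytes to copy */
    const unsigned char *words;  /* statically initialized image */
};

/* Copies each image to the frontier, stores a pointer to its first data
 * field in the matching global slot, and returns the new frontier. */
unsigned char *copy_object_inits(const struct ObjectInit *inits, size_t n,
                                 unsigned char *frontier,
                                 unsigned char **globals) {
    for (size_t i = 0; i < n; i++) {
        memcpy(frontier, inits[i].words, inits[i].size);
        globals[inits[i].globalIndex] = frontier + inits[i].headerBytes;
        frontier += inits[i].size;
    }
    return frontier;
}
```

Under these assumptions, the rest of the program reaches such an object through its global slot, which is the extra level of indirection that fully static objects avoid.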

With -static-init-objects all, global objects with Objptr mutable fields would be dynamically allocated but statically initialized. But, such global objects are rare. For example, an (int * int) ref could be space-safely globalized and would be an object with a mutable Objptr field. However, it is also likely that such a tuple would be RefFlattened. With -globalize-small-type 4 (see #288 and 752467c), an (int * int, int * int) either ref could be globalized and represented as an object with a mutable Objptr field. Similarly, an IntInf.int ref can also be globalized and would be represented as an object with a mutable Objptr field.

With -static-alloc-objects false -static-init-objects all, all global objects will be dynamically allocated but statically initialized (and no global objects will be statically allocated). Similarly, with -static-alloc-wordvector-consts false, string constants will be dynamically allocated but statically initialized; this corresponds to the previous MLton behavior with respect to string constants.

A number of controls have been added to control static allocation/initialization:

  • -static-alloc-internal-ptrs {static|all|none}

    Controls which kinds of objects can be statically allocated:

    • static: only objects with all fields either immutable or non-Objptr
    • none: only objects with no fields
    • all: all objects

    The all setting is incompatible with the current GC for two reasons. First, statically allocated objects are not traced by the GC; a statically-allocated object that is updated with an Objptr to an object in the heap should be considered a root. Second, a statically-allocated object that is updated with an Objptr would trigger a card marking, but the address of a statically-allocated object would not map to a valid card slot.

  • -static-alloc-wordvector-consts {true|false}

    Controls whether or not WordXVector constants are converted to statics (with ImmStatic location) at Ssa2ToRssa.

  • -static-init-arrays {true|false}

    Controls whether or not Array_alloc primitives are converted to statics (with MutStatic or Heap location) at Ssa2ToRssa.

  • -static-alloc-arrays {true|false}

    Controls whether or not Array_alloc primitives that can be statically initialized are forced to Heap location.

  • -static-init-objects {none|staticAllocOnly|all}

    Controls whether or not Object expressions are converted to statics at Ssa2ToRssa. If staticAllocOnly, then an object that would be converted to a static with Heap location is not converted to a static.

  • -static-alloc-objects {true|false}

    Controls whether or not Object expressions that can be statically initialized are forced to Heap location.
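
The card-marking objection to the `all` setting of -static-alloc-internal-ptrs can be sketched as follows. This is an illustrative model only: the `Heap` struct, the 256-byte card size, and the function names are assumptions, not the runtime's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CARD_SIZE_LOG2 8  /* assumed 256-byte cards, for illustration */

struct Heap { uintptr_t start; size_t size; };

/* Number of slots in a card map covering the whole heap. */
size_t card_count(const struct Heap *h) {
    return (h->size + ((size_t)1 << CARD_SIZE_LOG2) - 1) >> CARD_SIZE_LOG2;
}

/* Computes the card slot for addr. Returns false when addr lies outside
 * the heap, which is the case for a statically allocated object. */
bool card_index(const struct Heap *h, uintptr_t addr, size_t *index) {
    if (addr < h->start || addr - h->start >= h->size) return false;
    *index = (size_t)((addr - h->start) >> CARD_SIZE_LOG2);
    return true;
}
```

A write barrier that skips the bounds check and computes the shift directly would index outside the card map for an address in the data segment; adding the check is the "more expensive" barrier alternative.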

@MatthewFluet MatthewFluet merged commit 57c9c94 into MLton:master Sep 20, 2019
MatthewFluet added a commit to MatthewFluet/mlton that referenced this pull request Jan 10, 2020
Previously, the initialization of an object was accomplished by a
sequence of `Move` statements following the `Object` statement.  This
obscured the initialization and led to MLton#328 duplicating
logic in `functor PackedRepresentation` and `functor Ssa2ToRssa`; see
MLton#328 (comment).
MatthewFluet added a commit to MatthewFluet/mlton that referenced this pull request Jan 10, 2020
MatthewFluet added a commit to MatthewFluet/mlton that referenced this pull request Jan 10, 2020
MatthewFluet added a commit to MatthewFluet/mlton that referenced this pull request Jan 10, 2020
When the mutator marks cards, `staticHeapR` cannot be used for objects
with mutable objptr fields, because the address of an object in
`staticHeapR` won't map to a valid card slot for the write barrier.
Such global objects with mutable objptr fields must be placed in the
dynamic heap and referenced indirectly via a `Global` operand.
However, it is still possible to collect all such objects into a
static heap, which is copied to the initial dynamic heap, rather than
initializing them via the `initGlobals` function.

See MLton#328 (comment).
MatthewFluet added a commit that referenced this pull request Jan 11, 2020
Revised implementation of static allocation/initialization of globals

Heavily inspired and based on a previous implementation by Jason Carr;
see #328.

#328 introduced static allocation/initialization of globals, but some
complexities and issues with the implementation were noted during review:

 * #328 (comment)
 * #328 (comment)
 * #328 (comment)

This revised implementation tries to simplify the complexities and address the
issues:

 * The RSSA IR loses the `Operand.Static {static: Var.t Static.t, ty: Type.t}`
   variant and gains a `statics: {dst: Var.t * Type.t, obj: Object.t} vector`
   field in `Program.T`.  The `PackedRepresentation` and `Ssa2ToRssa` passes are
   simplified, because the initial RSSA program is created with an empty
   `statics` field.  The `rssaShrink1` pass takes care of constant-folding and
   copy-propagating of object initialization.  New
   `collectStatics.{Globals,{WordXVector,Real}Consts}` passes introduce objects
   into the `statics` field.

 * The Machine IR gains a `staticsHeaps: StaticHeap.Kind -> StaticHeap.Object.t
   vector` field in `Program.T`.  Each "kind" of static heap is emitted to the
   main `.c` file as a statically initialized data definition that "looks" like
   an ML heap.  There are four kinds of heaps:

    * `Immutable`: for immutable objects; such objects need never be traversed
      by the GC.  (Note that global `unit ref` objects can be placed in the
      `Immutable` static heap, since they will never actually be mutated.)
    * `Mutable`: for objects with mutable non-objptr fields; such objects may be
      mutated, but need never be traversed by the GC.  (Note that global empty
      mutable sequences can be placed in the `Mutable` static heap, since, even
      if they have mutable objptr fields, the elements will never actually
      be mutated.)
    * `Root`: when the mutator does not mark cards, for objects with mutable
      objptr fields; such objects may be mutated and need to be traversed by the
      GC (because they may be updated to point to objects in the runtime heap).
      However, if card marking is used by the mutator, then the `Root` static
      heap cannot be used, because the write barrier with a base object in the
      `Root` static heap will attempt to write to an invalid card slot index.
      It would be possible to make the write barrier more expensive, by
      dynamically checking if the base is in the `Root` static heap.
    * `Dynamic`: when the mutator marks cards, for objects with mutable objptr
      fields; such objects may be mutated and need to be traversed by the GC.
      The `Dynamic` static heap is copied to the initial runtime heap at runtime
      initialization.

   In `Backend`, each RSSA `static` is placed in an appropriate "kind" of static
   heap.  Objects placed in the `Dynamic` static heap are accessed by
   the rest of the program via `Global` operands (and incur a level of
   indirection).

 * The `Mutable` and `Root` heaps are properly saved and loaded by
   `MLton.World`.
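
The placement rule described for the four heap kinds can be summarized as a small decision function. This is a simplified sketch in C rather than the compiler's SML: the enum and function names are hypothetical, and it ignores the special cases noted above (`unit ref` objects and empty mutable sequences).

```c
#include <assert.h>
#include <stdbool.h>

enum StaticHeapKind { KIND_IMMUTABLE, KIND_MUTABLE, KIND_ROOT, KIND_DYNAMIC };

/* Picks a static heap kind from two object properties and one
 * whole-program setting (whether the mutator marks cards). */
enum StaticHeapKind choose_kind(bool hasMutableField,
                                bool hasMutableObjptrField,
                                bool mutatorMarksCards) {
    if (!hasMutableField) return KIND_IMMUTABLE;     /* never traversed */
    if (!hasMutableObjptrField) return KIND_MUTABLE; /* mutated, not traversed */
    /* Mutable objptr fields: the GC must traverse the object. */
    return mutatorMarksCards ? KIND_DYNAMIC : KIND_ROOT;
}
```

Only the last branch depends on the GC configuration, which matches the description that `Root` and `Dynamic` serve the same class of objects under different write-barrier regimes.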

Other notable aspects of the PR:

 * The SSA2 IR gains an `Exp.Sequence of {args: Var.t vector vector}` variant to
   represent direct allocation of arrays and vectors, including initialization
   of elements.  At `toSsa2`, the `Vector_vector` primitive is translated to
   `SsaTree2.Exp.Sequence`, rather than being translated to an `Array_alloc`
   `Array_update` `Array_toVector` sequence.  At `Ssa2ToRssa`, a
   `SsaTree2.Exp.Sequence` is translated to an `Rssa.Object.Sequence` (via
   updates to `PackedRepresentation`).  This allows global `Vector_vector`
   objects to be collected to statics.

  * A new `Array_array` primitive for literal arrays was introduced.  The
    intention is that compilation might find opportunities to optimize explicit
    array allocation and initialization into the `Array_array` primitive.

Currently, there is no support for "empty" static objects.  In the previous
static allocation/initialization implementation, a global `Array_alloc`
(necessarily with a constant length operand) would be translated to a special
kind of static that would be placed in the BSS segment of the executable and
dynamically initialized.  A future PR could restore this functionality as
follows:

 * Introduce `MutableEmpty`, `RootEmpty`, and `DynamicEmpty` static heap kinds
   that simply specify a heap size, along with `mutableEmptyInit`,
   `rootEmptyInit`, and `dynamicEmptyInit` data to properly initialize the
   headers.

  * Don't lower `Array_alloc` prims in `Ssa2ToRssa`.  After `rssaShrink1`, it
    will be possible to read off the `Array_alloc`s with constant size.  All
    such `Array_alloc`s in the `initGlobals` function can be lifted to RSSA
    `statics`.  Meanwhile, such `Array_alloc`s in other functions can be more
    cheaply implemented via direct allocation by the mutator, rather than via
    the `GC_sequenceAllocate` runtime call (which induces a GC safe point).

However, "empty" static objects are only created with the (non-default)
`-globalize-arrays true`, and so weren't exercised by default in the previous
implementation.
MatthewFluet added a commit to MatthewFluet/mpl that referenced this pull request Sep 24, 2020
MLton/mlton#357 (revising MLton/mlton#328) introduced a number of "static heaps"
into the compilation.  Essentially, many "global" objects can be fully evaluated
at compile time and represented in the compiled program as statics.  The
"static" objects look like regular ML objects (with proper headers, etc.), but
exist outside the MLton heap.

This is a "path of least resistance" commit for MaPLe.  The
`collectStatics.Globals` and `collectStatics.RealConsts` passes are disabled,
but the `collectStatics.WordXVectorConsts` pass is enabled.  In the `backend`
pass (translation of RSSA to Machine), all RSSA statics are forced to the
`Dynamic` static heap (and assigned a corresponding global objptr slot);
similarly, any remaining `WordXVector` constants are forced to the `Dynamic`
static heap.  At program startup, the `Dynamic` static heap is copied into the
root hierarchical heap.  (This is slightly more complicated than the copy of the
`Dynamic` static heap into the initial heap in MLton, because in MaPLe the
`Dynamic` static heap may need to be split across multiple chunks.)

See MPLLang#127 for more discussion.
Successfully merging this pull request may close these issues.

Translate global refs to static mutable values