Add static allocation support in Rssa through codegen #328

jasoncarr0 · 2019-08-12T22:42:54Z

The later IRs now support static data of various sorts. Some features are merged into statics: isMutable in Rssa has been weakened to pinned, which is only needed to maintain the changes from BounceVars without excessive liveness analysis. The vectors field of Machine.Program.T has been changed to include general statics. In Rssa, a new expression defines a unique static for each usage in the program. In the backend, statics are moved into the program (no uniquifying is performed, as it seems to be an artifact of propagation of constants), and references are made through an operand. The operand Static is a constant value of type CPointer or ObjPtr, and its data can be accessed with Offset/Contents as normal. Thus, it has no address and cannot be written to (which revealed some conflation of meanings in isLocation).

Under the default behavior, globals are not statically allocated if they have one or more mutable fields which can hold an object pointer (after packing) or if their data is not statically allocated. In practice, there are no such globals for just about all programs.

This version has support in all codegens. As expected, code size and compile time are improved across the board, particularly for larger programs. Runtime improves sometimes, but should only affect programs which had hot code accessing globals, as it removes one level of indirection. We should also expect marginally faster garbage collections, as globals are now mostly skipped.

A couple pieces are still outstanding for discussion and review:

- Command line options. There are a lot of dimensions to this, and in general they should all be enabled unless a bug or performance regression arises in user code, or a GC choice makes them impossible. Right now disabling it for particular sorts of data still leaves them statically initialized (and heap allocated), but ...
- Objects which are not statically allocated are currently created in code. It does not seem worthwhile to make a statically initialized, heap allocated object, as the code to initialize them is quite small. The statically allocated objects should have better compile time though.
- There's still a good bit of complexity that I'd like to reduce as much as possible. Some options are partially coupled: generally the heap allocated statics are precisely the statics with destination globals. One awkward case, which de facto cannot occur (due to propagation), is a non-heap static with a global, which would be copied to the heap under the current code (but doing something else would be an additional edge case). Empty (array) data has to have different initialization rules from others to avoid executable bloat (so it goes in bss, instead of text). I have at least managed to centralize all decision-making in ssa2-to-rssa and all initialization in the C codegen's outputDecls.
- Rssa still has WordXVector support for constants. Only one particular pass in Rssa created them, so it may be easiest to just disable it, which would simplify the backend and reduce redundancy. The complexity here is that disabling this requires duplicating or factoring the logic for deciding where to place statics (WordXVectors are only placed on the heap at the moment).
- Statics are somewhat inconsistent with globals: they are stored with the program, and information about them is local, whereas globals are created and stored globally. I couldn't convince myself to create a module just for a single integer counter.

This resolves #300

This affects the previous globalization changes in #288, so they should be re-examined as well. In particular effects due to extra indirections should disappear, so remaining effects will only be due to low-level perturbations and interactions with other passes (local-ref for instance).

This will be inconsistent with objects since VectorInitElem has not been changed so header support is not yet present

The macro indirection is a bit out of place next to the other initializations via vectors

No support yet in codegens. No support for accessing via a global except for heap objects (which are copied), because that requires extra support in c decls to assign the global.

Used M as the signifier because S, T, C, and X are taken; it could be taken to mean Memory or syMbol.

Backend doesn't generate Binds for destinations that were turned into constants. As it happens, idempotency for existing constants turned any such binds into self-moves, which, although ill-typed (e.g. it could in theory include a move from the constant 3 into itself), were eliminated to Noops by the move function.

isLocation conflated two meanings: the ability to point to something, and the ability to be a destination for moves. Statics are constant pointers, and as such are only in the latter class. The former is not needed, and can be recovered from the type.

…zed words in Static.Data

This will require codegen changes which are not present, as the non-pointer-sized words require a different structure layout (packed with several fields). ssa2-to-rssa does not take advantage of these changes in this patch.

With the previous changes to statics, this isn't quite supported in any codegen yet as it requires the more complicated struct types.

New controls are:

* `-static-alloc-wordvector-consts {true|false}`
  Controls whether or not `WordXVector` constants are converted to statics (with `ImmStatic` location) at `Ssa2ToRssa`.
* `-static-init-arrays {true|false}`
  Controls whether or not `Array_alloc` primitives are converted to statics (with `MutStatic` or `Heap` location) at `Ssa2ToRssa`.
* `-static-init-objects {none|staticAllocOnly|all}`
  Controls whether or not `Object` expressions are converted to statics at `Ssa2ToRssa`. If `staticAllocOnly`, then an object that would be converted to a static with `Heap` location is not converted to a static.

And eliminate `Static.Data.size: 'a Static.Data.t -> WordSize.t * int`.

New controls are:

* `-static-alloc-arrays {true|false}`
  Controls whether or not `Array_alloc` primitives that can be statically initialized are forced to `Heap` location.
* `-static-alloc-objects {true|false}`
  Controls whether or not `Object` expressions that can be statically initialized are forced to `Heap` location.

With b3873f1, a program can have an empty `objectInits[]` to be allocated in the initial dynamic heap. With an empty `objectInits[]`, `sizeofInitialBytesLive` would return 0. Attempting to create a heap of zero bytes would fail (the specified SUSv3 behavior of `mmap`), triggering a backoff computed as:

    highSize = newSize - s->sysvals.pageSize;
    newSize = align((factor-1) * (highSize / factor) + (lowSize / factor), s->sysvals.pageSize);

However, `newSize - s->sysvals.pageSize` (with `newSize` equal to 0) wraps to nearly 2^64. Successive backoffs eventually bring the requested size down to one that can be satisfied, but the size is generally much larger than required. Moreover, the heap won't be resized until a subsequent GC, which may not occur during the run of the program, because the heap is so large.

Many regression tests run, but most that fork fail with `unhandled exception: SysErr: Cannot allocate memory [nomem]`, because the parent has allocated a very large heap and the child is requesting to allocate a very large heap as well.

Now, `sizeofInitialBytesLive` also includes the size of the initial thread/stack to be created, ensuring that it is always non-zero.
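The wrap-around described above can be reproduced in isolation. The following is a minimal sketch, not the runtime's code: the function names are hypothetical, and sizes are assumed to be unsigned 64-bit values.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the first line of the backoff:
 * highSize = newSize - pageSize, in unsigned 64-bit arithmetic. */
uint64_t backoff_high_size(uint64_t newSize, uint64_t pageSize) {
    return newSize - pageSize;  /* wraps modulo 2^64 when newSize < pageSize */
}

/* Hypothetical model of the fix: counting the initial thread/stack keeps
 * the initial live size non-zero even when there are no object inits. */
uint64_t initial_bytes_live(uint64_t objectInitBytes, uint64_t threadStackBytes) {
    return objectInitBytes + threadStackBytes;
}
```

With `newSize` equal to 0 and a 4096-byte page, `backoff_high_size` yields 2^64 - 4096, which is why the first backoff requests an enormous heap.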

Objects without a representation (e.g., `unit`) may participate in the construction of other objects (e.g., `unit ref`).

    (* Globals: *)
    val x_0: unit = obj ()
    ...
    val global_441: (unit mut) tuple = obj (x_0 (*obj ()*))

However, objects without a representation were not registered as being static, thus preventing subsequent objects constructed with them from being made static.

MatthewFluet · 2019-09-19T12:16:39Z

mlton/backend/ssa2-to-rssa.fun

+                                             val location = getLocation (ty, !Control.staticAllocObjects, false)
+                                          in
+                                             if location <> Static.Location.Heap
+                                                orelse staticInitHeapObjects


I was struggling to find/write a source program that, with -static-alloc-internal-ptrs static -static-alloc-objects true -static-init-objects all, would trigger this code path that leads to a statically initialized but dynamically allocated object. Eventually, I realized that there should be no such objects. With -static-alloc-internal-ptrs static -static-alloc-objects true, the only objects that will be statics with a Heap location are objects with a mutable Objptr field. Moreover, only global objects are eligible to be statics. But, assuming a safe-for-space globalization pass, there should be no global objects with mutable Objptr fields (because global objects are roots for the whole program execution). I seem to remember @jasoncarr0 making some comment that was supposed to suggest this (something along the lines of "by fiat, should never happen"), but I can't find it right now.

With -globalize-small-types 9, which allows the globalization of arbitrary array and ref objects (irrespective of the "size" of their contents), we can trigger global objects with mutable Objptr fields that become statically initialized, dynamically allocated.

With this observation about safe-for-space, I'm not understanding why there should ever be a non-empty globalObjptr[] array. In a self-compile, there are 26.

It can happen: if globalize-small-type 4 is on, an example would be a mutable field of a tuple of total size more than 64 bytes. It is safe to globalize, but will turn into an Objptr in Rssa. With the current code, intInfs aren't included only because they're infrequent, so that would explain some globalObjptrs on a self-compile.

Unless I've made a mistake in understanding packed-representation here

You are right, that a (int * int) ref could be space-safely globalized and could result in an object with a mutable Objptr field. However, it is also likely that such a tuple would be RefFlattened; I suspect that is why we don't see many of them. As you say with -globalize-small-types 4, then we might have a (int * int, int * int) either ref be globalized.

Also, it turns out that we "grandfathered" IntInf.int as a "small type"; so a IntInf.int ref can be globalized, but is represented as an object with a mutable Objptr field.

Finally, we might have an IntInf.int ref ref; the inner ref might be statically initialized (but dynamically allocated), so the outer ref could not be statically initialized.

MatthewFluet · 2019-09-19T12:19:02Z

The LocalRef optimization pass has a pre-transformation that moves any global 'a ref objects that are used in exactly one function, into the using function, so that it has an opportunity to be localized. However, if the ref is not turned into a local, then it is not moved back to a global.

MatthewFluet · 2019-09-19T12:55:34Z

> With this observation about safe-for-space, I'm not understanding why there should ever be a non-empty globalObjptr[] array. In a self-compile, there are 26.

One contribution is that IntInf constants are not made into statics.

MatthewFluet · 2019-09-20T01:06:26Z

Draft merge commit message

Static allocation/initialization of objects in backend

The main benefits are that code size and compile time are improved across the board, particularly for larger programs. Runtime sometimes improves, but should only affect programs which had hot code accessing globals, as it removes one level of indirection. Garbage collections might be marginally faster, as globals are now mostly skipped.

Statically allocated and initialized objects are created in the main .c file, where they will be placed in the data segment of the executable:

const struct {Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[9];}
static_20 = {(Word64)(0x0ull), (Word64)(0x9ull), (Word64)(0x7ull), "addrinuse"};
const struct {Word64 meta_0; Word32 data_0; Word32 data_1; Pointer data_2; }
static_21 = {(Word64)(0x29ull), (Word32)(0x62ull), (Word32)(0x0ull), ((Pointer)(&static_20) + 24)};

Note that these are proper ML objects, with metadata and data. References to statically allocated objects are via pointers to the first data field (e.g., &static_20 + 24). Note also that WordXVector constants (e.g., strings) are a special case of statically allocated and initialized objects. Statically allocated and initialized objects can be both immutable and mutable, although the latter should be restricted to objects with non-Objptr mutable fields.
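To make the "pointer to the first data field" convention concrete, here is a small sketch. The typedefs, `struct Seq9`, and `object_pointer` are stand-ins introduced for illustration; they are assumptions, not the names used in the generated code or the runtime headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Word64;  /* assumed to match the generated code's Word64 */
typedef uint8_t Word8;    /* assumed to match the generated code's Word8 */

/* Same shape as static_20: three metadata words, then the payload bytes. */
struct Seq9 { Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[9]; };

const struct Seq9 example_static = {
    0x0, 0x9, 0x7, {'a', 'd', 'd', 'r', 'i', 'n', 'u', 's', 'e'}
};

/* A reference to the object is the address of its first data field,
 * i.e. the struct address plus the size of the metadata (24 bytes here). */
const Word8 *object_pointer(const struct Seq9 *s) {
    return (const Word8 *)s + offsetof(struct Seq9, data);
}
```

Calling `object_pointer(&example_static)` gives the same address as `(Pointer)(&static_20) + 24` would for the struct in the message, so code that receives the pointer sees only the payload and finds the metadata at negative offsets.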

A special case of statically allocated objects are arrays, whose contents will be dynamically initialized by the mutator. These are also created in the main .c file, but are placed in the bss segment of the executable (decreasing the size of the executable) and proper metadata is written by initialization code:

struct {Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[800000];}
static_26;
struct {Word64 meta_0; Word64 meta_1; Word64 meta_2; Word8 data[0];}
static_31;

static void static_Init() {
    memcpy (&static_26, &((struct {Word64 meta_0; Word64 meta_1; Word64 meta_2;}){(Word64)(0x0ull), (Word64)(0x186A0ull), (Word64)(0x11ull)}), 24);
    memcpy (&static_31, &((struct {Word64 meta_0; Word64 meta_1; Word64 meta_2;}){(Word64)(0x0ull), (Word64)(0x0ull), (Word64)(0x13ull)}), 24);
}

Finally, dynamically allocated but statically initialized objects have their initialization data in the main .c file along with information to copy that data to the initial dynamic heap during initWorld:

const static struct {Word64 meta_0; Word64 data_0; }
static_9819 = {(Word64)(0x79Dull), (Word64)(0x1ull)};
const static struct {Word64 meta_0; Word64 data_0; }
static_9820 = {(Word64)(0x79Dull), (Word64)(0x1ull)};

static struct GC_objectInit objectInits[] = {
    { 11, 8, 16, ((Pointer) &static_9819) },
    { 12, 8, 16, ((Pointer) &static_9820) },
    ...
};

By default (with -static-init-objects staticAllocOnly), no such objects are created.
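As a rough illustration of how such entries could be copied into the initial dynamic heap, consider the sketch below. It is not the runtime's `initWorld`: `struct ObjectInit`, its field names, and the interpretation of the numbers in each entry are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for GC_objectInit; field names and meanings are
 * assumptions, not taken from the MLton sources. */
struct ObjectInit {
    size_t globalIndex;  /* slot in the global objptr table to fill in */
    size_t headerBytes;  /* bytes of metadata before the first data field */
    size_t totalBytes;   /* metadata plus data */
    const void *words;   /* statically initialized image of the object */
};

/* Copy each image to the heap frontier and point its global at the data.
 * Returns the new frontier after the copied objects. */
unsigned char *copy_object_inits(const struct ObjectInit *inits, size_t n,
                                 unsigned char *frontier,
                                 unsigned char **globals) {
    for (size_t i = 0; i < n; i++) {
        memcpy(frontier, inits[i].words, inits[i].totalBytes);
        globals[inits[i].globalIndex] = frontier + inits[i].headerBytes;
        frontier += inits[i].totalBytes;
    }
    return frontier;
}
```

Under this reading, the rest of the program reaches each copied object through its global slot, which is the extra level of indirection that fully static objects avoid.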

With -static-init-objects all, global objects with Objptr mutable fields would be dynamically allocated but statically initialized. But, such global objects are rare. For example, a (int * int) ref could be space-safely globalized and would be an object with a mutable Objptr field. However, it is also likely that such a tuple would be RefFlattened. With -globalize-small-type 4 (see #288 and 752467c), an (int * int, int * int) either ref could be globalized and represented as an object with a mutable Objptr field. Similarly, an IntInf.int ref can also be globalized and would be represented as an object with a mutable Objptr field.

With -static-alloc-objects false -static-init-objects all, all global objects will be dynamically allocated but statically initialized (and no global objects will be statically allocated). Similarly, with -static-alloc-wordvector-consts false, string constants will be dynamically allocated but statically initialized; this corresponds to the previous MLton behavior with respect to string constants.

A number of controls have been added to control static allocation/initialization:

-static-alloc-internal-ptrs {static|all|none}

Controls which kinds of objects can be statically allocated:
- static: only objects with all fields either immutable or non-Objptr
- none: only objects with no fields
- all: all objects

The all setting is incompatible with the current GC for two reasons. First, statically allocated objects are not traced by the GC; a statically-allocated object that is updated with an Objptr to an object in the heap should be considered a root. Second, a statically-allocated object that is updated with an Objptr would trigger a card marking, but the address of a statically-allocated object would not map to a valid card slot.

-static-alloc-wordvector-consts {true|false}

Controls whether or not WordXVector constants are converted to statics (with ImmStatic location) at Ssa2ToRssa.

-static-init-arrays {true|false}

Controls whether or not Array_alloc primitives are converted to statics (with MutStatic or Heap location) at Ssa2ToRssa.

-static-alloc-arrays {true|false}

Controls whether or not Array_alloc primitives that can be statically initialized are forced to Heap location.

-static-init-objects {none|staticAllocOnly|all}

Controls whether or not Object expressions are converted to statics at Ssa2ToRssa. If staticAllocOnly, then an object that would be converted to a static with Heap location is not converted to a static.

-static-alloc-objects {true|false}

Controls whether or not Object expressions that can be statically initialized are forced to Heap location.

Previously, the initialization of an object was accomplished by a sequence of `Move` statements following the `Object` statement. This obscures the initialization and led to MLton#328 duplicating logic in `functor PackedRepresentation` and `functor Ssa2ToRssa`; see MLton#328 (comment).

See MLton#328 (comment).

When the mutator marks cards, `staticHeapR` cannot be used for objects with mutable objptr fields, because the address of an object in `staticHeapR` won't map to a valid card slot for the write barrier. Such global objects with mutable objptr fields must be placed in the dynamic heap and referenced indirectly via a `Global` operand. However, it is still possible to collect all such objects into a static heap, which is copied to the initial dynamic heap, rather than initializing them via the `initGlobals` function. See MLton#328 (comment).
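The card-slot problem can be seen with a toy model of the index computation. This is a sketch under stated assumptions: the card size, the function name, and the explicit range check are illustrative and are not taken from the runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CARD_SIZE_LOG2 8  /* assumed 256-byte cards, for illustration only */

/* Hypothetical model of mapping an address to a card slot. Returns false
 * when the address lies outside the heap, i.e. when no card covers it. */
bool card_index(uintptr_t heapStart, size_t heapBytes, uintptr_t addr,
                size_t *index) {
    if (addr < heapStart || addr - heapStart >= heapBytes)
        return false;
    *index = (size_t)((addr - heapStart) >> CARD_SIZE_LOG2);
    return true;
}
```

A fast write barrier would normally omit the range check and index the card map directly, so a base address inside a static heap would produce an out-of-range slot. Adding the check is the "more expensive write barrier" alternative; placing such objects in the dynamic heap avoids it.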

Revised implementation of static allocation/initialization of globals

Heavily inspired and based on a previous implementation by Jason Carr; see #328.

#328 introduced static allocation/initialization of globals, but some complexities and issues with the implementation were noted during review:

* #328 (comment)
* #328 (comment)
* #328 (comment)

This revised implementation tries to simplify the complexities and address the issues:

* The RSSA IR loses the `Operand.Static {static: Var.t Static.t, ty: Type.t}` variant and gains a `statics: {dst: Var.t * Type.t, obj: Object.t} vector` field in `Program.T`. The `PackedRepresentation` and `Ssa2ToRssa` passes are simplified, because the initial RSSA program is created with an empty `statics` field. The `rssaShrink1` pass takes care of constant-folding and copy-propagating of object initialization. New `collectStatics.{Globals,{WordXVector,Real}Consts}` passes introduce objects into the `statics` field.
* The Machine IR gains a `staticsHeaps: StaticHeap.Kind -> StaticHeap.Object.t vector` field in `Program.T`. Each "kind" of static heap is emitted to the main `.c` file as a statically initialized data definition that "looks" like an ML heap. There are four kinds of heaps:
  * `Immutable`: for immutable objects; such objects need never be traversed by the GC. (Note that global `unit ref` objects can be placed in the `Immutable` static heap, since they will never actually be mutated.)
  * `Mutable`: for objects with mutable non-objptr fields; such objects may be mutated, but need never be traversed by the GC. (Note that global empty mutable sequences can be placed in the `Mutable` static heap, even if they have mutable objptr fields, since the elements will never actually be mutated.)
  * `Root`: when the mutator does not mark cards, for objects with mutable objptr fields; such objects may be mutated and need to be traversed by the GC (because they may be updated to point to objects in the runtime heap). However, if card marking is used by the mutator, then the `Root` static heap cannot be used, because the write barrier with a base object in the `Root` static heap will attempt to write to an invalid card slot index. It would be possible to make the write barrier more expensive, by dynamically checking if the base is in the `Root` static heap.
  * `Dynamic`: when the mutator marks cards, for objects with mutable objptr fields; such objects may be mutated and need to be traversed by the GC. The `Dynamic` static heap is copied to the initial runtime heap at runtime initialization.

  In `Backend`, each RSSA `static` is placed in an appropriate "kind" of static heap. Objects placed in the `Dynamic` static heap are accessed by the rest of the program via `Global` operands (and incur a level of indirection).
* The `Mutable` and `Root` heaps are properly saved and loaded by `MLton.World`.

Other notable aspects of the PR:

* The SSA2 IR gains an `Exp.Sequence of {args: Var.t vector vector}` variant to represent direct allocation of arrays and vectors, including initialization of elements. At `toSsa2`, the `Vector_vector` primitive is translated to `SsaTree2.Exp.Sequence`, rather than being translated to an `Array_alloc` `Array_update` `Array_toVector` sequence. At `Ssa2ToRssa`, a `SsaTree2.Exp.Sequence` is translated to an `Rssa.Object.Sequence` (via updates to `PackedRepresentation`). This allows global `Vector_vector` objects to be collected to statics.
* A new `Array_array` primitive for literal arrays was introduced. The intention is that compilation might find opportunities to optimize explicit array allocation and initialization into the `Array_array` primitive.

Currently, there is no support for "empty" static objects. In the previous static allocation/initialization implementation, a global `Array_alloc` (necessarily with a constant length operand) would be translated to a special kind of static that would be placed in the BSS segment of the executable and dynamically initialized. A future PR could restore this functionality as follows:

* Introduce `MutableEmpty`, `RootEmpty`, and `DynamicEmpty` static heap kinds that simply specify a heap size, along with `mutableEmptyInit`, `rootEmptyInit`, and `dynamicEmptyInit` data to properly initialize the headers.
* Don't lower `Array_alloc` prims in `Ssa2ToRssa`. After `rssaShrink1`, it will be possible to read off the `Array_alloc`s with constant size. All such `Array_alloc`s in the `initGlobals` function can be lifted to RSSA `statics`. Meanwhile, such `Array_alloc`s in other functions can be more cheaply implemented via direct allocation by the mutator, rather than via the `GC_sequenceAllocate` runtime call (which induces a GC safe point).

However, "empty" static objects are only created with the (non-default) `-globalize-arrays true`, and so weren't exercised by default in the previous implementation.
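The choice among the four static heap kinds described in this message can be summarized as a small decision function. This is a sketch of the stated rules only; the enum and function names are hypothetical, and the special cases for `unit ref` objects and empty mutable sequences are not modeled.

```c
#include <stdbool.h>

enum StaticHeapKind { KIND_IMMUTABLE, KIND_MUTABLE, KIND_ROOT, KIND_DYNAMIC };

/* Hypothetical classifier following the description above.
 * hasMutableObjptrFields is assumed to imply hasMutableFields. */
enum StaticHeapKind choose_kind(bool hasMutableFields,
                                bool hasMutableObjptrFields,
                                bool mutatorMarksCards) {
    if (!hasMutableFields)
        return KIND_IMMUTABLE;  /* never mutated, never traversed by the GC */
    if (!hasMutableObjptrFields)
        return KIND_MUTABLE;    /* mutated, but holds no heap pointers */
    /* Mutable objptr fields: the GC must traverse the object. With card
     * marking, it must also live in the dynamic heap. */
    return mutatorMarksCards ? KIND_DYNAMIC : KIND_ROOT;
}
```

Only the last case depends on the GC configuration, which matches the observation that `Root` and `Dynamic` are alternatives for the same class of objects.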

MLton/mlton#357 (revising MLton/mlton#328) introduced a number of "static heaps" into the compilation. Essentially, many "global" objects can be fully evaluated at compile time and represented in the compiled program as statics. The "static" objects look like regular ML objects (with proper headers, etc.), but exist outside the MLton heap. This is a "path of least resistance" commit for MaPLe. The `collectStatics.Globals` and `collectStatics.RealConsts` passes are disabled, but the `collectStatics.WordXVectorConsts` pass is enabled. In the `backend` pass (translation of RSSA to Machine), all RSSA statics are forced to the `Dynamic` static heap (and assigned a corresponding global objptr slot); similarly, any remaining `WordXVector` constants are forced to the `Dynamic` static heap. At program startup, the `Dynamic` static heap is copied into the root hierarchical heap. (This is slightly more complicated than the copy of the `Dynamic` static heap into the initial heap in MLton, because in MaPLe the `Dynamic` static heap may need to be split across multiple chunks.) See MPLLang#127 for more discussion.

jasoncarr0 added 30 commits July 3, 2019 13:04

Add Static structure

61de3a4

Use statics in Machine

e4017ec

Optional global for statics, reorder reals/statics to (data, global […

318d16b

…option])

Add statics to rssa

063fe52

Add enough static support to recreate vector initialization

04f5a19

This will be inconsistent with objects since VectorInitElem has not been changed so header support is not yet present

Small cleanups on c-codegen for static changes

76c35a4

Remove macro-indirection on vector inits and rename

27734be

The macro indirection is a bit out of place next to the other initializations via vectors

Move Static to BackendAtoms, make Static.t polymorphic instead of fun…

5e1837e

…ctorial over the index

Add static support in the C codegen

2e24412

Correctly offset static accesses in C codegen by header size

035d556

Add preliminary Static creation to ssa2-to-rssa

b98ebc4

With the previous changes to statics, this isn't quite supported in any codegen yet as it requires the more complicated struct types.

Improve layout of statics in rssa/machine

ff52fae

Pad Static.Object components from PackedRepresentation to prim

8be3094

Return any elem from PackedRepresentation, not just word

f4a3b3d

Fix order of statics in backend

a103b66

Fix offsets for statics in c codegen

46cfaa8

Fix static struct organization in c-codegen

189c902

Fix static macro in c-chunk.h

9f26780

Add packed attribute to static structures

463ecb4

Ensure that static offsets are taken after casting to pointer in c co…

9aebf23

…degen

Remove GC invariant that all object pointers point into heap

945d416

Only forward object pointers in heap on major cheney copy

46bf90a

Add object/vector creation helpers in Static

82f841d

Only translate addresses in the from space for gc translate

2be850a

Only follow pointers into heap in dfs-mark

ec2a3af

dfs mark returns immediately if root is not in heap

d43f54d

Fix typo not allocating static arrays correctly

466e98e

MatthewFluet added 12 commits September 17, 2019 13:58

Add Static.dataSize: 'a t -> Bytes.t

b13f7a8

And eliminate `Static.Data.size: 'a Static.Data.t -> WordSize.t * int`.

Allocate WordXVector constants created by RSSA according to `!Control…

b3873f1

….staticAllocWordVectorConsts`

Fix bug introduced by 4a49d4d

80ffc2c

Fixup whitespace

8ced492

Tweak size.sml regression to not use a static string

65c455f

Tweak mlton.share.sml regression to not use static strings/objects

1e91cd9

Treat MLton_bogus as a const for static alloc/init

a9d897b

Refactor functions in translateGlobalStatics

30252b5

MatthewFluet reviewed Sep 19, 2019

MatthewFluet added 3 commits September 19, 2019 09:03

Recognize Const.IntInf and Const.Null as statics

e47f7e0

Recognize that Array_toVector preserves staticness

f39f811

Avoid constant propagating Heap statics in backend

bc38903

MatthewFluet merged commit 57c9c94 into MLton:master Sep 20, 2019

MatthewFluet added a commit to MatthewFluet/mlton that referenced this pull request Jan 10, 2020

Default to using static heaps (rather than static objects)

6eff56b

See MLton#328 (comment).

MatthewFluet added a commit to MatthewFluet/mlton that referenced this pull request Jan 10, 2020

Remove "static objects" implementation in favor of "static heaps"

08cc5bc

See MLton#328 (comment).

MatthewFluet mentioned this pull request Jan 10, 2020

Revised implementation of static allocation/initialization of globals #357

Merged

MatthewFluet mentioned this pull request Sep 21, 2020

Upstream merge MPLLang/mpl#127

Merged
