Add static allocation support in Rssa through codegen #328
This will be inconsistent with objects since VectorInitElem has not been changed so header support is not yet present
The macro indirection is a bit out of place next to the other initializations via vectors
…ctorial over the index
No support yet in codegens. No support for accessing via a global except for heap objects (which are copied), because that requires extra support in c decls to assign the global. Used M as the signifier because S, T, C, and X are taken; it could be taken to mean Memory or syMbol.
Backend doesn't generate Binds for destinations that were turned into constants. As it happens, idempotency for existing constants turned any such binds into self-moves, which, although ill-typed (e.g. it could in theory include a move from the constant 3 into itself), were eliminated to Noops by the move function. isLocation conflated two meanings: the ability to point to something, and the ability to be a destination for moves. Statics are constant pointers, and as such are only in the latter class. The former is not needed, and can be recovered from the type.
…zed words in Static.Data. This will require codegen changes which are not present, as the non-pointer-sized words require a different structure layout (packed with several fields). ssa2-to-rssa does not take advantage of these changes in this patch.
With the previous changes to statics, this isn't quite supported in any codegen yet as it requires the more complicated struct types.
New controls are:

* `-static-alloc-wordvector-consts {true|false}`
  Controls whether or not `WordXVector` constants are converted to statics (with `ImmStatic` location) at `Ssa2ToRssa`.
* `-static-init-arrays {true|false}`
  Controls whether or not `Array_alloc` primitives are converted to statics (with `MutStatic` or `Heap` location) at `Ssa2ToRssa`.
* `-static-init-objects {none|staticAllocOnly|all}`
  Controls whether or not `Object` expressions are converted to statics at `Ssa2ToRssa`. If `staticAllocOnly`, then an object that would be converted to a static with `Heap` location is not converted to a static.
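The effect of the three `-static-init-objects` values can be sketched as a small decision function. This is only an illustration of the description above, written in C rather than in the compiler's SML; the enum and function names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the static locations and the control values
 * named in the commit message; these are not the compiler's own types. */
enum location { LOC_IMM_STATIC, LOC_MUT_STATIC, LOC_HEAP };
enum static_init_objects { INIT_NONE, INIT_STATIC_ALLOC_ONLY, INIT_ALL };

/* Decide whether an Object expression is converted to a static. */
bool convert_object_to_static(enum location loc, enum static_init_objects mode)
{
    assert(mode == INIT_NONE || mode == INIT_STATIC_ALLOC_ONLY || mode == INIT_ALL);
    if (mode == INIT_NONE)
        return false;            /* never convert objects */
    if (mode == INIT_ALL)
        return true;             /* convert, even with Heap location */
    return loc != LOC_HEAP;      /* staticAllocOnly: skip Heap-located statics */
}
```

With `staticAllocOnly`, only the `Heap` case is rejected, which matches the `location <> Static.Location.Heap orelse staticInitHeapObjects` test in the reviewed hunk if `staticInitHeapObjects` is taken to correspond to `all`.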
And eliminate `Static.Data.size: 'a Static.Data.t -> WordSize.t * int`.
….staticAllocWordVectorConsts`
New controls are:

* `-static-alloc-arrays {true|false}`
  Controls whether or not `Array_alloc` primitives that can be statically initialized are forced to `Heap` location.
* `-static-alloc-objects {true|false}`
  Controls whether or not `Object` expressions that can be statically initialized are forced to `Heap` location.
With b3873f1, a program can have an empty `objectInits[]` to be allocated in the initial dynamic heap. With an empty `objectInits[]`, `sizeofInitialBytesLive` would return 0. Attempting to create a heap of zero bytes would fail (the specified SUSv3 behavior of `mmap`), triggering a backoff computed as:

    highSize = newSize - s->sysvals.pageSize;
    newSize = align((factor-1) * (highSize / factor) + (lowSize / factor), s->sysvals.pageSize);

However, `newSize - s->sysvals.pageSize` (with `newSize` equal to 0) wraps to nearly 2^64. Successive backoffs eventually bring the requested size down to one that can be satisfied, but the size is generally much larger than required. Moreover, the heap won't be resized until a subsequent GC, which may not occur during the run of the program, due to the heap being so large. Many regression tests run, but most that fork fail with `unhandled exception: SysErr: Cannot allocate memory [nomem]`, because the parent has allocated a very large heap and the child is also requesting to allocate a very large heap.

Now, `sizeofInitialBytesLive` also includes the size of the initial thread/stack to be created, ensuring that it is always non-zero.
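The wraparound is easy to reproduce in isolation. The following is a minimal sketch that follows the backoff formula quoted above using plain `size_t` arithmetic; `align_up` and `backoff_step` are hypothetical stand-ins, not the runtime's actual functions, and the page size is passed in as a parameter instead of being read from `s->sysvals`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to a multiple of a, where a is a power of two
 * (a simplified stand-in for the runtime's align). */
static size_t align_up(size_t n, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

/* The first line of the backoff: unsigned subtraction, so it wraps
 * around when newSize is smaller than pageSize. */
size_t high_size(size_t newSize, size_t pageSize)
{
    return newSize - pageSize;
}

/* One full backoff step, following the quoted formula. */
size_t backoff_step(size_t newSize, size_t lowSize, size_t factor, size_t pageSize)
{
    size_t highSize = high_size(newSize, pageSize);
    return align_up((factor - 1) * (highSize / factor) + (lowSize / factor), pageSize);
}
```

With `newSize` equal to 0 and a 4096-byte page, `high_size` returns `SIZE_MAX - 4095`, so the next requested size is enormous instead of small. With a non-zero starting size such as 8192, the same step shrinks the request as intended.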
Objects without a representation (e.g., `unit`) may participate in the construction of other objects (e.g., `unit ref`).

    (* Globals: *)
    val x_0: unit = obj ()
    ...
    val global_441: (unit mut) tuple = obj (x_0 (*obj ()*))

However, objects without a representation were not registered as being static, thus preventing subsequent objects constructed with them from being made static.
val location = getLocation (ty, !Control.staticAllocObjects, false)
in
if location <> Static.Location.Heap
orelse staticInitHeapObjects
I was struggling to find/write a source program that, with `-static-alloc-internal-ptrs static -static-alloc-objects true -static-init-objects all`, would trigger this code path that leads to a statically initialized but dynamically allocated object. Eventually, I realized that there should be no such objects. With `-static-alloc-internal-ptrs static -static-alloc-objects true`, the only objects that will be statics with a `Heap` location are objects with a mutable `Objptr` field. Moreover, only global objects are eligible to be statics. But, assuming a safe-for-space globalization pass, there should be no global objects with mutable `Objptr` fields (because global objects are roots for the whole program execution). I seem to remember @jasoncarr0 making some comment that was supposed to suggest this (something along the lines of "by fiat, should never happen"), but I can't find it right now.

Using `-globalize-small-types 9`, which allows the globalization of arbitrary `array` and `ref` objects (irrespective of the "size" of their contents), we can trigger global objects with mutable `Objptr` fields that become statically initialized, dynamically allocated.

With this observation about safe-for-space, I'm not understanding why there should ever be a non-empty `globalObjptr[]` array. In a self-compile, there are 26.
It can happen: if `globalize-small-type 4` is on, then an example would be a mutable field of a tuple of total size more than 64 bytes. It is safe to globalize, but will turn into an Objptr in Rssa. With the current code, intInfs aren't included only because they're infrequent, so that would explain some globalObjptrs on a self-compile.
Unless I've made a mistake in understanding packed-representation here
You are right that a `(int * int) ref` could be space-safely globalized and could result in an object with a mutable `Objptr` field. However, it is also likely that such a tuple would be `RefFlatten`ed; I suspect that is why we don't see many of them. As you say, with `-globalize-small-types 4` we might have a `(int * int, int * int) either ref` be globalized.

Also, it turns out that we "grandfathered" `IntInf.int` as a "small type"; so an `IntInf.int ref` can be globalized, but is represented as an object with a mutable `Objptr` field.
Finally, we might have an `IntInf.int ref ref`; the inner ref might be statically initialized (but dynamically allocated), so the outer ref could not be statically initialized.
One contribution is that IntInf constants are not made into statics.
Draft merge commit message

Static allocation/initialization of objects in backend

The main benefits are that code size and compile time are improved across the board, particularly for larger programs. Runtime sometimes improves, but should only affect programs which had hot code accessing globals, as it removes one level of indirection. Garbage collections might be marginally faster, as globals are now mostly skipped.

Statically allocated and initialized objects are created in the main
Note that these are proper ML objects, with metadata and data. References to statically allocated objects are via pointers to the first data field (e.g., A special case of statically allocated objects are arrays, whose contents will be dynamically initialized by the mutator. These are also created in the main
Finally, dynamically allocated but statically initialized objects have their initialization data in the main
By default (with With With A number of controls have been added to control static allocation/initialization:
Previously, the initialization of an object was accomplished by a sequence of `Move` statements following the `Object` statement. This obscures the initialization and led to MLton#328 duplicating logic in `functor PackedRepresentation` and `functor Ssa2ToRssa`; see MLton#328 (comment).
When the mutator marks cards, `staticHeapR` cannot be used for objects with mutable objptr fields, because the address of an object in `staticHeapR` won't map to a valid card slot for the write barrier. Such global objects with mutable objptr fields must be placed in the dynamic heap and referenced indirectly via a `Global` operand. However, it is still possible to collect all such objects into a static heap, which is copied to the initial dynamic heap, rather than initializing them via the `initGlobals` function. See MLton#328 (comment).
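To see why an address outside the dynamic heap has no valid card slot, consider a simplified model in which the card index is computed relative to the heap start. This is a sketch under stated assumptions, not the MLton runtime's code: the `struct heap` layout, the 256-byte card size, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CARD_SIZE_LOG2 8  /* assumed card size of 256 bytes */

/* Hypothetical description of the dynamic heap covered by the card map. */
struct heap {
    uintptr_t start;
    size_t size;
};

/* Number of card slots needed to cover the heap. */
size_t card_count(const struct heap *h)
{
    assert(h->size > 0);
    return (h->size + ((size_t)1 << CARD_SIZE_LOG2) - 1) >> CARD_SIZE_LOG2;
}

/* Card index a write barrier would compute for a base address. For an
 * address below h->start the unsigned subtraction wraps around. */
size_t card_index(const struct heap *h, uintptr_t addr)
{
    return (size_t)((addr - h->start) >> CARD_SIZE_LOG2);
}

/* True only when the computed index lands inside the card map. */
bool card_slot_is_valid(const struct heap *h, uintptr_t addr)
{
    return card_index(h, addr) < card_count(h);
}
```

An object in a static data segment sits at an address unrelated to the dynamic heap, so `card_slot_is_valid` is false for it. An unchecked barrier would write outside the card map, which is why such objects are moved into the dynamic heap instead.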
Revised implementation of static allocation/initialization of globals

Heavily inspired and based on a previous implementation by Jason Carr; see #328.

#328 introduced static allocation/initialization of globals, but some complexities and issues with the implementation were noted during review:

* #328 (comment)
* #328 (comment)
* #328 (comment)

This revised implementation tries to simplify the complexities and address the issues:

* The RSSA IR loses the `Operand.Static {static: Var.t Static.t, ty: Type.t}` variant and gains a `statics: {dst: Var.t * Type.t, obj: Object.t} vector` field in `Program.T`. The `PackedRepresentation` and `Ssa2ToRssa` passes are simplified, because the initial RSSA program is created with an empty `statics` field. The `rssaShrink1` pass takes care of constant-folding and copy-propagating of object initialization. New `collectStatics.{Globals,{WordXVector,Real}Consts}` passes introduce objects into the `statics` field.
* The Machine IR gains a `staticsHeaps: StaticHeap.Kind -> StaticHeap.Object.t vector` field in `Program.T`. Each "kind" of static heap is emitted to the main `.c` file as a statically initialized data definition that "looks" like an ML heap. There are four kinds of heaps:
  * `Immutable`: for immutable objects; such objects need never be traversed by the GC. (Note that global `unit ref` objects can be placed in the `Immutable` static heap, since they will never actually be mutated.)
  * `Mutable`: for objects with mutable non-objptr fields; such objects may be mutated, but need never be traversed by the GC. (Note that global empty mutable sequences can be placed in the `Mutable` static heap, even if they have mutable objptr fields, since the elements will never actually be mutated.)
  * `Root`: when the mutator does not mark cards, for objects with mutable objptr fields; such objects may be mutated and need to be traversed by the GC (because they may be updated to point to objects in the runtime heap). However, if card marking is used by the mutator, then the `Root` static heap cannot be used, because the write barrier with a base object in the `Root` static heap will attempt to write to an invalid card slot index. It would be possible to make the write barrier more expensive, by dynamically checking if the base is in the `Root` static heap.
  * `Dynamic`: when the mutator marks cards, for objects with mutable objptr fields; such objects may be mutated and need to be traversed by the GC. The `Dynamic` static heap is copied to the initial runtime heap at runtime initialization.

  In `Backend`, each RSSA `static` is placed in an appropriate "kind" of static heap. Objects placed in the `Dynamic` static heap are accessed by the rest of the program via `Global` operands (and incur a level of indirection).
* The `Mutable` and `Root` heaps are properly saved and loaded by `MLton.World`.

Other notable aspects of the PR:

* The SSA2 IR gains an `Exp.Sequence of {args: Var.t vector vector}` variant to represent direct allocation of arrays and vectors, including initialization of elements. At `toSsa2`, the `Vector_vector` primitive is translated to `SsaTree2.Exp.Sequence`, rather than being translated to an `Array_alloc`, `Array_update`, `Array_toVector` sequence. At `Ssa2ToRssa`, a `SsaTree2.Exp.Sequence` is translated to an `Rssa.Object.Sequence` (via updates to `PackedRepresentation`). This allows global `Vector_vector` objects to be collected to statics.
* A new `Array_array` primitive for literal arrays was introduced. The intention is that compilation might find opportunities to optimize explicit array allocation and initialization into the `Array_array` primitive.

Currently, there is no support for "empty" static objects. In the previous static allocation/initialization implementation, a global `Array_alloc` (necessarily with a constant length operand) would be translated to a special kind of static that would be placed in the BSS segment of the executable and dynamically initialized. A future PR could restore this functionality as follows:

* Introduce `MutableEmpty`, `RootEmpty`, and `DynamicEmpty` static heap kinds that simply specify a heap size, along with `mutableEmptyInit`, `rootEmptyInit`, and `dynamicEmptyInit` data to properly initialize the headers.
* Don't lower `Array_alloc` prims in `Ssa2ToRssa`. After `rssaShrink1`, it will be possible to read off the `Array_alloc`s with constant size. All such `Array_alloc`s in the `initGlobals` function can be lifted to RSSA `statics`. Meanwhile, such `Array_alloc`s in other functions can be more cheaply implemented via direct allocation by the mutator, rather than via the `GC_sequenceAllocate` runtime call (which induces a GC safe point).

However, "empty" static objects are only created with the (non-default) `-globalize-arrays true`, and so weren't exercised by default in the previous implementation.
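The choice among the four static heap kinds described in this commit message can be summarized as a small classification function. This is a sketch of the stated rules only, in C rather than the compiler's SML; the type and field names are hypothetical, and the special cases mentioned in the notes (`unit ref`, empty mutable sequences) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

enum static_heap_kind { KIND_IMMUTABLE, KIND_MUTABLE, KIND_ROOT, KIND_DYNAMIC };

/* Hypothetical summary of the properties that matter for placement. */
struct object_traits {
    bool has_mutable_field;         /* any mutable field at all */
    bool has_mutable_objptr_field;  /* a mutable field holding an objptr */
};

/* Pick the static heap kind for an object, given whether the mutator
 * marks cards. */
enum static_heap_kind classify(struct object_traits t, bool mutator_marks_cards)
{
    /* A mutable objptr field is, in particular, a mutable field. */
    assert(!t.has_mutable_objptr_field || t.has_mutable_field);

    if (t.has_mutable_objptr_field)
        return mutator_marks_cards ? KIND_DYNAMIC : KIND_ROOT;
    if (t.has_mutable_field)
        return KIND_MUTABLE;   /* mutated, but never traversed by the GC */
    return KIND_IMMUTABLE;     /* never mutated, never traversed */
}
```

Only the objptr case depends on card marking: the same object lands in `Root` without card marking and in `Dynamic` with it, where it is then reached through a `Global` operand.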
MLton/mlton#357 (revising MLton/mlton#328) introduced a number of "static heaps" into the compilation. Essentially, many "global" objects can be fully evaluated at compile time and represented in the compiled program as statics. The "static" objects look like regular ML objects (with proper headers, etc.), but exist outside the MLton heap. This is a "path of least resistance" commit for MaPLe. The `collectStatics.Globals` and `collectStatics.RealConsts` passes are disabled, but the `collectStatics.WordXVectorConsts` pass is enabled. In the `backend` pass (translation of RSSA to Machine), all RSSA statics are forced to the `Dynamic` static heap (and assigned a corresponding global objptr slot); similarly, any remaining `WordXVector` constants are forced to the `Dynamic` static heap. At program startup, the `Dynamic` static heap is copied into the root hierarchical heap. (This is slightly more complicated than the copy of the `Dynamic` static heap into the initial heap in MLton, because in MaPLe the `Dynamic` static heap may need to be split across multiple chunks.) See MPLLang#127 for more discussion.
The later IRs now support static data of various sorts. Some features are merged into statics:
`isMutable` in Rssa has been weakened to `pinned`, which is only needed to maintain the changes from BounceVars without excessive liveness analysis. The vectors field of Machine.Program.T has been changed to include general statics.

In Rssa, a new expression defines a unique static for each usage in the program. In the backend, statics are moved into the program (no uniqueifying is performed, as it seems to be an artifact of propagation of constants), and references are done through an operand. The operand Static is a constant value which has type CPointer or ObjPtr, and its data can be accessed with Offset/Contents as normal. Thus, it has no address, and it cannot be written to (which indicated some conflations of meaning for isLocation).

Under the default behavior, globals are not statically allocated if they have one or more mutable fields which can hold an object pointer (after packing) or if their data is not statically allocated. In practice, there are no such globals for just about all programs.
This version has support in all codegens. As expected, code size and compile time are improved across the board, particularly for larger programs. Runtime improves sometimes, but should only affect programs which had hot code accessing globals, as it removes one level of indirection. We should also expect marginally faster garbage collections, as globals are now mostly skipped.
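The removed level of indirection can be illustrated with a small C model. It assumes a static object is emitted as a header word followed by its data and is referenced by a pointer to the first data field; the struct layout, the placeholder header value, and the function names are hypothetical and are not taken from the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t objptr;

/* A statically allocated and initialized pair: header, then data. */
struct static_pair {
    uintptr_t header;  /* placeholder value, not a real MLton header */
    int32_t fst;
    int32_t snd;
};
static struct static_pair pair_obj = { 0x1u, 3, 4 };

/* Byte offset of snd relative to the first data field. */
#define SND_OFFSET (offsetof(struct static_pair, snd) - offsetof(struct static_pair, fst))

/* Direct scheme: the object's address is a link-time constant. */
static objptr pair_static(void) { return (objptr)&pair_obj.fst; }

/* Indirect scheme: a global slot that is filled in at startup. */
static objptr globalObjptr[1];
void init_globals(void) { globalObjptr[0] = pair_static(); }

int32_t read_snd_via_global(void)
{
    objptr p = globalObjptr[0];  /* extra load of the global slot */
    assert(p != 0);              /* init_globals must have run */
    return *(const int32_t *)(p + SND_OFFSET);
}

int32_t read_snd_via_static(void)
{
    return *(const int32_t *)(pair_static() + SND_OFFSET);  /* no slot load */
}
```

Both readers return the same value. The difference is that the static variant needs no initialization pass and no load from the global table before the field access.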
A couple pieces are still outstanding for discussion and review:
This resolves #300
This affects the previous globalization changes in #288, so they should be re-examined as well. In particular, effects due to extra indirections should disappear, so any remaining effects will only be due to low-level perturbations and interactions with other passes (local-ref, for instance).