Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Hierarchies #245

Closed
Closed
@RossTate

Description

@RossTate

The following motivates and describes a change to the type section. (Instructions are unchanged.)

Consolidating Too Much Information

Many (most?) well-known type systems run into problems when they have too much information. To illustrate, I'll pick on Kotlin because in Kotlin I can most directly recreate the examples I've discussed on this elsewhere, but be aware that the overarching issue can be demonstrated in most major languages.

Here's a Kotlin program that type-checks. Notice the functions asRed, asGreen, and asBlue. These are simply the identify functions, so they seem like they should be no-ops. You should be able to just inline them, right? Well, unfortunately inlining is not type-preserving in Kotlin. These functions are serving an important purpose: they're telling the type-checker how to type-check the code by downgrading the information about the variable c's type. By inlining them, you've removed those downgrades, and if you though inlining should be type-preserving then you've made type-checking NP-Hard per this presentation.

The issue is that, in order to support common patterns, Kotlin deduces information about a variable's type within contexts where that immutable variable has been cast, and here that means that c is simultaneously a Red, a Green, and a Blue. The grammar for field accesses in Kotlin only specifies the field name (analogous to field offset in the MVP), leaving the type of the field to be synthesized from the type of the receiver. But here the receiver has an intersection type with each component specifying different types for the field, so the type-checker has too much information.

To address this and similar cases, Kotlin throws out subsumptive subtyping (i.e. this core wasm rule) and raises an ambiguity type-checking error asking the programmer to resolve the error by downgrading types. Many languages take this approach, and if you've ever encountered an ambiguity error than you're using a language without subsumptive subtyping. This is reasonable in surface languages where programmers are interactively involved and can patch the code accordingly. But it doesn't work so well for low-level languages where programmers cannot so easily patch compiled code, and where program transformations/optimizations (such as inlining) are readily employed but are not type-preserving without subsumption.

Now, if you look at the Kotlin example, you'll notice that Red, Green, and Blue are all classes, and none of these inherit the others. Kotlin permits only single-class inheritance, which means unrelated classes can have no common instances. Because the type-checker has complete knowledge of the class-inheritance hierarchy, it could realize that c cannot exist and can be given the "bottom" type Nothing. This amounts to consolidating information into something that is unambiguous for the type-checker. The "hierarchies" I describe here provide the means to always consolidate information and thereby avoid both ambiguity and exponential type-checking. (This doesn't work for Kotlin interfaces, but thankfully interfaces are a surface-level construct that gets lowered to distinct low-level constructs, and so the lowering process itself must resolve the ambiguity above the WebAssembly level.)

Design

A hierarchy is defined as follows in the types section:

deftype ::= ... | <hierarchy>
hierarchy ::= hierarchy 1 <dependent> ; (more dependents in future)
dependent ::= fields <fieldtype>* | ... (more cases in future)

Notice that this is essentially a struct definition, but with room for more extensions, and with different import/export semantics (discussed shortly).

Besides adding this, the only other change to the (nominal) MVP would be to require struct_subtype and array_subtype to subtype only hierarchy/struct_subtype/array_subtype types (and extensible imported types, per below). This ensures all nominal types belong to some hierarchy. (Typically there will be only one hierarchy per module.)

Imports and Exports

For simplicity, let's suppose that no field hiding is done by the exporting module. In that case, one exports hierarchies as follows:

exportdesc ::= ... | hierarchy <typeidx> | type <exportidx> <typeidx>

For hierarchy <typeidex>, the index must refer to a hierarchy definition.
For type <exportidx>, the index must refer to either an earlier hierarchy export or an earlier type export, in which case the <typeidx> must be a subtype of the type specified by that export.
The use of export indices specifies a subtyping relation between exported types.

One imports hierarchies and types as follows:

importdesc ::= ... | hierarchy extensible? 1 <dependent> | struct_subtype <importidx> extensible? <fieldtype>* | array_subtype <importidx> <fieldtype>

The field types in the _subtype imports are required to be subtypes of their counterparts in the <importidx>.
The use of import indices specifies a subtyping relation between imported types.

When matching individual exports with individual imports, one checks the following properties:

  • Hierarchies must match with hierarchies, and the <dependent> of the export must be a subtype of (or, if the import is extensible, equivalent to) the import's <dependent>
  • A struct_subtype import must match with a type export referencing a struct_subtype type, and the <fieldtype>* of the export must be a subtype of (or, if the import is extensible, equivalent to) the import's <fieldtype>*
  • An array_subtype import must match with a type export referencing an array_subtype type, and the <fieldtype> of the export must be equivalent to the import's <fieldtype>
  • For _subtype imports, the imported type referenced <importidx> must be a subtype in the exported hierarchy

Furthermore, one checks the following properties across imports and exports:

  • For any two imports of types in the same import hierarchy, their corresponding exported types must be subtypes within the export hierarchy if and only if the imports are subtypes within the import hierarchy.

The "and only if" in this last step is critical as it means the importing module has complete knowledge of the subtyping relationships between the imported nominal types within the same hierarchy. This means we can employ the consolidation technique mentioned above: if ever an instance belongs to multiple unrelated nominal types, we can resolve the ambiguity by simply giving it the bottom type. And for module composition, it is important that types be grouped into hierarchies.

Multiple Dependents

The design above leaves room for defining hierarchies with multiple dependents.
In the (current) nominal MVP, there's a common pattern where one generates two nominal hierarchies: one for instances and one for v-tables.
Furthermore, the hierarchy for instances constantly redeclares all the field types for the primary purpose of refining the v-table field to use its counterpart's nominal type in the hierarchy for v-tables.

What's really happening here is that there's one nominal hierarchy with two pieces of information that depend on it: the field types of instances and the field types of v-tables.
With hierarchies, we can encode this pattern more directly.
At the same time, we can use this to add support for a common feature: the v-table's contents are usually placed directly in the object descriptor rather than as a separate field.

First, let's generalize hierarchies a bit:

hierarchy ::= hierarchy <dependent>*

Then let's require that one defines a new nominal type, the definition provides a refinement/extension of each of the dependents on the hierarchy, e.g. specifying new instance fields and new v-table fields.

Second, we could add a new instruction desc.get n : [(ref $t)] -> [tn]. This loads the n'th field from the object descriptor (such as the n'th method in the v-table). The type of this field is given by the second dependent specified by the type $t (whereas instance fields are given by the first dependent). Or we could add a new heap type desc <typeidx> for "object descriptors", add an instruction ref.get_descriptor : [(ref $t)] -> [(ref (desc $t))] for getting an object's descriptor (i.e. the thing that the head of the object points to), and extend get instructions for accessing the fields of these descriptors.

Anyways, there's a lot of flexibility here, and we don't have to do any of this for the MVP if we don't want, so I won't go into more detail unless requested. The only thing that's pressing for the MVP is adding some sort of "hierarchy" structure.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Post-MVPIdeas for Post-MVP extensions

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions