This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Nominal types vs. inter-module interaction #148

Closed
@jakobkummerow

If I understand correctly, one of the biggest arguments for the current MVP's structural static type system is that it makes sharing types across module boundaries quite straightforward:

;; Module A
(type $string (array i8))
(func $hash (export "hash") (param (ref $string)) (result (ref $string))
  ;; body...
)

;; Module B
(type $my_string (array i8))
(func $hash (import "module" "hash") (param (ref $my_string)) (result (ref $my_string)))

...and things just work, because $string and $my_string are automatically recognized as being the same type.

Notable observations here are:

  • "things just work" might be a bit of an exaggeration: the two modules must still intentionally choose to use compatible/identical type definitions. Maybe one of them publishes an API spec that the other adopts, or maybe both authors communicate out-of-band and discuss what types to use, etc.
  • the names don't need to match, as the coordinating host (in the case of browsers: JavaScript) can take e.g. one module's "md5hash" export and feed it to the other module's "stringhash" import.
  • considering that Wasm modules are typically not handwritten but rather generated by toolchains, it is not entirely clear how developers writing "surface language" code (Java etc.) can even control/influence the Wasm types that their toolchain will generate for them. Maybe this whole multi-module scenario is only realistic for sets of modules produced in one go by the same version of the same toolchain having a whole-program view of the surface language source? If so, that would likely change our collective perspective on a bunch of the concerns discussed below.

Now, if we switched the static type system to nominal semantics, that would in particular mean that two type definitions give us two distinct types, so the example above would no longer work "just like that". To still support cross-module object sharing, e.g. for function calls, we have to find an alternative solution.
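To make the contrast concrete, here is a minimal Python sketch of the two notions of type equality. The `TypeDef` class and both comparison functions are hypothetical illustrations, not part of any proposal or engine, and the structural check deliberately ignores recursive types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDef:
    """Hypothetical model of a Wasm type definition."""
    module: str       # module that contains the definition
    name: str         # local name, e.g. "$string"
    structure: tuple  # simplified shape, e.g. ("array", "i8")


def structurally_equal(a: TypeDef, b: TypeDef) -> bool:
    # Structural typing: only the shape matters (recursion ignored here).
    return a.structure == b.structure


def nominally_equal(a: TypeDef, b: TypeDef) -> bool:
    # Nominal typing: every definition is its own type.
    return (a.module, a.name) == (b.module, b.name)


string_a = TypeDef("A", "$string", ("array", "i8"))
string_b = TypeDef("B", "$my_string", ("array", "i8"))

print(structurally_equal(string_a, string_b))  # True
print(nominally_equal(string_a, string_b))     # False
```

Under the nominal rule, the import in module B no longer type-checks against the export in module A unless something else tells the engine that the two definitions are meant to be the same type.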

One idea is to not change what happens at the module boundary, i.e. keep structural typing there. In a case like the above, where a module attempts to satisfy a function import with another module's function export and these functions use certain types, the engine would automatically try to match the types. I believe that this would work for the module interaction case; however, it creates another problem: merging modules could change semantics, and that is unacceptable. (Merging two modules is straightforward and should work fine, but consider a three-module case: module M imports functions from N and O, and the modules define types $m, $n, $o that are structurally identical and used in these functions' signatures. If N and O are merged, the merging tool cannot know that $n and $o are supposed to be the same nominal type, because there are no function calls between N and O that would implicitly provide this information. When M is then linked with the merged NO, it becomes apparent that two nominally distinct types $n and $o in NO are supposed to match the same type $m in M. This would re-introduce structural type equivalence into NO "through the back door".) So I think this idea does not hold up to critical inspection.
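The three-module case can be sketched in a few lines of Python. Everything here is a hypothetical model: `STRUCTURE`, `link`, and the tuple encoding of types are illustrative only, and the boundary check is a plain shape comparison.

```python
# Shapes of the three types from the example; all are structurally identical.
STRUCTURE = {
    ("M", "$m"): ("array", "i8"),
    ("NO", "$n"): ("array", "i8"),  # originally defined in N
    ("NO", "$o"): ("array", "i8"),  # originally defined in O
}


def link(imports):
    """Structurally match importer-side and exporter-side types at the boundary.

    `imports` is a list of (importer_type, exporter_type) pairs, one per
    function import. Returns a dict mapping each exporter-side type to the
    importer-side type it was identified with.
    """
    identified = {}
    for importer_type, exporter_type in imports:
        if STRUCTURE[importer_type] != STRUCTURE[exporter_type]:
            raise TypeError(f"{importer_type} does not match {exporter_type}")
        identified[exporter_type] = importer_type
    return identified


# Inside the merged module NO, $n and $o are two distinct nominal types.
nominally_same_inside_NO = ("NO", "$n") == ("NO", "$o")

# M imports one function that used $n and one that used $o, both typed with $m.
identified = link([
    (("M", "$m"), ("NO", "$n")),
    (("M", "$m"), ("NO", "$o")),
])
same_after_linking = identified[("NO", "$n")] == identified[("NO", "$o")]

print(nominally_same_inside_NO)  # False
print(same_after_linking)        # True
```

The two results disagree: NO treats `$n` and `$o` as different types internally, yet linking forces both to be the same type as `$m`, which is the "back door" described above.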

Another idea that has been mentioned before is to import and export types just like functions:

;; Module A
(type $string (export "string") (array i8))

;; Module B
(type $my_string (import "module" "string") (array i8))

(Note that module B repeats the (array i8) type definition in order to allow separate compilation, i.e. we want to enable engines to compile code for B before having seen A or knowing that B will import things from A.)
That solves the problem of needing a single type, but creates a (potential?) new problem: it requires the modules to agree on the export/import role assignment. For functions, this is natural, because the whole point is that the function definition is large and complex and one module's responsibility. For types, this is less natural, especially if we imagine an NPM-style decentralized ecosystem of libraries: if I want to import one module's string hashing function and another module's string compression function, who gets to export the string type used for these interactions?

Per the observations above, there is some doubt whether this scenario is realistic (and hence whether the concern is warranted), but we may want to avoid the issue anyway, so as not to paint ourselves into a corner. We could work around this difficulty by searching for more symmetric/flexible approaches to sharing types across different modules.

One idea is to define a symmetric version of the directed export→import concept. Maybe:

;; Module A
(type $string (importexport "string") (array i8))

;; Module B
(type $my_string (importexport "string") (array i8))

(Side note: maybe we should pick a different strawperson keyword than importexport, such as publicname, just to make (spoken) conversations less ambiguous: "importexport" sounds like "import/export", but describes a significantly different notion.)

Problems solved with this approach:

  • we now have an explicit way to achieve the identification of two type definitions with one another that structural static types gave us implicitly. Notably, when two modules annotate two types this way as supposedly identical, the engine has to do the same structural type check as it does with the current MVP to verify this annotation's validity. (Or at least a very similar check; nominal types make recursive steps return more quickly but otherwise the algorithm is about the same.)
  • there's no longer a need to select an exporting module; all modules have equal "rank" with respect to type definitions
  • module merging is easy, as a merging tool knows exactly which types should be identified. (Such a tool may or may not perform its own checks to ensure that this identification of types is valid; if it is invalid, then the resulting merged module will be invalid and will fail validation later on.)

Problems created:

  • the modules must agree on the name "string". I would argue that this is not a problem in practice: since even in the structural world, module authors have to agree on matching type definitions, they now simply have to agree on one more detail, which is the type's importexport name. That's not a significant increase in coordination burden.
  • by having an implicit global shared namespace for importexport type names, there could be unintentional collisions. That is a serious concern that needs a solution.

Solution A: treat identical importexport names as requests that may well fail: a matching name causes the engine to perform a structural type check; if that check fails, that is not a validation error, and the types simply won't be identified with each other. If the two modules don't try to use the types in question for any interactions, then that's just fine. One way to look at this would be to say "we still have structural typing across module boundaries, but with the twist that the importexport name is part of the structural definition".
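One possible reading of Solution A, sketched in Python: the engine keeps a table keyed by the pair (importexport name, structure), so a name collision with a different structure silently yields a different type. The `TypeRegistry` class and its integer type ids are assumptions made for illustration, not a description of any engine.

```python
class TypeRegistry:
    """Hypothetical engine-side table of importexport types."""

    def __init__(self):
        self._canonical = {}  # (public_name, structure) -> canonical type id
        self._next_id = 0

    def register(self, public_name, structure):
        # The public name is treated as part of the structural definition:
        # both the name and the shape must match for two types to be identified.
        key = (public_name, structure)
        if key not in self._canonical:
            self._canonical[key] = self._next_id
            self._next_id += 1
        return self._canonical[key]


registry = TypeRegistry()
a = registry.register("string", ("array", "i8"))   # module A
b = registry.register("string", ("array", "i8"))   # module B: same name, same shape
c = registry.register("string", ("array", "i16"))  # accidental name collision
d = registry.register("bytes", ("array", "i8"))    # same shape, different name

print(a == b)  # True:  identified with each other
print(a == c)  # False: collision is not an error, the types just stay distinct
print(a == d)  # False: shape alone is not enough
```

A mismatch only becomes visible if the modules later try to pass values of type `c` where `a` is expected, at which point the usual import type check fails.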

Solution B: analogous to how the coordinating host can map function exports to imports with different names, we could also empower (or burden) this coordinator with matching up type names. This would, effectively, give each module its own namespace for public type names. (This might be desirable in addition to solution A in order to solve the opposite problem: unintentional mismatches of type names.)
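Solution B could look roughly like the following Python sketch, in which the coordinating host supplies an explicit mapping from per-module type names to shared keys. The function `identify_types`, the data layout, and the choice to raise an error on a structural mismatch are all assumptions for illustration; the post does not specify what should happen in that case.

```python
def identify_types(modules, host_mapping):
    """Assign a shared key to each public type, as directed by the host.

    modules:      {module_name: {public_name: structure}}
    host_mapping: {(module_name, public_name): shared_key}
    Types the host does not mention stay private to their module.
    """
    structure_of_key = {}
    result = {}
    for module_name, types in modules.items():
        for public_name, structure in types.items():
            local = (module_name, public_name)
            key = host_mapping.get(local, local)
            # Types that the host maps together must still match structurally.
            if key in structure_of_key and structure_of_key[key] != structure:
                raise TypeError(f"{module_name}.{public_name} does not match {key!r}")
            structure_of_key.setdefault(key, structure)
            result[local] = key
    return result


modules = {
    "A": {"string": ("array", "i8")},
    "B": {"text": ("array", "i8")},     # different name, same intent
    "C": {"string": ("array", "i16")},  # same name, unrelated type
}
host_mapping = {
    ("A", "string"): "shared_string",
    ("B", "text"): "shared_string",
}
ids = identify_types(modules, host_mapping)

print(ids[("A", "string")] == ids[("B", "text")])    # True
print(ids[("A", "string")] == ids[("C", "string")])  # False
```

Because each module effectively has its own namespace, the unrelated `"string"` in module C cannot collide with module A's, and the differently named `"text"` in module B can still be matched up.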

Solution C: please contribute other ideas by commenting :-)

Speaking of comments, I'm mostly trying to get a conversation / thought process started here. There may well be other problems that I have overlooked; please point them out! There may well be better, or at least additional, ideas for solutions that I haven't thought of; please present them!
If we're generally happy with nominal typing inside a module, it would be great to find a way to make it viable for multi-module scenarios as well -- maybe we can then finally arrive at a type system that we can reach consensus on?

To give credit where it's due: thanks to @tebbi, @rossberg, and @RossTate for having discussed these ideas with me. This post is more thought-through than it would have been without their input :-)
