Skip to content

The type system of AST node macro types #11981

Open
@HertzDevil

Description

@HertzDevil

#is_a? in the macro language is implemented as a string check:

def visit(node : IsA)
node.obj.accept self
const_name = node.const.to_s
@last = BoolLiteral.new(@last.class_desc_is_a?(const_name))
false
end

Crystal::ASTNode#class_desc_is_a? roughly does the following: it uses some macro magic to obtain the ancestors of the built-in AST node types, extracts their #class_desc (i.e. #class_name in the macro language), then checks for membership of the const_name argument. A Splat receiver, for example, translates this to const_name.in?("Splat", "UnaryExpression", "ASTNode"). This means #is_a? is not actually backed by a type system within the macro interpreter. It also means the following fails, because no AST nodes ever have a #class_name containing spaces:

{% 1.is_a?(NumberLiteral | BoolLiteral) %} # => false

A type system is needed to assign meaning to those type expressions, so that we can reliably use the full set of macro types in #is_a? and other places that (may eventually) accept macro type expressions, such as the restrictions in built-in macro method docs, macros, a different kind of macros, and annotations. The following is an attempt to establish this type system:

  • ASTNode is the root of all macro types. It is also the top type; all other macro types are strict subtypes of ASTNode. (In contrast, Object is not truly a top type in the regular language.)
  • Classes and inheritance are defined in the same way as the regular language. All macro types, including ASTNode, are classes.
    • Abstract classes are expository only; a macro type T is abstract if no AST node can be constructed that belongs to T but not to any strict subtype of T.
    • All types are implicitly "virtual"; there is no distinction between ASTNode and what one would call ASTNode+.
  • Generics introduce the same subtyping relationships as the regular language; generic instances are strict subtypes of their corresponding uninstantiated generic macro types.
    • Generics cannot be inherited from (or rather the macro language doesn't need this). Generic type arguments must be other macro types, never numbers nor expressions like sizeof.
    • ASTNode#class_name is always that of the uninstantiated generic. Both [1, 2, 3].class_name and %w(a b c).class_name will continue to be "ArrayLiteral" to avoid breakage.
    • ArrayLiteral(T) denotes an array literal whose element nodes all belong to subtypes of T. The documentation doesn't declare this type as such, but a great deal of return value restrictions try to instantiate ArrayLiteral already. ArrayLiteral is covariant in T. A part of its hierarchy would look like this:
      graph BT
      NumberLiteral --> ASTNode
      ArrayLiteral --> ASTNode
      a_astnode["ArrayLiteral(ASTNode)"] --> ArrayLiteral
      a_numberliteral["ArrayLiteral(NumberLiteral)"] --> a_astnode
      a_a["ArrayLiteral(ArrayLiteral)"] --> a_astnode
      a_a_astnode["ArrayLiteral(ArrayLiteral(ASTNode))"] --> a_a
      etc["..."] --> a_a_astnode
      a_n["ArrayLiteral(NoReturn)"] --> a_numberliteral
      a_n --> etc
      NoReturn --> NumberLiteral
      NoReturn --> a_n
      
      Loading
      It follows that all array literals belong to ArrayLiteral(ASTNode), and all empty ones belong to ArrayLiteral(NoReturn).
    • We could define HashLiteral(K, V), TupleLiteral(T), and NamedTupleLiteral(V) in a similar fashion, although the docs don't use these notations yet. All these type variables are also covariant.
    • As seen above, all types can be used as generic type arguments, unlike the regular language.
  • | is the union operator; the resulting macro type is the smallest set-theoretic union of the operand macro types. It behaves slightly differently from unions in the regular language:
    • Two types in the same union merge if either is a subtype of the other.
    • Two types do not merge if they are in the same class hierarchy but neither is a subtype of the other. NumberLiteral | BoolLiteral is not reduced to ASTNode.
    • The order of variant types within a union macro type is undefined, not even for NilLiteral and Nop.
    • Subtyping between unions is defined like the regular language; A | B is a subtype of C | D if and only if (A <= C || A <= D) && (B <= C || B <= D).
    • There are no sealed semantics; And | Or is a strict subtype of BinaryOp, even though the former exhausts the latter.
  • NoReturn is the empty union of macro types. It is the bottom type; no nodes have this type, and all other macro types are strict supertypes of it. (We need NoReturn because ::raise has this return value restriction.)
  • There are no metaclasses of macro types; the introduction of a type system does not imply the macro language needs to support introspection of this macro type hierarchy. Metaclasses tied to non-macro types might be introduced on top of TypeNode as a result of [RFC] Macro Defs - Methods in Macro land #8835.
  • There are no modules, aliases, typedefs, or autocast types. There are no namespaces other than the top-level one.
  • self, typeof, _, the single splat operator, and the short-hand notations ?, *, [N], {}, -> are all unsupported. (Perhaps self could work in some contexts, but the docs don't use it.)
  • If a constant name does not correspond to a macro type, for compatibility it is treated as NoReturn. This means {{ 1.is_a?(Int32) }} is false, but {{ ([] of Nil).is_a?(Array(Int32)) }} is true.

The description above is mostly complete; it leaves out a definition of the intersection operator, which would be needed by macro type restrictions. #is_a? does not need it because the macro language is interpreted and does not rely on type filters. Once we construct this type system, we could refactor #is_a? into:

def visit(node : IsA)
  node.obj.accept self

  # `NumberLiteral`, `ArrayLiteral(Def)` etc.
  macro_type = lookup_macro_type(node.const)

  # `ASTNode#macro_is_a?` is the membership check for macro types
  # we avoid doing something like `@last.macro_type.implements?(macro_type)`
  # so that in case of e.g. an `ArrayLiteral` we don't have to build
  # the full union of element types
  @last = BoolLiteral.new(@last.macro_is_a?(macro_type))

  false
end

and lookup_macro_type would be readily usable when we need to support macro types outside #is_a?.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions