Skip to content

Type Unification #7772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 19, 2025
Merged

Conversation

jaredramirez
Copy link
Collaborator

@jaredramirez jaredramirez commented May 1, 2025

Type unification

I started this after meeting with Richard It's based on rough plan we discussed there, unification in elm's compiler, and the rust roc compiler.

This implements Hindley-Milner style type unification with extensible records, extensible tag unions, effectful functions, and more!

Overview

It implements type unification for:

  • Flex vars
  • Rigid vars
  • Built-in types (special cased; Str, List, Box)
  • Numbers (special cased)
  • Nominal user-defined types (custom_type)
  • Tuples
  • Functions
  • Function effects (effectful/pure)
  • Aliases
  • Extensible records
  • Extensible tag unions

Differences with Rust Compiler

Also, the zig version does not implement abilities, lambda set variables, or recursion variables since those are either not supported in the new compiler or will work differently and do not need to be handled at the type level.

Overall the rust compiler's unification code is more complex than this Zig version (mainly due to things like recursion vars & lambda sets). As such, this implementation generally errs on the simpler side (basically doing whatever the elm compiler does) when I was in doubt about how something should work.

There are some notable differences compared to the rust implementation:

  1. In the rust compiler, tuples have an extensible variable to unwrap tuple types. Current zig implementation does not, and instead simply uses a list of tuple Vars for the elements.
  2. In the zig version, extensible tag unions are almost exactly the same as extensible records. In the rust version, there's more/different stuff going on there. I haven't fully groked why the additional complexity in the rust version is necessary, so it's possible this will change in the future
  3. Numbers and builtins are special-cased instead of being represented by type application and phantom types

Place in compiler architecture

Additionally, I've hooked up the unify code into the main file tree and setup the types store init with a reference to ModuleEnv. Then, all allocations that the types store does uses the allocator from ModuleEnv. I'm not sure what the broader compiler arch here is – I think Luke mentioned on zulip at one point that the type store may live on ModuleEnv, so happy to change it to that if that is the plan!

Review

First off, I'm sorry this PR is so large. It started as a small proof of concept, and just grew and grew. To help orient, here's the rough lay of the land:

  1. src/types/types.zig: This file contains all of the types that exist in the system, as well as type variables, descriptors and other metadata (such as rank & mark)
  2. src/types/store.zig: This is the types store (sometimes called unification table or union find). It's basically a mapping of Var -> Descriptor. There will be 1 types store per module. This store is modified during unification as types are unified
  3. src/check/check_types/unify.zig: This file the the unification logic it's self, which is the lion-share of the work here. About the first 1500 lines are the implementation, then the other 2000 are tests 😵‍💫
  4. src/check/canonicalize/IR.zig: This file is updated to reference the new number types in types/types.zig instead of the old types/num.zig types
  5. The remaining files are small things that needed to be updated due to replace the existing placeholder logic with this implementation

Any and all feedback is welcome! This is far and a way the most zig I've ever written, so please let me know where things can be done better.

Open Questions

  1. In the Can IR, there are references to precision_var that I've left, but I think they can be removed since numbers with variable precision do not use a full-fledged Var to represent the precision.
  2. Function's effectful-ness is controlled by a Var that's unified separately because this is how the rust compiler work, but I think it could be managed via an enum similar to numbers. I would like to make this change, but I'm unsure if there are down-stream type implications.

@jaredramirez jaredramirez force-pushed the jared/simple-unification branch 9 times, most recently from ccde7bb to 0cc9a3e Compare May 8, 2025 23:21
Copy link
Contributor

@rtfeldman rtfeldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking EXCELLENT!!! 😍

@roc-lang roc-lang deleted a comment May 10, 2025
@roc-lang roc-lang deleted a comment May 10, 2025
@roc-lang roc-lang deleted a comment May 10, 2025
@roc-lang roc-lang deleted a comment May 10, 2025
@roc-lang roc-lang deleted a comment May 10, 2025
@roc-lang roc-lang deleted a comment May 10, 2025
@jaredramirez jaredramirez force-pushed the jared/simple-unification branch 9 times, most recently from b1ea47e to 7ddd15f Compare May 12, 2025 23:46
@jaredramirez jaredramirez marked this pull request as ready for review May 12, 2025 23:53
@jaredramirez jaredramirez force-pushed the jared/simple-unification branch from 7ddd15f to 1f6b596 Compare May 13, 2025 00:00
@jaredramirez jaredramirez enabled auto-merge May 13, 2025 19:46
@jaredramirez jaredramirez disabled auto-merge May 13, 2025 22:56
@jaredramirez jaredramirez enabled auto-merge (squash) May 13, 2025 22:56
@jaredramirez jaredramirez force-pushed the jared/simple-unification branch 6 times, most recently from 7ac8aac to 57bdbc0 Compare May 14, 2025 18:51
rtfeldman
rtfeldman previously approved these changes May 19, 2025
Copy link
Contributor

@rtfeldman rtfeldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work on this @jaredramirez!!! 😍 😍 😍

I left some comments, but feel free to merge as-is and address in a separate PR if you prefer! This is awesome!

num_var: TypeIdx,
precision_var: TypeIdx,
num_var: TypeVar,
precision_var: TypeVar, // <- can probably be removed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, why would we remove this? I think we still want it (for int, float, and single quote), for the same reasons we currently do!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtfeldman Since we aren't representing a number's precision using phantom types anymore, it's unclear to me what this TypeVar would be pointing to! Given our def of Num:

pub const Num = union(enum) {
    const Self = @This();

    flex_var: ?Ident.Idx,
    rigid_var: Ident.Idx,
    int: Int,
    frac: Frac,

    /// the Frac data type
    pub const Frac = union(enum) {
        flex_var: ?Ident.Idx,
        rigid_var: Ident.Idx,
        exact: Precision,

        /// the precision of a frac
        pub const Precision = enum { f32, f64, dec };
    };

    /// the Int data type
    pub const Int = union(enum) {
        flex_var: ?Ident.Idx,
        rigid_var: Ident.Idx,
        exact: Precision,

        /// the precision of an int
        pub const Precision = enum { u8, i8, u16, i16, u32, i32, u64, i64, u128, i128 };
    };
};

I don't think we need a precision_var anymore, since the precision is captured in the num_var? Very possible I'm missing something though.

pub const Num = union(enum) {
const Self = @This();

flex_var,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to flex_var, I this this will also need a rigid_var: ... (whatever type we want to use for the interned string name of the rigid var), because you can have Num(a) where a appears elsewhere in the type, e.g. in plus : Num(a), Num(a) -> Num(a).

@jaredramirez jaredramirez force-pushed the jared/simple-unification branch from 57bdbc0 to f32d3da Compare May 19, 2025 23:10
@jaredramirez jaredramirez disabled auto-merge May 19, 2025 23:10
@jaredramirez jaredramirez self-assigned this May 19, 2025
@jaredramirez jaredramirez enabled auto-merge (squash) May 19, 2025 23:25
@jaredramirez jaredramirez merged commit dbea908 into roc-lang:main May 19, 2025
20 checks passed
@jaredramirez jaredramirez deleted the jared/simple-unification branch May 19, 2025 23:45
fn fresh(self: *Self, vars: *const ResolvedVarDescs, new_content: Content) Var {
const var_ = self.types_store.register(.{
.content = new_content,
.rank = Rank.min(vars.a.desc.rank, vars.b.desc.rank),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this should be the maximum rank in the entire current pool - but vars only includes the variables presently unified, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vars.a and vars.b are the two types currently being unified, then scratch.fresh_vars are all intermediate variables created in this pool.

I used min here because that's what Elm and the rust compiler do. But if it should be max, lmk and I'm happy to change it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants