Remove global `Namespace` arena in favour of structural sharing. Refactor to use absolute paths. #1213

mitchmindtree · 2022-04-11T04:45:17Z

Closes #682

Part 1. Global State Structural Sharing

This removes the global arena used for managing namespaces during type
checking in favour of reverting to the original approach of representing
new scopes by cloning the namespace.

Prior to the introduction of the global arena, cloning the namespace to
represent new scopes caused exponential memory growth as the entirety of
each of the inner collections was cloned along with the namespace
itself.

In this new approach, we aim to avoid this by using clone-friendly,
drop-in-replacement collections from the im crate that enable
"structural sharing" of the Namespace fields. See TODO below for
whether this is done yet or not.

You can read more about structural sharing in the im crate documentation
(https://docs.rs/im/latest/im/#what-are-immutable-data-structures). The gist
is that rather than deep cloning the entire structure, we clone a reference
to the original state, and new allocations only occur upon modifying the
data structure, using only the memory needed to record the difference.
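A minimal sketch of the clone-then-modify behaviour described above. It does not use the im crate; it stands in `std::rc::Rc` with `Rc::make_mut` (copy-on-write of the whole map), which is coarser than im's per-node sharing but shows the same principle. The `Namespace` struct and its `symbols` field here are hypothetical simplifications, not the real sway-core types.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical, simplified namespace: the symbol table lives behind an `Rc`,
/// so cloning the namespace copies a pointer rather than the whole map.
#[derive(Clone, Debug, Default)]
struct Namespace {
    symbols: Rc<HashMap<String, u32>>,
}

impl Namespace {
    /// Inserting copies the map only if another clone still shares it.
    fn insert(&mut self, name: &str, id: u32) {
        Rc::make_mut(&mut self.symbols).insert(name.to_string(), id);
    }

    /// True while both namespaces point at the same allocation.
    fn shares_storage_with(&self, other: &Namespace) -> bool {
        Rc::ptr_eq(&self.symbols, &other.symbols)
    }
}

fn main() {
    let mut outer = Namespace::default();
    outer.insert("foo", 0);

    // Entering a new scope: a cheap clone, storage is shared.
    let mut inner = outer.clone();
    assert!(inner.shares_storage_with(&outer));

    // Modifying the inner scope allocates; the outer scope is unaffected.
    inner.insert("bar", 1);
    assert!(!inner.shares_storage_with(&outer));
    assert_eq!(outer.symbols.len(), 1);
    assert_eq!(inner.symbols.len(), 2);
}
```

With im's persistent collections the write would allocate only the changed nodes rather than a full copy of the map, which is what keeps repeated scope cloning cheap.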

The result is that we can avoid the side-effecting nature of maintaining
global state, making it easier to reason about how Namespaces are
accessed and updated within functions by reading their signatures.

It should also allow us to lean on RAII for cleaning up the namespace
contents, solving the bug where we are currently unable to compile the
same project more than once within the same process, as mentioned in a
comment on #682. This is particularly relevant to sway-lsp where we will
likely want to lean on sway-core to type check the same code many
times within one process.
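To illustrate why owned state helps here, the following is a rough sketch, assuming a hypothetical `type_check` function that builds and returns its own `Namespace`. Nothing in it mirrors the real sway-core API; it only shows that when the namespace is an ordinary owned value, dropping it releases everything and a second run starts clean.

```rust
use std::collections::HashMap;

/// Hypothetical namespace owned by the caller rather than a global arena.
#[derive(Debug, Default)]
struct Namespace {
    symbols: HashMap<String, u32>,
}

/// Hypothetical type-check entry point: builds a fresh namespace per call
/// and reports duplicate declarations as an error.
fn type_check(decls: &[&str]) -> Result<Namespace, String> {
    let mut ns = Namespace::default();
    for (i, name) in decls.iter().enumerate() {
        if ns.symbols.insert(name.to_string(), i as u32).is_some() {
            return Err(format!("duplicate declaration: {name}"));
        }
    }
    Ok(ns)
}

fn main() {
    let project = ["main", "helper"];

    // First compilation; the namespace is freed when `first` is dropped.
    let first = type_check(&project).unwrap();
    drop(first);

    // Second compilation in the same process sees no leftover state.
    let second = type_check(&project).unwrap();
    assert_eq!(second.symbols.len(), 2);
}
```

With a process-wide arena, the second call would instead observe the declarations left behind by the first unless the arena were reset by hand.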

This also gets us a step toward achieving referential transparency for a
lot of our type checking functions which may enable easier caching and
parallelism in the future, though this would still currently be blocked
by the type engine's global state.

Many of the newer methods previously provided by NamespaceWrapper and
NamespaceRef have been moved to Namespace itself.

Part 2. Absolute Paths

The final set of changes in this PR is a refactor of the Namespace type to
ensure access to the whole module hierarchy (as well as the init
namespace) throughout type-checking.

Splitting `Namespace` into `Root`, `Module` and `Items`.

Previously, we used a single Namespace structure to represent not only the
module tree but the items within each module too.

This PR refactors the original Namespace type into:

- Module: Represents a single module. Stores the set of items available within
  the module as well as the set of submodules.
- Items: The set of declarations, implementations, synonyms and aliases
  available within a particular scope.

This allows for moving module-level behaviour onto the Module type, and
item-level behaviour to the Items type. This is particularly useful in
type-checking for distinguishing between whether or not we need the entire
module tree or just the set of items for a particular module (e.g. the
monomorphize methods).

The new Root type is a wrapper around the project's root Module. Methods
that should only be called on the root module (e.g. those that involve resolving
absolute paths, or internally resolving synonyms) have been moved onto this
type.

Module derefs to its inner Items, and Root derefs to its inner Module.
This is useful for retaining the ergonomics of the old Namespace type while
gaining the separation-of-concerns benefits of the new type separation in
function signatures.
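The deref chain can be sketched roughly as follows. The field names (`symbols`, `submodules`, `module`) and the helper methods (`insert_symbol`, `symbol`, `module_at`, `has_symbol`) are assumptions made for illustration and are not taken from the PR's diff.

```rust
use std::collections::HashMap;
use std::ops::{Deref, DerefMut};

/// Item-level state: declarations available within a scope.
#[derive(Debug, Default)]
struct Items {
    symbols: HashMap<String, String>,
}

impl Items {
    fn insert_symbol(&mut self, name: &str, decl: &str) {
        self.symbols.insert(name.to_string(), decl.to_string());
    }

    fn symbol(&self, name: &str) -> Option<&str> {
        self.symbols.get(name).map(String::as_str)
    }
}

/// Module-level state: the module's items plus its submodules.
#[derive(Debug, Default)]
struct Module {
    items: Items,
    submodules: HashMap<String, Module>,
}

impl Deref for Module {
    type Target = Items;
    fn deref(&self) -> &Items {
        &self.items
    }
}

impl DerefMut for Module {
    fn deref_mut(&mut self) -> &mut Items {
        &mut self.items
    }
}

/// Wrapper around the project's root module.
#[derive(Debug, Default)]
struct Root {
    module: Module,
}

impl Deref for Root {
    type Target = Module;
    fn deref(&self) -> &Module {
        &self.module
    }
}

impl DerefMut for Root {
    fn deref_mut(&mut self) -> &mut Module {
        &mut self.module
    }
}

impl Root {
    /// Root-only behaviour: resolve an absolute module path.
    fn module_at(&self, path: &[&str]) -> Option<&Module> {
        let mut module = &self.module;
        for name in path {
            module = module.submodules.get(*name)?;
        }
        Some(module)
    }
}

/// The signature documents that only item-level access is needed.
fn has_symbol(items: &Items, name: &str) -> bool {
    items.symbol(name).is_some()
}

fn main() {
    let mut root = Root::default();
    // Item-level and module-level access both work directly on `root`.
    root.insert_symbol("foo", "fn foo()");
    root.submodules.insert("std".to_string(), Module::default());

    // `&Root` coerces to `&Items` through the two `Deref` impls.
    assert!(has_symbol(&root, "foo"));
    assert!(root.module_at(&["std"]).is_some());
}
```

A function taking `&Items` cannot walk the module tree, so its signature alone tells the reader how much of the namespace it can touch.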

Adding a new `Namespace` contextual type

A new higher-level Namespace type has been added that acts as a context,
making it easier to enter/exit submodules and scopes, and provides some
short-hand methods to avoid the need for repeatedly indexing into root with
the current module path.

The Namespace derefs to the Module that is currently being checked, which
minimizes the changes required throughout the rest of type-checking.
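As a rough sketch of that idea, a context type can hold the root together with the current module path and deref to whichever module the path points at. The struct layout and the `current` helper below are assumptions for illustration, and the lookup simply panics on an invalid path to keep the example short.

```rust
use std::collections::HashMap;
use std::ops::Deref;

#[derive(Debug, Default)]
struct Module {
    symbols: Vec<String>,
    submodules: HashMap<String, Module>,
}

/// Hypothetical context: the root module plus the absolute path of the
/// module currently being type-checked.
struct Namespace {
    root: Module,
    mod_path: Vec<String>,
}

impl Namespace {
    /// Index into `root` with `mod_path`. Panics if the path is invalid.
    fn current(&self) -> &Module {
        let mut module = &self.root;
        for name in &self.mod_path {
            module = &module.submodules[name];
        }
        module
    }
}

impl Deref for Namespace {
    type Target = Module;
    fn deref(&self) -> &Module {
        self.current()
    }
}

fn main() {
    let mut root = Module::default();
    root.symbols.push("foo".to_string());
    let mut lib = Module::default();
    lib.symbols.push("bar".to_string());
    root.submodules.insert("lib".to_string(), lib);

    let mut ns = Namespace { root, mod_path: vec![] };
    // At the root, `ns.symbols` reads the root module's symbols.
    assert_eq!(ns.symbols, ["foo"]);

    // After moving into `lib`, the same expression reads `lib`'s symbols.
    ns.mod_path.push("lib".to_string());
    assert_eq!(ns.symbols, ["bar"]);
}
```

Call sites that previously worked with a module reference keep the same shape, while the context decides which module they actually see.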

TODO

- Replace collections in Namespace fields with clone-friendly
  alternatives provided by im.
- Solve E2E errors. Since the arena was introduced, a lot of new
  global mutation has also been introduced disguised by the RwLock.
  E.g. the monomorphize functions now mutate the namespace at the
  given reference, meaning that ns.resolve_self_* is also
  self-mutating. Need to review how these methods are used to ensure the
  new behaviour achieves the same intended results.

I noticed these when they resulted in compilation errors while working on the `Namespace` refactor in #1213. It appears that on `master` we're leaking submodules into the root namespace, and as a result these haven't been caught yet. #1213 appears to fix this and results in a compilation error. I might add a `should_fail` E2E test as a part of #1213 to ensure we don't accidentally leak submodules into the root in the future.

mitchmindtree · 2022-04-18T12:41:25Z

This is just about working! Almost all E2E tests pass besides:

should_pass/language/dependencies and
should_pass/language/multi_item_import.

The Problem

These fail because, while these changes do fix the issue where we could leak submodules into the crate root, they have also broken absolute paths.

In my refactor, I've made the incorrect assumption that the crate_namespace always referred to the initial state of the root of the namespace, i.e. containing library dependencies and perhaps one day the prelude items. However, it appears the intention for it is to simply act as an index into the root of the namespace (which will require mutation as modules and items are added to the root) so that absolute paths may be used to access items from within submodules.

Potential Solution

In order to address this, I think the correct approach is to pass around a &mut Namespace that always refers to the root namespace, along with a Vec<Ident> that represents the absolute path of the current scope or module.

More specifically my current plan is to remove the existing namespace: &mut Namespace and crate_namespace: &Namespace fields from the TypeCheckArguments, and replace them with:

    /// The root namespace containing all items and modules.
    root: &mut Namespace,
    /// An absolute path from the `root` to the namespace of the current scope.
    scope: Vec<Ident>,

We can use the scope to index into the namespace at the current scope as necessary, while retaining unique access to the root in case we visit an absolute path.
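A small sketch of what indexing with the scope could look like, assuming a simplified `Namespace` with `symbols` and `modules` maps and `Ident` aliased to `String`. The `namespace_at_mut` helper is hypothetical and only illustrates the plan, not the final implementation.

```rust
use std::collections::HashMap;

// Stand-in for the compiler's identifier type.
type Ident = String;

#[derive(Debug, Default)]
struct Namespace {
    symbols: HashMap<Ident, String>,
    modules: HashMap<Ident, Namespace>,
}

/// Walk from `root` along an absolute path, returning the namespace there.
fn namespace_at_mut<'a>(
    root: &'a mut Namespace,
    path: &[Ident],
) -> Option<&'a mut Namespace> {
    let mut ns = root;
    for ident in path {
        ns = ns.modules.get_mut(ident)?;
    }
    Some(ns)
}

fn main() {
    let mut root = Namespace::default();
    root.modules.insert("a".to_string(), Namespace::default());

    // Type-checking inside module `a`: insert at the current scope.
    let scope: Vec<Ident> = vec!["a".to_string()];
    namespace_at_mut(&mut root, &scope)
        .unwrap()
        .symbols
        .insert("x".to_string(), "fn x()".to_string());

    // The root still sees the item via its absolute path.
    assert_eq!(root.modules["a"].symbols["x"], "fn x()");
}
```

Because only `root` is held mutably, the same function can serve both the current scope (pass `scope`) and any absolute path encountered while checking it.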

mitchmindtree · 2022-05-07T05:30:51Z

OK this should finally be good to go! I've updated the top-level comment to add a summary of the new "absolute path" changes as a second part, hopefully it helps to navigate this a little!



Renames `get_symbol` and `get_call_path` to `resolve_symbol` and `resolve_call_path` respectively in order to distinguish between direct lookup/indexing (what `get` is commonly used for) and resolving the declaration for a given symbol identifier.

Refactors the `get_call_path` method in order to better handle relative modules. Previously, the method assumed that none of the `CallPath` prefixes refer to imported symbols, implying that Sway is unable to support submodule imports. This change opens the path to supporting submodule imports.

Refactors the `get_symbol` method in order to follow symbol resolution through more than one import. This should allow for the importing of re-exported symbols.
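Following resolution through more than one import can be sketched as below. This is not the sway-core implementation: the flat map keyed by absolute module path, the `imports` layout and the `MAX_HOPS` bound are all assumptions chosen to keep the example small.

```rust
use std::collections::HashMap;

/// Arbitrary bound so that cyclic imports cannot loop forever.
const MAX_HOPS: usize = 32;

#[derive(Debug, Default)]
struct Module {
    /// Symbols declared directly in this module.
    decls: HashMap<String, String>,
    /// Local name -> (absolute path of source module, name in that module).
    imports: HashMap<String, (Vec<String>, String)>,
}

#[derive(Debug, Default)]
struct Root {
    /// Hypothetical flat layout: modules keyed by absolute path.
    modules: HashMap<Vec<String>, Module>,
}

impl Root {
    /// Resolve `symbol` as seen from `mod_path`, following import chains.
    fn resolve_symbol(&self, mod_path: &[String], symbol: &str) -> Option<&str> {
        let mut path = mod_path.to_vec();
        let mut name = symbol.to_string();
        for _ in 0..MAX_HOPS {
            let module = self.modules.get(&path)?;
            if let Some(decl) = module.decls.get(&name) {
                return Some(decl.as_str());
            }
            // Not declared here: hop to wherever it was imported from.
            let (src_path, src_name) = module.imports.get(&name)?;
            path = src_path.clone();
            name = src_name.clone();
        }
        None
    }
}

fn main() {
    let a = vec!["a".to_string()];
    let b = vec!["b".to_string()];
    let mut root = Root::default();

    // `a` declares `foo`; `b` re-exports it; the root imports it from `b`.
    let mut mod_a = Module::default();
    mod_a.decls.insert("foo".to_string(), "fn foo()".to_string());
    let mut mod_b = Module::default();
    mod_b.imports.insert("foo".to_string(), (a.clone(), "foo".to_string()));
    let mut mod_root = Module::default();
    mod_root.imports.insert("foo".to_string(), (b.clone(), "foo".to_string()));

    root.modules.insert(a, mod_a);
    root.modules.insert(b, mod_b);
    root.modules.insert(vec![], mod_root);

    assert_eq!(root.resolve_symbol(&[], "foo"), Some("fn foo()"));
}
```

A lookup that stopped after a single import would find nothing in `b`, which is the re-export case the refactor is meant to enable.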


This refactors the `init`, `root` and `mod_path` namespace items into a single `Namespace` context type. This simplifies passing these fields through `TypeCheckArguments` and the rest of type-checking and abstracts away some of the complexity involved in interactions between each of these items.

For the most part, the new `Namespace` methods are just short-hand for using `root` and `mod_path` together, however there are a couple of significant additions:

- `Namespace::enter_submodule` returns a session type with a namespace associated with the submodule being type-checked. Upon being dropped, the `SubmoduleNamespace` returns control to the current module and resets the associated `mod_path`.
- `Namespace::enter_scope` simply clones each of the fields, but better clarifies intentions in code.
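The session-type behaviour of `enter_submodule` can be sketched with a guard that restores `mod_path` in its `Drop` impl. Only the names `Namespace`, `SubmoduleNamespace`, `enter_submodule` and `mod_path` come from the description above; the fields and the restore strategy here are assumptions, and the real type carries more state.

```rust
use std::ops::{Deref, DerefMut};

/// Simplified context holding only the current module path.
struct Namespace {
    mod_path: Vec<String>,
}

/// Guard returned while a submodule is being type-checked.
struct SubmoduleNamespace<'a> {
    namespace: &'a mut Namespace,
    parent_mod_path: Vec<String>,
}

impl Namespace {
    /// Extend `mod_path` with `name` until the returned guard is dropped.
    fn enter_submodule(&mut self, name: &str) -> SubmoduleNamespace<'_> {
        let parent_mod_path = self.mod_path.clone();
        self.mod_path.push(name.to_string());
        SubmoduleNamespace { namespace: self, parent_mod_path }
    }
}

impl Deref for SubmoduleNamespace<'_> {
    type Target = Namespace;
    fn deref(&self) -> &Namespace {
        &*self.namespace
    }
}

impl DerefMut for SubmoduleNamespace<'_> {
    fn deref_mut(&mut self) -> &mut Namespace {
        &mut *self.namespace
    }
}

impl Drop for SubmoduleNamespace<'_> {
    /// Return control to the parent module by restoring its path.
    fn drop(&mut self) {
        self.namespace.mod_path = std::mem::take(&mut self.parent_mod_path);
    }
}

fn main() {
    let mut ns = Namespace { mod_path: vec![] };
    {
        let mut sub = ns.enter_submodule("lib");
        assert_eq!(sub.mod_path, ["lib"]);
        {
            // Nested submodules work through `DerefMut`.
            let nested = sub.enter_submodule("inner");
            assert_eq!(nested.mod_path, ["lib", "inner"]);
        }
        assert_eq!(sub.mod_path, ["lib"]);
    }
    // Both guards have been dropped, so the path is back at the root.
    assert!(ns.mod_path.is_empty());
}
```

Tying the reset to `Drop` means an early return or `?` inside the submodule check cannot leave `mod_path` pointing at the wrong module.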

Originally I inlined this into the `import_new_file` as it required some refactoring following changes to the `Namespace`, and was the only use site anyway. Thinking on this more, it makes sense to abstract this step to the root as currently, it involves both parsing and type-checking, not just type-checking.

mohammadfawaz · 2022-05-07T18:37:19Z

I'm excited to learn about this change!

Haven't looked at the code yet but I just wanted to express my appreciation to your elaborate description for this PR. It will be extremely helpful for all of us working on the compiler 🙌

otrho

🤘

sezna

🚀

adlerjohn assigned mitchmindtree Apr 11, 2022

adlerjohn added the compiler and code quality labels Apr 11, 2022

mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 8d5c2fb to f951807 Compare April 11, 2022 06:57

mitchmindtree mentioned this pull request Apr 15, 2022

Consider refactoring monomorphizing into a separate step following type_check? #1267

Open

mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 9587cb6 to d18d0bb Compare April 16, 2022 01:42

mitchmindtree mentioned this pull request Apr 18, 2022

Fix import of submodules within std::context #1287

Merged

mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 2001d61 to 0a8918f Compare April 18, 2022 12:46

mitchmindtree mentioned this pull request Apr 19, 2022

Add warning for unused imports #1298

Open

mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 0a8918f to f5d56b5 Compare April 24, 2022 05:38

mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch 5 times, most recently from 68acbfe to b8c3294 Compare May 7, 2022 04:04

mitchmindtree marked this pull request as ready for review May 7, 2022 05:24

mitchmindtree requested review from sezna, otrho, canndrew, adlerjohn, emilyaherbert and mohammadfawaz May 7, 2022 05:25

mitchmindtree changed the title ~~WIP: Remove global Namespace arena in favour of structural sharing~~ Remove global Namespace arena in favour of structural sharing. Refactor to use absolute paths. May 7, 2022

mitchmindtree added 3 commits May 8, 2022 00:50

Update selector_debug feature for namespace changes

c0ad29d

Run cargo fmt on sway-core

9c316fe

mitchmindtree and others added 21 commits May 8, 2022 00:53

Refactor type_check_trait_methods to match old arena behaviour

a39175c

Fix incorrect temporary namespace in impl trait type check

5073252

Simplify splitting enum module and name in typed_expression

4815421

Make Namespace PartialEq. Add missing filter to optimize.

fc9f1e4

By making `Namespace` `PartialEq`, we're able to re-add the original filter to the `optimize` pass that ensures we don't unnecessarily re-compile constants for the current module.

Remove old namespace arena file

a9ce3dc

Use namespace with absolute paths, rather than relative namespaces

116cf26

This enables proper support for absolute paths more generally, and avoids the need for specifying an optional `from_module` for much of namespace's API.

Disable implicit-std for primitive_type_argument E2E test

e1fd4ed

Update tests and alt features for switch to absolute namespace paths

9b88a13

optimize: Fix incorrect filter when compiling constants

02fea84

Distinguish between Namespace, Module and Root with types

d4138bd

Remove unnecessary filter from optimize pass

831afcb

Add accidentally removed println from code_block module

f78ed6a

Rename temp_root bindings to scoped_root

f3ebc4b

This better represents the behaviour where the new `root` instance is scoped to the particular context (e.g. function body, code block, etc) and hence has no need to update the outer body.

Use clearer name for enum module path in instantiate_enum fn

28878cf

Update comment in forc-pkg from removal of NamespaceRef

ee4eb65

Remove rogue debugging println in namespace.rs

57a6e8e

Run cargo clippy/fmt after rebasing onto Span/String/Ident fixup work

98d6569

Remove small tweaks/changes unrelated to the goal of this PR

e3ce882

mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from a8ce775 to e3ce882 Compare May 7, 2022 14:59

mitchmindtree mentioned this pull request May 9, 2022

Add a cargo feature to panic! early on compile errors for easier sway-core debugging? #1502

Open

otrho approved these changes May 9, 2022


sezna approved these changes May 9, 2022


mitchmindtree merged commit 62e8fde into master May 10, 2022

mitchmindtree deleted the mitchmindtree/unglobal-namespace branch May 10, 2022 00:41

This was referenced May 12, 2022

Windows Support (tracking issue) #1526

Open

Rename crate_namespace bindings to root_namespace or init_namespace #1268

Closed
