Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove global Namespace arena in favour of structural sharing. Refactor to use absolute paths. #1213

Merged
merged 35 commits into from
May 10, 2022

Conversation

mitchmindtree
Copy link
Contributor

@mitchmindtree mitchmindtree commented Apr 11, 2022

Closes #682

Part 1. Gobal State Structural Sharing

This removes the global arena used for managing namespaces during type
checking in favour of reverting to the original approach of representing
new scopes by cloning the namespace.

Prior to the introduction of the global arena, cloning the namespace to
represent new scopes caused exponential memory growth as the entirety of
each of the inner collections was cloned along with the namespace
itself.

In this new approach, we aim to avoid this by using clone-friendly,
drop-in-replacement collections from the im crate that enable
"structural sharing" of the Namespace fields. See TODO below for
whether this is done yet or not.

You can read more about structural sharing here. The gist is that
rather than deep cloning the entire structure, we clone a reference to
the original state, and new allocations only occur upon modifying the
data structure with only the memory needed to record the difference.

The result is that we can avoid the side-effecting nature of maintaining
global state, making it easier to reason about how Namespaces are
accessed and updated within functions by reading their signatures.

It should also allow us to lean on RAII for cleaning up the namespace
contents, solving the bug where we are currently unable to compile the
same project more than once within the same process as mentioned
here. This is particularly relevant to sway-lsp where we will
likely want to lean on sway-core to type check the same code many
times within one process.

This also gets us a step toward achieving referential transparency for a
lot of our type checking functions which may enable easier caching and
parallelism in the future, though this would still currently be blocked
by the type engine's global state.

Many of the newer methods previously provided by NamespaceWrapper and
NamespaceRef have been moved to Namespace itself.

Part 2. Absolute Paths

The final set of changes in this PR are a refactor of the Namespace type in
order to ensure access to the whole module hierarchy (as well as the init
namespace) during all of type-checking.

Splitting Namespace into Root, Module and Items.

Previously, we used a single Namespace structure to represent not only the
module tree but the items within each module too.

This PR refactors the original Namespace type into:

  • Module: Represents a single module. Stores the set of items available within
    the module as well as the set of submodules.
  • Items: The set of declarations, implementations, synonyms and aliases
    available within a particular scope.

This allows for moving module-level behaviour onto the Module type, and
item-level behaviour to the Items type. This is particularly useful in
type-checking for distinguishing between whether or not we need the entire
module tree or just the set of items for a particular module (e.g. the
monomorphize methods).

The new Root type is a wrapper around the project's root Module. Methods
that should only be called on the root module (e.g. those that involve resolving
absolute paths, or internally resolving synonyms) have been moved onto this
type.

Module derefs to its inner Items, and Root derefs to its inner Module.
This is useful for retaining the ergonomics of the old Namespace type while
gaining the separation-of-concerns benefits of the new type separation in
function signatures.

Adding a new Namespace contextual type

A new higher-level Namespace type has been added that acts as a context,
making it easier to enter/exit submodules and scopes, and provides some
short-hand methods to avoid the need for repeatedly indexing into root with
the current module path.

The Namespace derefs to the Module that is currently being checked, allowing
to minimize the changes throughout the rest of type-checking.

TODO

  • Replace collections in Namespace fields with clone-friendly
    alternatives provided by im.
  • Solve E2E errors. Since the arena was introduced, a lot of new
    global mutation has also been introduced disguised by the RwLock.
    E.g. the monomorphize functions now mutate the namespace at the
    given reference, meaning that ns.resolve_self_* is also
    self-mutating. Need to review how these methods are used to ensure the
    new behaviour achieves the same intended results.

@adlerjohn adlerjohn added compiler General compiler. Should eventually become more specific as the issue is triaged code quality labels Apr 11, 2022
@mitchmindtree mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 8d5c2fb to f951807 Compare April 11, 2022 06:57
@mitchmindtree mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 9587cb6 to d18d0bb Compare April 16, 2022 01:42
mitchmindtree added a commit that referenced this pull request Apr 18, 2022
I noticed these when they resulted in compilation errors while working
on the `Namespace` refactor in #1213.

It appears that on `master` we're leaking submodules into the root
namespace, and as a result these haven't been caught yet. #1213 appears
to fix this and results in a compilation error. I might add a
`should_fail` E2E test as a part of #1213 to ensure we don't
accidentally leak submodules into the root in the future.
mitchmindtree added a commit that referenced this pull request Apr 18, 2022
I noticed these when they resulted in compilation errors while working
on the `Namespace` refactor in #1213.

It appears that on `master` we're leaking submodules into the root
namespace, and as a result these haven't been caught yet. #1213 appears
to fix this and results in a compilation error. I might add a
`should_fail` E2E test as a part of #1213 to ensure we don't
accidentally leak submodules into the root in the future.
@mitchmindtree
Copy link
Contributor Author

This is just about working! Almost all E2E tests pass besides:

  • should_pass/language/dependencies and
  • should_pass/language/multi_item_import.

The Problem

The reason why these don't pass is because while these changes do fix the issue where we could leak submodules into the crate root, they've also broken absolute paths.

In my refactor, I've made the incorrect assumption that the crate_namespace always referred to the initial state of the root of the namespace, i.e. containing library dependencies and perhaps one day the prelude items. However, it appears the intention for it is to simply act as an index into the root of the namespace (which will require mutation as modules and items are added to the root) so that absolute paths may be used to access items from within submodules.

Potential Solution

In order to address this, I think the correct approach is to pass around a &mut Namespace that always refers to the root namespace, along with a Vec<Ident> that represents the absolute path of the current scope or module.

More specifically my current plan is to remove the existing namespace: &mut Namespace and crate_namespace: &Namespace fields from the TypeCheckArguments, and replace them with:

    /// The root namespace containing all items and modules.
    root: &mut Namespace,
    /// An absolute path from the `root` to the namespace of the current scope.
    scope: Vec<Ident>,

We can use the scope to index into the namespace at the current scope as necessary, while retaining unique access to the root in case we visit an absolute path.

@mitchmindtree mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 2001d61 to 0a8918f Compare April 18, 2022 12:46
@mitchmindtree mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from 0a8918f to f5d56b5 Compare April 24, 2022 05:38
@mitchmindtree mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch 5 times, most recently from 68acbfe to b8c3294 Compare May 7, 2022 04:04
@mitchmindtree mitchmindtree marked this pull request as ready for review May 7, 2022 05:24
@mitchmindtree mitchmindtree changed the title WIP: Remove global Namespace arena in favour of structural sharing Remove global Namespace arena in favour of structural sharing. Refactor to use absolute paths. May 7, 2022
@mitchmindtree
Copy link
Contributor Author

OK this should finally be good to go! I've updated the top-level comment to add a summary of the new "absolute path" changes as a second part, hopefully it helps to navigate this a little!

Closes #682

This removes the global arena used for managing namespaces during type
checking in favour of reverting to the original approach of representing
new scopes by cloning the namespace.

Prior to the introduction of the global arena, cloning the namespace to
represent new scopes caused exponential memory growth as the entirety of
each of the inner collections was cloned along with the namespace
itself.

In this new approach, we aim to avoid this by using clone-friendly,
drop-in-replacement collections from the `im` crate that enable
"structural sharing" of the `Namespace` fields. *See TODO below for
whether this is done yet or not.*

You can read more about structural sharing [here][1]. The gist is that
rather than deep cloning the entire structure, we clone a reference to
the original state, and new allocations only occur upon *modifying* the
data structure with only the memory needed to record the difference.

The result is that we can avoid the side-effecting nature of maintaining
global state, making it easier to reason about how `Namespace`s are
accessed and updated within functions simply by reading their
signatures.

It should also allow us to lean on RAII for cleaning up the namespace
contents, solving the bug where we are currently unable to compile the
same project more than once within the same process as mentioned
[here][2]. This is particularly relevant to `sway-lsp` where we will
likely want to lean on `sway-core` to type check the same code many
times within one process.

This also gets us a step toward achieving referential transparency for a
lot of our type checking functions which may enable easier caching and
parallelism in the future, though this would still currently be blocked
by the type engine's global state.

Many of the newer methods previously provided by `NamespaceWrapper` and
`NamespaceRef` have been moved to `Namespace` itself.

[1]: https://docs.rs/im/latest/im/#what-are-immutable-data-structures
[2]: #682 (comment)

TODO
----

- [ ] Solve E2E errors. Since the arena was introduced, a lot of new
  global mutation has also been introduced disguised by the `RwLock`.
  E.g. the `monomorphize` functions now mutate the namespace at the
  given reference, meaning that `ns.resolve_self_*` is also
  self-mutating. Need to review how these methods are used to ensure the
  new behaviour achieves the same intended results.
- [ ] Replace collections in `Namespace` fields with clone-friendly
  alternatives provided by `im`.
mitchmindtree and others added 21 commits May 8, 2022 00:53
By making `Namespace` `PartialEq`, we're able to re-add the original
filter to the `optimize` pass that ensures we don't unnecessarily
re-compile constants for the current module.
This enables proper support for absolute paths more generally, and
avoids the need for specifying an optional `from_module` for much of
namespace's API.
Renames `get_symbol` and `get_call_path` to `resolve_symbol` and
`resolve_call_path` respectively in order to distinguish between
direct lookup/indexing (what `get` is commonly used for) and resolving
the declaration for a given symbol identifier.

Refactors the `get_call_path` method in order to better handle relative
modules. Previously, the method assumed that none of the `CallPath`
prefixes refer to imported symbols, implying that Sway is unable to
support submodule imports. This change opens the path to supporting
submodule imports.

Refactors the `get_symbol` method in order to follow symbol resolution
through more than one import. This should allow for the importing of
re-exported symbols.
This better represents the behaviour where the new `root` instance is
scoped to the particular context (e.g. function body, code block, etc)
and hence has no need to update the outer body.
This refactors the `init`, `root` and `mod_path` namespace items into a
single `Namespace` context type. This simplifies passing these fields
through `TypeCheckArguments` and the rest of type-checking and abstracts
away some of the complexity involved in interactions between each of
these items.

For the most part, the new `Namespace` methods are just short-hand for
using `root` and `mod_path` together, however there are a couple of
significant additions:

- `Namespace::enter_submodule` returns a session type with a namespace
  associated with the submodule being type-checked. Upon being dropped,
  the `SubmoduleNamespace` returns control to the current module and
  resets the associated `mod_path`.
- `Namespace::enter_scope` simply clones each of the fields, but better
  clarifies intentions in code.
Originally I inlined this into the `import_new_file` as it required some
refactoring following changes to the `Namespace`, and was the only
use site anyway. Thinking on this more, it makes sense to abstract this
step to the root as currently, it involves both parsing and
type-checking, not just type-checking.
@mitchmindtree mitchmindtree force-pushed the mitchmindtree/unglobal-namespace branch from a8ce775 to e3ce882 Compare May 7, 2022 14:59
@mohammadfawaz
Copy link
Contributor

I'm excited to learn about this change!

Haven't looked at the code yet but I just wanted to express my appreciation to your elaborate description for this PR. It will be extremely helpful for all of us working on the compiler 🙌

Copy link
Contributor

@otrho otrho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤘

Copy link
Contributor

@sezna sezna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@mitchmindtree mitchmindtree merged commit 62e8fde into master May 10, 2022
@mitchmindtree mitchmindtree deleted the mitchmindtree/unglobal-namespace branch May 10, 2022 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code quality compiler General compiler. Should eventually become more specific as the issue is triaged
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Refactor RwLock used in namespaces to be a concurrent slab or safer data structure
5 participants