-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove global Namespace
arena in favour of structural sharing. Refactor to use absolute paths.
#1213
Conversation
8d5c2fb
to
f951807
Compare
9587cb6
to
d18d0bb
Compare
I noticed these when they resulted in compilation errors while working on the `Namespace` refactor in #1213. It appears that on `master` we're leaking submodules into the root namespace, and as a result these haven't been caught yet. #1213 appears to fix this and results in a compilation error. I might add a `should_fail` E2E test as a part of #1213 to ensure we don't accidentally leak submodules into the root in the future.
I noticed these when they resulted in compilation errors while working on the `Namespace` refactor in #1213. It appears that on `master` we're leaking submodules into the root namespace, and as a result these haven't been caught yet. #1213 appears to fix this and results in a compilation error. I might add a `should_fail` E2E test as a part of #1213 to ensure we don't accidentally leak submodules into the root in the future.
This is just about working! Almost all E2E tests pass besides:
The ProblemThe reason why these don't pass is because while these changes do fix the issue where we could leak submodules into the crate root, they've also broken absolute paths. In my refactor, I've made the incorrect assumption that the Potential SolutionIn order to address this, I think the correct approach is to pass around a More specifically my current plan is to remove the existing /// The root namespace containing all items and modules.
root: &mut Namespace,
/// An absolute path from the `root` to the namespace of the current scope.
scope: Vec<Ident>, We can use the |
2001d61
to
0a8918f
Compare
0a8918f
to
f5d56b5
Compare
68acbfe
to
b8c3294
Compare
Namespace
arena in favour of structural sharingNamespace
arena in favour of structural sharing. Refactor to use absolute paths.
OK this should finally be good to go! I've updated the top-level comment to add a summary of the new "absolute path" changes as a second part, hopefully it helps to navigate this a little! |
Closes #682 This removes the global arena used for managing namespaces during type checking in favour of reverting to the original approach of representing new scopes by cloning the namespace. Prior to the introduction of the global arena, cloning the namespace to represent new scopes caused exponential memory growth as the entirety of each of the inner collections was cloned along with the namespace itself. In this new approach, we aim to avoid this by using clone-friendly, drop-in-replacement collections from the `im` crate that enable "structural sharing" of the `Namespace` fields. *See TODO below for whether this is done yet or not.* You can read more about structural sharing [here][1]. The gist is that rather than deep cloning the entire structure, we clone a reference to the original state, and new allocations only occur upon *modifying* the data structure with only the memory needed to record the difference. The result is that we can avoid the side-effecting nature of maintaining global state, making it easier to reason about how `Namespace`s are accessed and updated within functions simply by reading their signatures. It should also allow us to lean on RAII for cleaning up the namespace contents, solving the bug where we are currently unable to compile the same project more than once within the same process as mentioned [here][2]. This is particularly relevant to `sway-lsp` where we will likely want to lean on `sway-core` to type check the same code many times within one process. This also gets us a step toward achieving referential transparency for a lot of our type checking functions which may enable easier caching and parallelism in the future, though this would still currently be blocked by the type engine's global state. Many of the newer methods previously provided by `NamespaceWrapper` and `NamespaceRef` have been moved to `Namespace` itself. [1]: https://docs.rs/im/latest/im/#what-are-immutable-data-structures [2]: #682 (comment) TODO ---- - [ ] Solve E2E errors. Since the arena was introduced, a lot of new global mutation has also been introduced disguised by the `RwLock`. E.g. the `monomorphize` functions now mutate the namespace at the given reference, meaning that `ns.resolve_self_*` is also self-mutating. Need to review how these methods are used to ensure the new behaviour achieves the same intended results. - [ ] Replace collections in `Namespace` fields with clone-friendly alternatives provided by `im`.
By making `Namespace` `PartialEq`, we're able to re-add the original filter to the `optimize` pass that ensures we don't unnecessarily re-compile constants for the current module.
This enables proper support for absolute paths more generally, and avoids the need for specifying an optional `from_module` for much of namespace's API.
Renames `get_symbol` and `get_call_path` to `resolve_symbol` and `resolve_call_path` respectively in order to distinguish between direct lookup/indexing (what `get` is commonly used for) and resolving the declaration for a given symbol identifier. Refactors the `get_call_path` method in order to better handle relative modules. Previously, the method assumed that none of the `CallPath` prefixes refer to imported symbols, implying that Sway is unable to support submodule imports. This change opens the path to supporting submodule imports. Refactors the `get_symbol` method in order to follow symbol resolution through more than one import. This should allow for the importing of re-exported symbols.
This better represents the behaviour where the new `root` instance is scoped to the particular context (e.g. function body, code block, etc) and hence has no need to update the outer body.
This refactors the `init`, `root` and `mod_path` namespace items into a single `Namespace` context type. This simplifies passing these fields through `TypeCheckArguments` and the rest of type-checking and abstracts away some of the complexity involved in interactions between each of these items. For the most part, the new `Namespace` methods are just short-hand for using `root` and `mod_path` together, however there are a couple of significant additions: - `Namespace::enter_submodule` returns a session type with a namespace associated with the submodule being type-checked. Upon being dropped, the `SubmoduleNamespace` returns control to the current module and resets the associated `mod_path`. - `Namespace::enter_scope` simply clones each of the fields, but better clarifies intentions in code.
Originally I inlined this into the `import_new_file` as it required some refactoring following changes to the `Namespace`, and was the only use site anyway. Thinking on this more, it makes sense to abstract this step to the root as currently, it involves both parsing and type-checking, not just type-checking.
a8ce775
to
e3ce882
Compare
I'm excited to learn about this change! Haven't looked at the code yet but I just wanted to express my appreciation to your elaborate description for this PR. It will be extremely helpful for all of us working on the compiler 🙌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Closes #682
Part 1.
Gobal StateStructural SharingThis removes the global arena used for managing namespaces during type
checking in favour of reverting to the original approach of representing
new scopes by cloning the namespace.
Prior to the introduction of the global arena, cloning the namespace to
represent new scopes caused exponential memory growth as the entirety of
each of the inner collections was cloned along with the namespace
itself.
In this new approach, we aim to avoid this by using clone-friendly,
drop-in-replacement collections from the
im
crate that enable"structural sharing" of the
Namespace
fields. See TODO below forwhether this is done yet or not.
You can read more about structural sharing here. The gist is that
rather than deep cloning the entire structure, we clone a reference to
the original state, and new allocations only occur upon modifying the
data structure with only the memory needed to record the difference.
The result is that we can avoid the side-effecting nature of maintaining
global state, making it easier to reason about how
Namespace
s areaccessed and updated within functions by reading their signatures.
It should also allow us to lean on RAII for cleaning up the namespace
contents, solving the bug where we are currently unable to compile the
same project more than once within the same process as mentioned
here. This is particularly relevant to
sway-lsp
where we willlikely want to lean on
sway-core
to type check the same code manytimes within one process.
This also gets us a step toward achieving referential transparency for a
lot of our type checking functions which may enable easier caching and
parallelism in the future, though this would still currently be blocked
by the type engine's global state.
Many of the newer methods previously provided by
NamespaceWrapper
andNamespaceRef
have been moved toNamespace
itself.Part 2. Absolute Paths
The final set of changes in this PR are a refactor of the
Namespace
type inorder to ensure access to the whole module hierarchy (as well as the init
namespace) during all of type-checking.
Splitting
Namespace
intoRoot
,Module
andItems
.Previously, we used a single
Namespace
structure to represent not only themodule tree but the items within each module too.
This PR refactors the original
Namespace
type into:Module
: Represents a single module. Stores the set of items available withinthe module as well as the set of submodules.
Items
: The set of declarations, implementations, synonyms and aliasesavailable within a particular scope.
This allows for moving module-level behaviour onto the
Module
type, anditem-level behaviour to the
Items
type. This is particularly useful intype-checking for distinguishing between whether or not we need the entire
module tree or just the set of items for a particular module (e.g. the
monomorphize methods).
The new
Root
type is a wrapper around the project's rootModule
. Methodsthat should only be called on the root module (e.g. those that involve resolving
absolute paths, or internally resolving synonyms) have been moved onto this
type.
Module
derefs to its innerItems
, andRoot
derefs to its innerModule
.This is useful for retaining the ergonomics of the old
Namespace
type whilegaining the separation-of-concerns benefits of the new type separation in
function signatures.
Adding a new
Namespace
contextual typeA new higher-level
Namespace
type has been added that acts as a context,making it easier to enter/exit submodules and scopes, and provides some
short-hand methods to avoid the need for repeatedly indexing into
root
withthe current module path.
The
Namespace
derefs to theModule
that is currently being checked, allowingto minimize the changes throughout the rest of type-checking.
TODO
Namespace
fields with clone-friendlyalternatives provided by
im
.global mutation has also been introduced disguised by the
RwLock
.E.g. the
monomorphize
functions now mutate the namespace at thegiven reference, meaning that
ns.resolve_self_*
is alsoself-mutating. Need to review how these methods are used to ensure the
new behaviour achieves the same intended results.