Mutable Immutable Syntax Trees

Mad Scientist Mode: ON

<p align="center">
  <img src="https://github.com/matklad/drop_bomb/raw/master/Merkw%C3%BCrdigliebe.jpg">
</p>

This issue proposes the next step of the evolution of our syntax tree library (`rowan`).
The goal here is to provide an ergonomic and performant API for editing syntax trees.

## Background & Status Quo

Providing nice editing API for syntax tree is hard for a classical "this is hard" reason -- aliasing.

Let's say you want to implement "move bounds to where clause" refactoring.

So, you have something like

```rust
fn foo<T: Eq>() where T: Clone {}
```

and you want to move `Eq` to `Clone + Eq`.

The natural way to code this would be something like this:

```rust
let t_param // T: Eq
  = func.generic_params().find_by_name("T")?;

let eq_bound // Eq
  = t_param.bounds().find_by_name("Eq")?;

let where_clause // T: Clone
  = func.where_clause().find_by_param_name("T")?;

t_param.remove_bound(&eq_bound);
where_clause.add_bound(&eq_bound)
```

That is, find where clause and original bound, remove the bound and snap it to the where clause.

To make this API work, we need to make it so that the tree change we affect via `t_param.remove_bound` call is visible to the `where_clause` local variable.
That is, if after removal we go *up* the tree from `where_clause` and then go down, we shouldn't see the `eq_bound`.

Today, the syntax tree is immutable, so all editing API return a fresh copy of a tree.
This works fine for point mutations, but breaks down when you want to modify two places at once.

With an immutable API, after you do `t_param.remove_bound(&eq_bound);`, the `where_clause` points to the wrong tree!
Functional operations compose badly in this case.

To work around this, we use `SyntaxRewriter` class (stolen from Roslyn, naturally).
It is configured with a replacement map, then walks the syntax tree applying substitutions from the map, reassembling a new tree in process.
This class is awkward to use and also requires O(N) processing, as it essentially rebuilds the whole tree.

## Proposed API

Let's just make mutable API work :-)
This is not as scary as it sounds -- the underlying syntax tree is still immutable.
The trick is to leverage `GreenNode` / `SyntaxNode` split once again to cut through yet another tradeoff.

As a reminder, the data of syntax trees is stored in a so-called "green" tree -- a standard `Arc`ed functional tree
On top of `GreenNode`, we layer `SyntaxNode`, which adds parent pointers and absolute offsets.
Additionally, `SyntaxNode`s are `!Sync` and leverage that to use `Rc` and thread-local allocators.
Each `SyntaxNode` stores a pointer to a `GreenNode`.

We'll add mutating API to `SyntaxNode`.
As `SyntaxNode` fundamentally alias each other via parent pointers, the API would use `&self` methods.

Mutation operations would still create independent copies of green nodes.
However, they'll update all existing `SyntaxNode`s to point to new green nodes in the corresponding trees.
As our database stores green nodes internally, mutations won't actually mutate DB data.

To make this API less surprising, we'll add a `.clone_for_update` call to opt-into mutation, so assists would work like this:

```rust
let original_file = db.parse(file_id);
let file = original_file.clone_for_update();
mutate(&file);
return diff(&original_file, &file);
```

## Implementation

How do we implement the "update all `SyntaxNode`s" bits?
I haven't coded this up yet, but I think this should be doable.

We add the following fields to `SyntaxNode`:

```rust
mutable: bool,
next_sibling: Weak<SyntaxNode>,
prev_sibling: Weak<SyntaxNode>,
```

These fields complete existing `parent` pointer to form bidirectionally-connected tree of live nodes.
We also *only* update those pointes if `mutable` is `true`, which means we don't increase costs of traversal for majority of cases.

An important point is that we consider only `live` nodes: nodes stored in local variables and their ancestors.
Two neighboring live nodes are not necessary tree neighbors.

Additionally, for mutable trees, we change semantics of the offset field -- it now stores relative offset in the parent node.
To compute full offset, we walk the tree to the root.
This is the same algorithm that IntelliJ uses.

If I am not missing something, that means that during modifications, only nodes to the path to root and their siblings need to be updated, which seems doable!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mutable Immutable Syntax Trees #6857

Background & Status Quo

Proposed API

Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Mutable Immutable Syntax Trees #6857

Description

Background & Status Quo

Proposed API

Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions