Skip to content

Mutable Immutable Syntax Trees #6857

Closed
Closed
@matklad

Description

@matklad

Mad Scientist Mode: ON

This issue proposes the next step of the evolution of our syntax tree library (rowan).
The goal here is to provide an ergonomic and performant API for editing syntax trees.

Background & Status Quo

Providing nice editing API for syntax tree is hard for a classical "this is hard" reason -- aliasing.

Let's say you want to implement "move bounds to where clause" refactoring.

So, you have something like

fn foo<T: Eq>() where T: Clone {}

and you want to move Eq to Clone + Eq.

The natural way to code this would be something like this:

let t_param // T: Eq
  = func.generic_params().find_by_name("T")?;

let eq_bound // Eq
  = t_param.bounds().find_by_name("Eq")?;

let where_clause // T: Clone
  = func.where_clause().find_by_param_name("T")?;

t_param.remove_bound(&eq_bound);
where_clause.add_bound(&eq_bound)

That is, find where clause and original bound, remove the bound and snap it to the where clause.

To make this API work, we need to make it so that the tree change we affect via t_param.remove_bound call is visible to the where_clause local variable.
That is, if after removal we go up the tree from where_clause and then go down, we shouldn't see the eq_bound.

Today, the syntax tree is immutable, so all editing API return a fresh copy of a tree.
This works fine for point mutations, but breaks down when you want to modify two places at once.

With an immutable API, after you do t_param.remove_bound(&eq_bound);, the where_clause points to the wrong tree!
Functional operations compose badly in this case.

To work around this, we use SyntaxRewriter class (stolen from Roslyn, naturally).
It is configured with a replacement map, then walks the syntax tree applying substitutions from the map, reassembling a new tree in process.
This class is awkward to use and also requires O(N) processing, as it essentially rebuilds the whole tree.

Proposed API

Let's just make mutable API work :-)
This is not as scary as it sounds -- the underlying syntax tree is still immutable.
The trick is to leverage GreenNode / SyntaxNode split once again to cut through yet another tradeoff.

As a reminder, the data of syntax trees is stored in a so-called "green" tree -- a standard Arced functional tree
On top of GreenNode, we layer SyntaxNode, which adds parent pointers and absolute offsets.
Additionally, SyntaxNodes are !Sync and leverage that to use Rc and thread-local allocators.
Each SyntaxNode stores a pointer to a GreenNode.

We'll add mutating API to SyntaxNode.
As SyntaxNode fundamentally alias each other via parent pointers, the API would use &self methods.

Mutation operations would still create independent copies of green nodes.
However, they'll update all existing SyntaxNodes to point to new green nodes in the corresponding trees.
As our database stores green nodes internally, mutations won't actually mutate DB data.

To make this API less surprising, we'll add a .clone_for_update call to opt-into mutation, so assists would work like this:

let original_file = db.parse(file_id);
let file = original_file.clone_for_update();
mutate(&file);
return diff(&original_file, &file);

Implementation

How do we implement the "update all SyntaxNodes" bits?
I haven't coded this up yet, but I think this should be doable.

We add the following fields to SyntaxNode:

mutable: bool,
next_sibling: Weak<SyntaxNode>,
prev_sibling: Weak<SyntaxNode>,

These fields complete existing parent pointer to form bidirectionally-connected tree of live nodes.
We also only update those pointes if mutable is true, which means we don't increase costs of traversal for majority of cases.

An important point is that we consider only live nodes: nodes stored in local variables and their ancestors.
Two neighboring live nodes are not necessary tree neighbors.

Additionally, for mutable trees, we change semantics of the offset field -- it now stores relative offset in the parent node.
To compute full offset, we walk the tree to the root.
This is the same algorithm that IntelliJ uses.

If I am not missing something, that means that during modifications, only nodes to the path to root and their siblings need to be updated, which seems doable!

Metadata

Metadata

Assignees

Labels

C-ArchitectureBig architectural things which we need to figure up-front (or suggestions for rewrites :0) )E-hardS-actionableSomeone could pick this issue up and work on it right nowfunA technically challenging issue with high impact

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions