Tree arena #752
I am definitely doing something silly to break the needed preconditions, just need to work out what Edit: |
EDIT: Wait, no, I misunderstood the question. |
(Copying from Zulip) Your PR looks good, but we'd probably want more intermediate steps before relying on custom unsafe code, especially since the TreeArena API isn't quite settled yet. Those steps would be:
The test suite would help us test the unsafe version with Miri, which we'd probably add to CI.
582557d to 4f7a05f
This is a very high-quality contribution, and will be good to merge once the review points are addressed.
Something orthogonal to this review is the question of how the TreeArena API will evolve.
Having an unsafe implementation may slow us down if we want to try API changes and we have to make sure they're implemented in both versions.
I'd like to establish that the safe version will be a first-class citizen and the unsafe version a second-class citizen:
- The safe version may have features / APIs that the unsafe version doesn't yet have.
- If both versions are at feature parity, Masonry can switch on the unsafe version for best performance.
- Otherwise, Masonry uses the safe version.
If that's what we go with, we should document that pattern in a few places.
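A rough sketch of how the "safe version first-class, unsafe version opt-in" pattern could be wired up is shown below. The feature name `unsafe_arena`, the module names, and the `IMPLEMENTATION` constant are assumptions made for illustration; they are not taken from the crate.

```rust
// Hypothetical layout: the safe module is always compiled, so it can gain
// new APIs first; the unsafe module only exists behind an opt-in feature.
mod arena_safe {
    pub const IMPLEMENTATION: &str = "safe";
}

#[cfg(feature = "unsafe_arena")]
mod arena_unsafe {
    pub const IMPLEMENTATION: &str = "unsafe";
}

// Re-export exactly one implementation under a common name, so downstream
// code does not need to know which one it got.
#[cfg(not(feature = "unsafe_arena"))]
pub use arena_safe::IMPLEMENTATION;
#[cfg(feature = "unsafe_arena")]
pub use arena_unsafe::IMPLEMENTATION;
```

With no features enabled, `IMPLEMENTATION` resolves to the safe module's item, which matches the rule that Masonry falls back to the safe version unless the unsafe one is explicitly switched on.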
tree_arena/ARCHITECTURE.md
Outdated
## Architecture
In my view, the point of an ARCHITECTURE.md file is to give just enough context to understand everything about a codebase, not just the public API.
As such, I think this file should start with a description of the safe-and-unsafe implementations and why we include both.
tree_arena/ARCHITECTURE.md
Outdated
Of finding children: $O(1)$ - previously $O(\text{children})$

Of finding deeper descendants: $O(\text{depth})$ - ideally will be made $O(1)$

Access from the root: $O(1)$, previously $O(\text{depth})$ - improved as all nodes are known to be descended from the root
I'd rather avoid Latex for this section. ARCHITECTURE.md is mostly meant to be read in a code editor, you can use plain text.
Also, I think the descriptions could be a little clearer.
I can see an argument for using the GitHub Markdown MathJax extension. But for big-O notation,
On further thought, given the README is so bare, does it make more sense to flesh this out and move it into the README rather than having a superfluous file?
Yeah. The general workflow I like is "README for things you need to understand the project, ARCHITECTURE for things you need to understand the project's internals that you wouldn't get from the README or the doc root".
tree_arena/ARCHITECTURE.md
Outdated
It is possible to get shared (immutable) access or exclusive (mutable) access to the tree. These return `ArenaRef<'arena, T>` or `ArenaMut<'arena, T>` respectively.
I feel like this description could go into a little more detail about how the shared/exclusive access works, and why this crate is worth bothering with.
The general idea of TreeArena is to express a tree-like ownership graph, stored inside a flat data structure. The UnsafeCells are meant to bridge the gap between the two.
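To make the "tree-like ownership graph inside a flat data structure" idea concrete, here is a minimal safe sketch. `FlatTree` and its methods are hypothetical names, not the crate's API, and the sketch leaves out the `UnsafeCell` layer entirely: it only shows that lookup goes straight to a flat map while removal still behaves like an owning tree.

```rust
use std::collections::HashMap;

type NodeId = u64;

/// Hypothetical flat arena: every node lives in one map, and the tree shape
/// is recorded separately as child -> parent links.
struct FlatTree<T> {
    items: HashMap<NodeId, T>,
    parents: HashMap<NodeId, Option<NodeId>>,
}

impl<T> FlatTree<T> {
    fn new() -> Self {
        Self { items: HashMap::new(), parents: HashMap::new() }
    }

    /// Inserts `item` under `parent` (`None` means a root node).
    fn insert(&mut self, id: NodeId, parent: Option<NodeId>, item: T) {
        assert!(!self.items.contains_key(&id), "duplicate node id");
        if let Some(p) = parent {
            assert!(self.items.contains_key(&p), "unknown parent");
        }
        self.items.insert(id, item);
        self.parents.insert(id, parent);
    }

    /// Looks a node up directly in the flat map, whatever its depth.
    fn find(&self, id: NodeId) -> Option<&T> {
        self.items.get(&id)
    }

    /// Removes a node together with its whole subtree, as an owning tree would.
    fn remove(&mut self, id: NodeId) -> Option<T> {
        let children: Vec<NodeId> = self
            .parents
            .iter()
            .filter(|(_, parent)| **parent == Some(id))
            .map(|(child, _)| *child)
            .collect();
        for child in children {
            self.remove(child);
        }
        self.parents.remove(&id);
        self.items.remove(&id)
    }
}
```

What this safe sketch cannot do is hand out a mutable reference to a node and a mutable handle to its children at the same time, since both would borrow the same map. That gap is where the `UnsafeCell`s come in.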
Regarding testing with Miri in CI, I think the invocation is `cargo +nightly miri t -p tree_arena --no-fail-fast --no-default-features`, but I'm unsure how best to access nightly Rust in the workflow to run this.
tree_arena/src/tree_arena_unsafe.rs
Outdated
/// # SAFETY
///
/// When using this on [`ArenaMutChildren`] associated with some node,
/// must ensure that `id` is a descendant of that node, otherwise can
/// obtain two mutable references to the same node
unsafe fn find_mut(&mut self, id: impl Into<NodeId>) -> Option<ArenaMut<'_, T>> {
Hi! I have a minor note on the safety documentation. This is sort of a comment out of nowhere, but I've developed an interest in reviewing unsafe code and I've been following the development in this repo.
Since this retains the full `&mut self` inside the returned `ArenaMutChildren`, I think the safety requirements are more extensive than the case described here. Any use of `parent_arena` in `ArenaMutChildren` needs to be careful not to invalidate the reference produced from `unsafe { item.get().as_mut() }` below. E.g. `DataMap::find` also must only be called on descendants, `parent_arena.items.remove()` must only operate on descendants, `parent_arena.items.insert()` must not overwrite existing nodes, `parent_arena.items` must not be used to clear the whole structure, etc.
I was initially thinking the safety documentation could be reworked to note the overall requirement that the `parent_arena` field must not be used to invalidate the returned `item` reference and then include a few examples. However, I think all call sites would have the same safety note pointing to the details of `ArenaMutChildren`. So since all the details of `ArenaMutChildren` are private to this module, `find_mut` could be safe, and the `unsafe` block in it could be documented with the fact that all operations that `ArenaMutChildren` allows only access descendants and never invalidate the item reference created here.
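The suggestion above (a safe `find_mut` whose `unsafe` block documents a module-level invariant) might look roughly like the following. This is a simplified sketch under stated assumptions, not the PR's code: `Arena`, `ChildrenMut`, and `child_mut` are hypothetical names, only direct children are reachable rather than all descendants, and anything like it would still want a Miri run before being trusted.

```rust
use std::cell::UnsafeCell;
use std::collections::HashMap;

type NodeId = u64;

pub struct Arena<T> {
    items: HashMap<NodeId, Box<UnsafeCell<T>>>,
    parents: HashMap<NodeId, Option<NodeId>>,
}

/// Handle to the children of one node. It keeps the whole arena borrowed, so
/// every method here must uphold the invariant relied on in `find_mut`.
pub struct ChildrenMut<'a, T> {
    arena: &'a mut Arena<T>,
    id: NodeId,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Self { items: HashMap::new(), parents: HashMap::new() }
    }

    pub fn insert(&mut self, id: NodeId, parent: Option<NodeId>, item: T) {
        // Refusing duplicates also means a node can never be its own parent.
        assert!(!self.items.contains_key(&id), "duplicate node id");
        assert!(parent.map_or(true, |p| self.items.contains_key(&p)), "unknown parent");
        self.items.insert(id, Box::new(UnsafeCell::new(item)));
        self.parents.insert(id, parent);
    }

    /// Safe to call: the invariant is enforced by this module, not the caller.
    pub fn find_mut(&mut self, id: NodeId) -> Option<(&mut T, ChildrenMut<'_, T>)> {
        let ptr: *mut T = self.items.get(&id)?.get();
        // SAFETY: while `item` is alive, `ChildrenMut` is the only other path
        // to the arena, and its methods only reach direct children of `id`.
        // They never access, move, or drop the node that `item` points to.
        let item = unsafe { &mut *ptr };
        Some((item, ChildrenMut { arena: self, id }))
    }
}

impl<T> ChildrenMut<'_, T> {
    /// Returns a direct child of this node, or `None` for any other id.
    pub fn child_mut(&mut self, child: NodeId) -> Option<&mut T> {
        if self.arena.parents.get(&child) != Some(&Some(self.id)) {
            return None;
        }
        self.arena.items.get_mut(&child).map(|cell| cell.get_mut())
    }
}
```

The design point is that the safety argument lives next to the one `unsafe` block, and every method added to `ChildrenMut` later has to be checked against that comment, because the fields are private to the module.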
Agree with the overall point that the safety requirement would be better put as: you can only access or remove children of the current node. (I'm not sure whether insert() comment would be needed as it is a panic to insert a node with the same name as the key, and I'm not sure clearing is different to removal as there isn't a clear api, and I assume any would be an action on the Arena itself thus checked by the borrow checker as having a reference to an item in the tree prevents any mutable access of the tree itself.)
Regarding safety, I think this makes sense, but I can also see an argument that keeping it unsafe would be correct if it ever became public (though I assume the same should then be true of the immutable method, otherwise one could obtain an immutable and a mutable reference to the same node).
> I'm not sure whether insert() comment would be needed as it is a panic to insert a node with the same name as the key,

> there isn't a clear api, and I assume any would be an action on the Arena itself thus checked by the borrow checker as having a reference to an item in the tree prevents any mutable access of the tree itself

These are comments about what the safe code in `ArenaMutChildren` is allowed to do with the internal `items` hashmap (which has things such as insert without such checks and clear).
LGTM, pending changing the `find` methods and removing the children lists.
You can consider everything else optional before merging.
Another blocker before this is merged: I'd like for some version of my comment to show up in the documentation:
Added to the README and the doc comment at the root of the crate.
LGTM.
I'd maybe like a comment noting that `TreeNode::children` needs to be removed, but otherwise we're good.
I'm not sure |
78ccfcd to 6f503e9
I don't want to approve the unsafe code, as I don't fully understand it.
However, given that we use the entirely safe version by default, it seems low-risk enough to land this. Not approving because of some of the docs concerns
// Copyright 2024 the Xilem Authors
// SPDX-License-Identifier: Apache-2.0

//! This crate implements a tree data structure for use in Masonry
I'd like to use `cargo rdme` here; it's already set up in CI.
That doesn't need to block this PR, but it would be a good follow-up.
For clarity, is that the content after the licence in the other lib.rs, e.g. lines 4-22 in https://github.com/linebender/xilem/blob/main/xilem_core/src/lib.rs?
Ah, right. This is confusing.
No, this is the setup used specifically in the Masonry crate. The version in Xilem Core is an old version of that.
#[derive(Debug)]
struct TreeNode<T> {
    item: T,
    children: Vec<NodeId>,
This is also a little bit surprising to me...
#[derive(Debug)]
struct DataMap<T> {
    /// The items in the tree
    items: HashMap<NodeId, Box<UnsafeCell<TreeNode<T>>>>,
Would we be able to have something like an `UnsafeMutex` type, ideally one that is implemented safely by default but with an unsafe passthrough version, to check that some invariants are being met?
That is, have this same implementation, but in a checked manner?
Yes, but it would require quite a few changes. The TreeNode contains both the contents themselves and the list of children, but the API returns these in separate structs: the `item` and the `ArenaChildren` (mut or ref). So I think both would need to be wrapped separately in some form of mutex for run-time checking?
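For reference, the checked half of the `UnsafeMutex` idea could be as small as a wrapper over `RefCell`. The sketch below is an assumption about what was meant, not an agreed design: the method name `try_borrow_mut` is hypothetical, and the passthrough variant (the same type backed by `UnsafeCell` with no run-time check) is only described in the comments.

```rust
use std::cell::{RefCell, RefMut};

/// Checked variant of the hypothetical `UnsafeMutex`: the aliasing rule that
/// the unsafe code would silently assume is verified at run time instead.
/// A passthrough variant would store an `UnsafeCell<T>` and skip the check.
pub struct UnsafeMutex<T>(RefCell<T>);

impl<T> UnsafeMutex<T> {
    pub fn new(value: T) -> Self {
        Self(RefCell::new(value))
    }

    /// Returns exclusive access, or `None` if any other borrow is still alive.
    pub fn try_borrow_mut(&self) -> Option<RefMut<'_, T>> {
        self.0.try_borrow_mut().ok()
    }
}
```

As the reply above points out, a node's item and its children handle are returned separately, so each would presumably need its own wrapper of this kind for the check to catch overlapping access.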
I've done some spot checks of a few examples, and this all seems to work. As discussed on Zulip, we'll land this now, because the unsafe code is off by default.
Use `cargo rdme` for crate readme and check in CI as mentioned in #752 (comment)
I've had a try at implementing the unsafe `tree_arena` in a separate lib (but in the same workspace for now). I haven't thought of a good way to make non-root access better than O(depth) without slowing down insertion or using more memory, though I have improved accessing nodes from the root from O(depth) to O(1), and accessing direct children from O(children) to O(1).
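The O(depth) cost for non-root access can be pictured as a walk up child-to-parent links to confirm that the requested node really sits below the starting node. The function below is an illustrative sketch with hypothetical names, assuming the arena keeps a map from each node to its parent; it is not the PR's implementation.

```rust
use std::collections::HashMap;

type NodeId = u64;

/// Walks child -> parent links upwards from `id` and reports how many links
/// were followed before reaching `ancestor` (0 if `id == ancestor`), or
/// `None` if a root is reached first. The loop runs at most depth-many times.
fn steps_to_ancestor(
    parents: &HashMap<NodeId, Option<NodeId>>,
    ancestor: NodeId,
    id: NodeId,
) -> Option<usize> {
    let mut current = id;
    let mut steps = 0;
    while current != ancestor {
        // A missing entry or a `None` parent both mean the walk has failed.
        current = (*parents.get(&current)?)?;
        steps += 1;
    }
    Some(steps)
}
```

Access from the root can skip this walk, because every node in the arena is known to descend from the root, so a single map lookup is enough.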