-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Remote entity reservation v7 #18611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remote entity reservation v7 #18611
Conversation
let target = head.add(index as usize); | ||
|
||
// SAFETY: Ensured by caller. | ||
unsafe { core::ptr::replace(target, Slot::new(entity)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this to be sound in the Rust memory model for atomic accesses, you need to prove that any other access to this slot either happens-before this one, or this one happens-before the other.
You need to update the SAFETY comments to show that's the case, and I believe that will be impossible if all of the operations are Ordering::Relaxed
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated safety comments because I do think this is safe. set
is never called concurrently and is never called with an index that could be considered valid by a remote reservation. In practice, this could theoretically just be an UnsafeCell.
Given that there's no concurrency or conflicts I don't think this violates the memory access model. I can't prove which set
s will be in what order, but I can prove that they will be in a definite order. The set
calls will never conflict with other set
or get
ops. And get
is ok to conflict with other get
calls because they don't do any modification. So I think safety is upheld and relaxed ordering is fine.
Is that right? You know a lot more about this than me, so I trust your judgement.
The only place I'm ify on Relaxed
is the storing and loading of OwnedBuffer::len
, but that is never written concurrently, and we guard agains early reads by temporarily disabling the free cursor when we decrease len
, and a premature read has no affect if we are increasing it (RemoteOwned
might miss the chance to reuse a freshly flushed entity, but who cares?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there's no concurrency or conflicts I don't think this violates the memory access model.
The concurrency comes from calling RemoteOwned::try_spawn_empty_from_free
on one thread while calling methods like Owned::free_non_empty_buffer
on another. (I thought that was the goal here; if I misunderstood then I apologize!)
You aren't doing any synchronization, so the order of operations one one thread doesn't affect the other. The set
call in free_non_empty_buffer
could be reordered after the final update to free_cursor
, such that try_spawn_empty_from_free
reads the new value of free_cursor
but get
returns an old value at that index. And again, it's worse than that, because get
and set
do non-atomic memory access, so the lack of synchronization is UB.
If you aren't familiar with memory orderings, it might help to read the Memory Ordering chapter of Rust Atomics and Locks. The links from Memory model for atomic accesses to C++20 atomic orderings and the nomicon may also be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Ok the problem is making more sense now. I was assuming that cache coherency would keep the reads up to date, but I guess on platforms that dow write through for atomics or something, that's not a guarantee. IIUC, that's why you were saying non-atomic, relaxed, and Aquire/Release were all implemented the same on x86.
(Thank's for your patience with me on this. These details are new to me, and it's a lot to get my head around.)
But I think, slapping Aquire/Release on everything would fix it, and that would be free for platforms like x86. Am I getting that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think, slapping Aquire/Release on everything would fix it, and that would be free for platforms like x86. Am I getting that right?
Right, that will get you most of the way there! I think you'll find that you also need a store-release after the read in try_spawn_empty_from_free
, since you'll need a way to make sure that read happens before the slot is re-used by a free
. That's why the other concurrent queue implementations have a "stamp".
It's really easy to miss cases, though! You'll want to be very precise with your safety comments. Ideally you'd use something like loom, which can help find bugs by trying lots of different interleavings of operations. I'm not sure how hard it would be to get that working with Bevy CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that makes sense. I'll do that if we proceed here.
I have good news and bad news. The bad news is I'm working on v8. The good news is this should be much simpler to review and reason about.
I'm going to work on v8 for now, but if we end up going with this, I will polish this up. And thank's for showing me loom. That's really cool!
It looks like your PR has been selected for a highlight in the next release blog post, but you didn't provide a release note. Please review the instructions for writing release notes, then expand or revise the content in the release notes directory to showcase your changes. |
Closing in favor of v9: #18670. Thanks for the reviews here. |
Objective
This is v7, so you get the idea. See also v6 (#18525). I'll fill this in more if we proceed here.
Solution
The full design plan is in #18577. A few things changed (actually for the better) as I wrote this.
Instead of accommodating remote reservation from the free list all the time, it is disable very briefly during some operations.
Namely, when freeing an entity or moving it out of the empty archetype, remote reservation will have to get a new brand new entity index. This is faster than the alternative. The only downside is that remote reservation may (very rarely) increase the total number of entities when one could be reused instead. But that would be rare, and it's better than blocking / waiting IMO.
Costs:
Future:
Draft
This is ver much a draft to facilitate discussion. This does not interact with
EntityMeta
or the rest of the ECS yet.Only the atomic aspects of the design have been built.