Remote entity reservation v9 #18670
Conversation
free and pending are now more distinct
Co-authored-by: atlv <email@atlasdostal.com>
```rust
let next_index = self.indices.next()?;
let (chunk, index, chunk_capacity) = self.buffer.index_in_chunk(next_index);

// SAFETY: Assured by constructor
```
`FreeBufferIterator` has no constructor/`new` function, so it might be helpful to add one or change this to reference `FreeBuffer::iter`. Same for the reference to the constructor below.
I didn't mean as in a `new` function; I just meant whoever constructs it follows the safety comment on the type docs. I think that's a common pattern in the ECS. I could move it to a `new` function, but then someone could miss the safety comment by manually constructing it. But if you have a better idea here, I'm all ears! I don't like type-level safety comments either.
Co-authored-by: Christian Hughes <9044780+ItsDoot@users.noreply.github.com>
Looks good to me; it's been reviewed enough that I'm confident it's more or less ready.
I didn't spot any glaring issues, just some minor nits, but I'm not yet confident I grok the exact mechanisms that enforce the safety/transactional guarantees between `free` and `remote_alloc`. That seems like a crucial part of this PR, so some of my comments are just questions about it.
```rust
fn next(&mut self) -> Option<Self::Item> {
    if let Some(found) = self.current.next() {
        return Some(found.get_entity());
    }

    let still_need = self.indices.len() as u32;
    let next_index = self.indices.next()?;
    let (chunk, index, chunk_capacity) = self.buffer.index_in_chunk(next_index);

    // SAFETY: Assured by constructor
    let slice = unsafe { chunk.get_slice(index, still_need, chunk_capacity) };
    self.indices.start += slice.len() as u32;
    self.current = slice.iter();

    // SAFETY: Constructor ensures these indices are valid in the buffer; the buffer is not sparse, and we just got the next slice.
    // So the only way for the slice to be empty is if the constructor did not uphold safety.
    let next = unsafe { self.current.next().debug_checked_unwrap() };
    Some(next.get_entity())
}
```
Style nit: would you be willing to add some more comments and/or use more descriptive field and variable names here, so it's a bit more obvious what's going on? It wasn't immediately clear to me that the range tracks the span of slots that remain to be iterated beyond the current chunk.
Good thought. I renamed the struct fields and gave them doc comments. If you see something else though, I'm open to other renames too.
Looks good to me.
```rust
// The goal is the same as `alloc`, so what's the difference?
// `alloc` knows `free` is not being called, but this does not.
// What if we `len.fetch_sub(1)` but then `free` overwrites the entity before we could read it?
// That would mean we would leak an entity and give another entity out twice.
// We get around this by only updating `len` after the read is complete.
// But that means something else could be trying to allocate the same index!
// So we need a `len.compare_exchange` loop to ensure the index is unique.
// Because we keep a generation value in the `FreeCount`, if any of these things happen, we simply try again.
```
I just want to check my understanding.
- The free buffer's chunks cannot move once allocated.
- `free` has a critical section guarded by acquire and release semantics.

So is the idea that `remote_alloc` is safe because `remote_alloc` will either see the disabled flag immediately, or its compare-exchange will fail when it sees the change (e.g. a change in the disabled flag or generation value)?
Exactly. If it fails, spin and try again. Never interrupt another action from `remote_alloc`; just wait for it to finish.
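That pattern can be sketched as a compare-exchange loop over a packed atomic state. This is a minimal illustration, not the PR's actual code: `FreeCount` and `GENERATION_LEAST_BIT` are names from the PR, but the bit layout and the helper names (`remote_pop`, `LEN_MASK`, `DISABLED_BIT`) are assumptions made for the sketch.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative packed layout: low 32 bits = length, bit 32 = disabled flag,
// bits 33+ = generation. The PR's real `FreeCount` layout may differ.
const LEN_MASK: u64 = 0xFFFF_FFFF;
const DISABLED_BIT: u64 = 1 << 32;
const GENERATION_LEAST_BIT: u64 = 1 << 33;

struct FreeCount(AtomicU64);

impl FreeCount {
    /// Pop an index from the free list without `&mut` access.
    /// Spins while a concurrent `free` has the disabled flag set.
    fn remote_pop(&self) -> Option<u32> {
        loop {
            let state = self.0.load(Ordering::Acquire);
            if state & DISABLED_BIT != 0 {
                // `free` is in its critical section; wait for it to finish.
                std::hint::spin_loop();
                continue;
            }
            let len = (state & LEN_MASK) as u32;
            if len == 0 {
                return None;
            }
            // One wrapping subtraction decrements the length and the
            // generation together (the `subtract_length | GENERATION_LEAST_BIT` idea).
            let new_state = state.wrapping_sub(1 | GENERATION_LEAST_BIT);
            // The slot is read *before* this commit; if `free` or another remote
            // pop changed anything in the meantime, the exchange fails (the
            // generation guarantees a unique state) and we simply retry.
            if self
                .0
                .compare_exchange(state, new_state, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return Some(len - 1); // flat index of the popped slot
            }
        }
    }
}
```

Note that the disabled flag turns `free` into a critical section without a lock on the allocation side: remote pops never block `free`, they just retry.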
```rust
// Also subtract one from the generation bit.
subtract_length | Self::GENERATION_LEAST_BIT
```
I'm trying to picture how this counter state evolves over time. Is the generation counting down or counting up?
Good question.
It doesn't matter if it goes up or down as long as it's consistent. Here's an example:
- Fresh allocator: generation=0, enabled, len=0.
- Try to pop/alloc: we subtract 1 from the length and (to keep it one op) 1 from the generation. The generation wraps to some big number N: generation=N, enabled, len=-1, which still presents as 0 since the length is not really negative.
- Push/free: we load the state, add 1 to the presented length (0, even though -1 is stored; the length can't be negative), don't touch the generation, and store the state. The state is changing anyway, and the only way to change back is to pop, which changes the generation, so we only need to change the len: generation=N, enabled, len=1.
- Pop/alloc: we subtract 1 from the generation and the length: generation=N-1, enabled, len=0.
- Remote pop/remote reserve: we acquire-load the state, but the len is 0, so we return None.
- Push: same as before: generation=N-1, enabled, len=1.
- Start remote pop A: acquire-load the state, grab the right entity, and prepare an ideal state of generation=N-2, enabled, len=0, but don't store it yet.
- Now a free starts: load the state, disabling it: generation=N-1, disabled, len=1.
- Start remote pop B: acquire-load the state, see that it's disabled, and spin.
- Remote pop A fails its compare-exchange (because the disabled bit is on) and spins.
- The free finishes as before: generation=N-1, enabled, len=2.
- Remote pop A tries again, making a new ideal state, and its compare-exchange succeeds: generation=N-2, enabled, len=1.
- Remote pop B tries again, making an ideal state of generation=N-3, enabled, len=0, but doesn't store it yet.
- Local pop: generation=N-3, enabled, len=0.
- Remote pop B's compare-exchange fails; it tries again, but sees a 0 len and returns None.
I know that's a complex example, but hopefully that demonstrates how the state might change over time in some of the edge cases.
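To make the no-ABA property of that walkthrough concrete, here is a tiny model of the packed counter. The bit positions are illustrative assumptions (the real state also packs a disabled flag); only the "length low, generation above" idea comes from the discussion.

```rust
// Model of the packed state: length in the low 32 bits, generation above.
// (Bit positions are illustrative; the PR's layout may differ.)
const GENERATION_LEAST_BIT: u64 = 1 << 33;

/// Pop: one wrapping subtraction decrements both the length and the generation.
fn pop(state: u64) -> u64 {
    state.wrapping_sub(1 | GENERATION_LEAST_BIT)
}

/// Push: add 1 to the length, leaving the generation untouched.
/// (The real `free` first clamps a "negative" stored length to a presented 0.)
fn push(state: u64) -> u64 {
    state + 1
}
```

Popping and then pushing returns to the same length but a different generation, so a stale compare-exchange can never mistake the new state for the old one.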
Thanks, I understand now. Pushing increments the length and popping decrements both the length and generation, so every operation results in a unique state.
There's no ABA problem because once we push to the free list, we can't return to the same length without advancing the generation.
Relaxed actually might not be fine here in some situations. Acquire and Release are used everywhere for the state anyway.
…9121)

# Objective

This is a followup to bevyengine#18704. There's lots more followup work, but this is the minimum to unblock bevyengine#18670, etc. This direction has been given the green light by Alice [here](bevyengine#18704 (comment)).

## Solution

I could have split this over multiple PRs, but I figured skipping straight here would be easiest for everyone and would unblock things the quickest. This removes the now no longer needed `identifier` module and makes `Entity::generation` go from `NonZeroU32` to `struct EntityGeneration(u32)`.

## Testing

CI

Co-authored-by: Mark Nokalt <marknokalt@live.com>
OK, I understand the high-level design now. The safety invariants seem fine.
This looks good to me, but I just wanted to ask one more question for clarification: an alternative to this was having the world poll a queue for remote allocation requests during each …
Yeah, pretty much. In my experimentation, I found five ways to do remote reservation:

So this PR is the best solution of the options I tried IMO. It has the best performance and doesn't block, lock, or await. (It may wait for a …)

As a history of what I've explored:
I'm not saying I've explored everything, but v9 is the best so far IMO. That said, if anyone has an idea I haven't tried, I'm all ears. For reference, flecs seems to be kinda similar to v9: it pages entities, where v9 calls them chunks. I haven't looked too closely, and I'm sure there are differences, but I think this is the right track.
That's alright. I was asking because I didn't follow all of your discussions and design work leading up to this PR, and I wanted to be sure I had the justification right before approving.
(My approval doesn't hinge on any of this stuff. I just wanted to address the other parts of your comment separately.)
Good to know, but IIRC components-as-entities had only been blocked on deciding whether or not to let …

Correct me if I'm wrong, but this seems only loosely related. Asset loaders don't have access to the world, so they can't initialize queries or systems (and that shouldn't change in the future, because queries and systems will be entities that need to be instantiated).

In any case, handling component entities being despawned shouldn't be a tough problem. Bevy's ECS has immutable components and reactivity, so it'd be easy enough to create some …

Reflection can be significantly empowered (and simplified) if component registrations are normal entities and you can attach your own components to them, so I don't see it as something we want to avoid.
flecs' entity metadata is closer to how bevy currently is (last I checked anyway). Its equivalent of the … (flecs has a …)

I don't know if flecs directly supports "remote" reservation of entities, or if Sander even considers it necessary. (I don't recall it being a feature, but if he's said otherwise then ignore me lol.)
Thanks, this is an amazing writeup!
Yeah; I don't mean that remote reservation is a prerequisite for components-as-entities, just that it lets us start reworking the …

A few of those questions (that I still have anyway) are: 1) Is panicking the best way to handle a removed/despawned component entity? Why not yank the component from all archetypes and continue? 2) Can we make the resource singleton entity different from the component info singleton entity? I don't see why not, and that would let us punt …
Yeah, this is 100% a separate issue. I just want to do the id/fragmenting portion of components-as-entities soon. There are already PRs for query-by-value, fragmenting components, etc. that would really benefit from this.
That's a solid option. I don't mean to dig up old questions if we've already reached a consensus here. I just don't know what it is. Maybe worth laying out a plan explicitly somewhere. If one exists, I haven't been able to find it.
Very much looking forward to this!
Yeah, that makes sense. I only glanced at it, and yeah, it doesn't seem to support remote reservation.
Eh, I mean, it's not like today's Bevy lets you unregister a component. Configurable cleanup is a feature in itself, so I wouldn't tie them together.
I think the matter of …
I don't have a list, but low-hanging fruit would be components that represent "implements …".

If we spawn entities to represent plugins and link them to their components with …

Things like that.
For y'all who haven't been following this, I want to draw attention to this comment from a while ago. It raises a potential performance concern about a benchmarking blind spot. My assessment is that it's fine, but I want this to be out in the open. There's also a doc here that gives some more background if you're interested.
fixes #18003
Objective
It's version 9 of the same objective lol. For assets-as-entities, we need entities to be able to be reserved from any thread. Ideally, this can be done without depending on an async context, blocking, or waiting. Any of those compromises could hurt asset performance or break the completely non-blocking nature of the asset system.
As a bonus, this PR makes allocating entities only need `&Entities` instead of `&mut`. `Entities::flush` is now completely optional, meaning none of the `Entities` methods depends on the flush at all, and there is protection against flushing an entity twice.

(If you're curious, v9 actually branched from v8. v8 was focused on #18577 (never flush entities), but this still includes flushing.)
Solution
Organizationally, I split off the underlying `EntityAllocator` from `Entities`. This makes it easier to read, etc., now that it's more involved.

The basic problem is that we need to be able to allocate an entity from any thread at any time. We also need to be able to free an entity. So at the allocator level, that's 3 operations: `free`, `alloc` (for when you know `free` isn't being called), and `remote_alloc` (can be called at any time). None of these can require mutable access.
(can be called any time). None of these can require mutable access.The biggest challenge is having a list of entities that are
free
and waiting to be re-used. The list needs to be fully functional without mutable access, needs to be resizable, and needs to be pinned in memory. I ended up using a strategy similar toSplitVec
. That dependency requiresstd
, and knowing the max capacity ahead of time lets us simplify the implementation, so I made my own implementation here.Testing
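As a sketch of the chunked, pinned layout: `index_in_chunk` is a name from the PR, but the doubling parameters and arithmetic below are assumptions for illustration. With chunk sizes of n, n, 2n, 4n, ..., growth only ever appends new chunks, so existing entries never move, and a flat index maps to its chunk with a bit of integer math.

```rust
/// Map a flat index to (chunk number, index within that chunk, chunk capacity)
/// for an assumed chunk layout of n, n, 2n, 4n, 8n, ... entries. Because growth
/// only appends new chunks, already-allocated entries stay pinned in memory.
fn index_in_chunk(index: usize, n: usize) -> (usize, usize, usize) {
    if index < n {
        return (0, index, n);
    }
    // Chunk c >= 1 covers flat indices [n * 2^(c-1), n * 2^c),
    // so c = floor(log2(index / n)) + 1.
    let k = index / n;
    let chunk = (usize::BITS - 1 - k.leading_zeros()) as usize + 1;
    let capacity = n << (chunk - 1); // chunk c holds n * 2^(c-1) entries
    // For chunk c >= 1, the starting flat index equals the chunk's capacity.
    (chunk, index - capacity, capacity)
}
```

With a known max capacity, a small fixed array of chunk slots (e.g. ~32 doubling chunks) covers the whole index space, which is what lets the buffer grow without mutable access and without ever reallocating an existing chunk.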
No new tests right now. It might be worth using loom at some point, but that requires an additional dependency, test-specific loom feature flags, and giving this treatment to multiple crates, especially bevy_platform.
Future work
#18577 is still a good end game here IMO. Ultimately (just like @maniwani said would happen), I decided that doing this all at once would be both too challenging and add too much complexity. However, v9 makes "never flush" much, much more approachable for the future. The biggest issues I ran into were that lots of places hold a reference to an entity's `Archetype` (but the entity now might not have an archetype) and that checking archetypes everywhere might actually be less performant than flushing. Maybe.

We can also potentially speed up a lot of different processes now that `alloc` can be called without mutable access and `free` (etc.) can be called without needing to `flush` first.

Costs
Benchmarks
Interpreting benches:
In most places, v9 is on par with or even faster than main. Some notable exceptions are the "sized_commands" and "fake_commands" sections, but the regression there is purely due to `Entities::flush` being slower, and we make up for that elsewhere. These commands don't actually do anything though, so this is not relevant to actual use cases. The benchmarks just exist to stress-test `CommandQueue`.

The only place where v9 causes a significant and real-world-applicable regression is "spawn_commands", where v9 is roughly 15% slower than main. This is something that can be changed later now that `alloc` doesn't need mutable access. I expect we can turn this 15% regression into a 15% improvement, given that "spawn_world" is roughly 20% faster on v9 than on main. Users that need really fast spawn commands, though, are already using some form of batch spawning or direct world access.

Other regressions seem to be either minimal, unrealistic, easily corrected in the future, or wrong. I feel confident saying "wrong" since running them back to back can sometimes yield different results. I'm on an M2 Max, so there might be some things jumping from perf cores to efficiency cores or something. (I look forward to standardized benchmarking hardware.)
Wins: I was worried that, without "never flush", this would be an overall regression, but I am relieved that that is not the case. Some very common operations, "insert_simple/unbatched" for example, are way faster on this branch than on main. Basically, on main, `alloc` also adds `EntityMeta` for the entity immediately, but on this branch, we only do so in `set`. That seems to improve temporal cache locality and leads to this roughly 220% improvement. "added_archetype" sees 20%-80% improvements too, etc. "iter_simple/foreach_wide" also sees a 270% improvement.

I think in practice, v9 will outperform main for real-world schedules. And I think moving towards "never flush" (even for only a few operations, like `Commands::spawn`) will improve performance even more.