Description
What problem does this solve or what need does it fill?
Transform propagation can get very slow for very large scenes and deep hierarchies. Make it faster.
What solution would you like?
When investigating the performance of transform_propagate_system for #4203, one of the potential options that came up is to chunk up propagation based on the hierarchy roots and run the system in parallel. Using Query::par_for_each_mut as a replacement for single-threaded iteration allows the system to leverage the full ComputeTaskPool for very large and deep hierarchies. However, due to the &mut GlobalTransform, the query for descendant entities cannot be Clone, and thus requires the unsafe Query::get_unchecked to get child entities. This is sound if and only if the hierarchy is strictly a tree, which requires every child in the hierarchy to be globally unique. Unfortunately there is currently no way to ensure this assumption holds. This is mitigable by having a parallel lock that panics on contention.
On my local machine, this took the transform_hierarchy -- humanoid_mixed stress test from 8.1 ms per frame to 1.88 ms, a roughly 4.3x speedup, which suggests this use of unsafe code may be worth it, provided the assumptions above hold true.
Here's the resulting code from this experiment:
https://github.com/james7132/bevy/blob/1e7ad38da9d8ea51542b585b3ef1ed76927357f3/crates/bevy_transform/src/systems.rs#L42=
What alternative(s) have you considered?
The proposed solution above has a few drawbacks:
- unsafe code in userspace code (bevy_transform).
- A GlobalTransformLock component is visible in userspace ECS. Perhaps a generic Lock<T: Component>?
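As a rough illustration of what a generic lock could look like, here is a standard-library-only sketch. Lock and LockGuard are hypothetical types, not existing Bevy APIs, and the Send + Sync bound is an assumption standing in for Bevy's Component bound.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical generic lock that panics on contention instead of blocking.
pub struct Lock<T> {
    held: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: `value` is only reachable through `LockGuard`, and at most one
// guard exists at a time. `T: Send + Sync` is an assumed stand-in for the
// bounds a component type would already satisfy.
unsafe impl<T: Send + Sync> Sync for Lock<T> {}

/// Grants exclusive access to the wrapped value until dropped.
pub struct LockGuard<'a, T> {
    lock: &'a Lock<T>,
}

impl<T> Lock<T> {
    pub fn new(value: T) -> Self {
        Lock { held: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Acquires the lock, panicking if it is already held.
    pub fn lock(&self) -> LockGuard<'_, T> {
        if self.held.swap(true, Ordering::Acquire) {
            panic!("Lock is contended: value accessed from two places at once");
        }
        LockGuard { lock: self }
    }
}

impl<T> Deref for LockGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: holding a guard means no other access exists.
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> DerefMut for LockGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: holding a guard means no other access exists.
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<T> Drop for LockGuard<'_, T> {
    fn drop(&mut self) {
        // Release the lock so the next caller can acquire it.
        self.lock.held.store(false, Ordering::Release);
    }
}
```

With something like this, the unsafe stays inside the lock type: a propagation system would call lock() on each child's wrapped value and write through the guard, and a second lock() on a value that is still held would panic rather than alias.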
Adding dynamically lockable components directly into ECS is a potential extension of this idea, and keeps unsafe out of userspace code. There was a brief discussion on Discord about this: https://discord.com/channels/691052431525675048/749335865876021248/972888139783872543