Parallelize Transform Propagation #4697

Closed
@james7132

Description


What problem does this solve or what need does it fill?

Transform propagation can get very slow for very large scenes and deep hierarchies. Make it faster.

What solution would you like?

When investigating the performance of transform_propagate_system for #4203, one option that came up was to chunk propagation by hierarchy root and run the system in parallel. Replacing single-threaded iteration with Query::par_for_each_mut lets the system use the full ComputeTaskPool for very large and deep hierarchies. However, because the query for descendant entities contains &mut GlobalTransform, it cannot be Clone, so fetching child entities requires the unsafe Query::get_unchecked. This is sound if and only if the hierarchy is strictly a tree, which requires that every child appear under exactly one parent across the whole hierarchy. Unfortunately, there is currently no way to ensure this assumption holds. It can be mitigated with a parallel lock that panics on contention.
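To make the idea concrete, here is a minimal std-only sketch of root-chunked propagation with a lock that panics on contention. It is not the code from the experiment and uses no Bevy APIs: `Node`, `propagate`, and `propagate_parallel` are hypothetical names, transforms are reduced to a single `f32` offset, one scoped thread per root stands in for `Query::par_for_each_mut` on the `ComputeTaskPool`, and a safe `Mutex::try_lock` stands in for the `unsafe` `Query::get_unchecked` plus lock component.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for an entity with `Transform`, `GlobalTransform`
/// and `Children`. Transforms are a single `f32` offset to keep this short.
pub struct Node {
    pub local: f32,
    pub global: Mutex<f32>,
    pub children: Vec<usize>,
}

impl Node {
    pub fn new(local: f32, children: Vec<usize>) -> Self {
        Node { local, global: Mutex::new(0.0), children }
    }
}

/// Writes `parent_global + local` into `nodes[index]`, then recurses into
/// that node's children.
fn propagate(nodes: &[Node], index: usize, parent_global: f32) {
    let node = &nodes[index];
    let global = parent_global + node.local;
    {
        // `try_lock` never blocks. If another thread holds this node right
        // now, the hierarchy is not a tree, so panic instead of waiting.
        let mut slot = node
            .global
            .try_lock()
            .expect("hierarchy is not a tree: node visited concurrently");
        *slot = global;
    }
    for &child in &node.children {
        propagate(nodes, child, global);
    }
}

/// Propagates each root's subtree on its own scoped thread.
/// (Assumption: one thread per root approximates chunking by root.)
pub fn propagate_parallel(nodes: &[Node], roots: &[usize]) {
    thread::scope(|scope| {
        for &root in roots {
            scope.spawn(move || propagate(nodes, root, 0.0));
        }
    });
}
```

Because the lock is held only while a node is written, it catches accesses that actually overlap in time, which is the case that would make unchecked mutable access unsound. A child shared by two parents but visited at different moments would go unnoticed in this sketch.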

On my local machine, this took the transform_hierarchy -- humanoid_mixed stress test from 8.1 ms per frame to 1.88 ms, a roughly 4.3x speedup. That suggests this use of unsafe code may be worth it, provided the assumptions above hold true.

Here's the resulting code from this experiment:
https://github.com/james7132/bevy/blob/1e7ad38da9d8ea51542b585b3ef1ed76927357f3/crates/bevy_transform/src/systems.rs#L42=

What alternative(s) have you considered?

The proposed solution above has a few drawbacks:

  • unsafe code in userspace code (bevy_transform)
  • A GlobalTransformLock component is visible in userspace ECS. Perhaps a generic Lock<T: Component>?

Adding dynamically lockable components directly into ECS is a potential extension of this idea, and keeps unsafe out of userspace code. There was a brief discussion on Discord about this: https://discord.com/channels/691052431525675048/749335865876021248/972888139783872543
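As a rough illustration of what a generic lock could look like, the sketch below wraps a value in a flag-guarded cell that panics rather than blocks when contended. Bevy does not provide this type. `Lock` and `LockGuard` are hypothetical names, and the `T: Send + Sync` bound is an assumption standing in for `T: Component`. The `unsafe` is confined to the wrapper, which is the kind of encapsulation the ECS-level idea aims for.

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical generic lock: exclusive access to `T`, panicking on contention.
pub struct Lock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: the `locked` flag ensures at most one guard exists at a time, so a
// shared `&Lock<T>` never yields two live references to the value.
unsafe impl<T: Send + Sync> Sync for Lock<T> {}

impl<T> Lock<T> {
    pub fn new(value: T) -> Self {
        Lock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Takes the lock, or panics if it is already held.
    pub fn lock(&self) -> LockGuard<'_, T> {
        let was_locked = self.locked.swap(true, Ordering::Acquire);
        assert!(!was_locked, "Lock<T> contended: value is already borrowed");
        LockGuard { lock: self }
    }
}

/// Grants access to the value and releases the lock when dropped.
pub struct LockGuard<'a, T> {
    lock: &'a Lock<T>,
}

impl<T> Deref for LockGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: this guard is the only holder of the lock.
        unsafe { &*self.lock.value.get() }
    }
}

impl<T> DerefMut for LockGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: this guard is the only holder of the lock.
        unsafe { &mut *self.lock.value.get() }
    }
}

impl<T> Drop for LockGuard<'_, T> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
    }
}
```

A failed `lock()` call panics before any guard is created, so it never clears the flag owned by the current holder.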

Labels: C-Feature (A new feature, making something new possible), C-Performance (A change motivated by improving speed, memory usage or compile times)
