Skip to content

Conversation

@aevyrie
Copy link
Member

@aevyrie aevyrie commented Dec 22, 2025

Objective

Solution

  • Replace the EntityHashSet with a component added directly to entities. This still allows for correct change detection triggers on visibility, but avoids hashing. This also enables parallel updates.

Summary

  • This PR lead to a rabbit hole that uncovered Major perf regression in d00240d #22256
  • This PR resolves the regression introduced since 0.17 released
  • A more fair comparison is not to main (which has a regression), but to 0.17
  • Compared to 0.17.3, this is a 27% speedup to the PostUpdate schedule for caldera hotel on an M4 Max
image

Testing

  • Many cubes, many foxes, caldera
  • Yellow is new, red is old.
  • Note the bimodality in the old traces, this was consistently repeatable, and seems to have something to do with EntityHashSet and EntityHashMap. Worth investigating that further, as I've seen that bimodal behavior before, and blamed it on pcores vs ecores, but I verified that is not happening.
  • Summary: caldera hotel no longer exhibits bimodal performance with a pathological mode that runs very slowly. Roughly comparing the ~90th percentile performance, the optimized code is about 1.2ms faster. This is particularly significant when you consider this is running on a device that is already hitting 150fps (GPU bound), so it only has a frame budget of 6.7ms.
  • Notably, visibility checking was the last egregiously performing set of systems in the common hot path of most bevy apps, and no longer stick out as obviously slow in a frame profile:

A high level comparison of the change in CPU time by looking at just PostUpdate between old(red) and new(yellow)

image

caldera hotel - all entities in view

For this test, I didn't touch the camera, so all entities were in view at all times.

check_visibility

image

reset_view_visibility

image

mark_newly_hidden_entities_invisible

This is 23us slower than the "fast mode" of the existing code on main, but 180-250us faster than the weird "slow mode" on main. The new code is significantly more stable, and does not exhibit the super slow mode.

image

caldera hotel - no entities in view

For this test, I immediately rotated the camera so the hotel was not in view.

check_visibility

This is largely a wash. The old code is 2us faster, but this is likely in the realm of noise.

image

reset_view_visibility

This is a big win. The main peak is about 30us faster now, but the major win is the worst case, which is nearly 500us faster.

image

mark_newly_hidden_entities_invisible

Same story as the other caldera comparison with everything in view, this is a bit slower than the fast mode, but way faster than the slow mode on main.

image

@aevyrie
Copy link
Member Author

aevyrie commented Dec 22, 2025

This needs a comments pass - visibility comments are pretty out of date, and this makes it worse.

@aevyrie aevyrie marked this pull request as draft December 22, 2025 06:49
@aevyrie
Copy link
Member Author

aevyrie commented Dec 22, 2025

Just grokked some things about why the way the code is the way it is. Need to fix some things and trying to make it way less confusing.

@aevyrie aevyrie marked this pull request as ready for review December 23, 2025 20:22
@aevyrie
Copy link
Member Author

aevyrie commented Dec 23, 2025

Some overall frametime numbers from Griffin on discord

I have some odd results.
Ran with the builtin benchmark on my caldera hotel 01 repo. 7950x/4070ti/linux (CPU time)
6.06ms bevy main (not super consistent but didn't see spikes)
4.86ms faster-vis-checks (often ~4ms with random spikes up to ~13ms)
4.74ms bevy 0.17.3 (also often ~4ms with random spikes up to ~13ms)

@aevyrie
Copy link
Member Author

aevyrie commented Dec 24, 2025

Verified this exact same behavior reproduces on a lower CPU spec windows machine as well. This confirms that bimodal behavior seemingly coming from the hash set is not platform specific.

Saw a 8.25ms main -> 7.64ms frametime improvement once I upped the parallelism of the thread pool.

@IceSentry IceSentry added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 24, 2025
@hukasu
Copy link
Contributor

hukasu commented Dec 25, 2025

Is the only transitions that needs to be verified is changing from visible to hidden? The inverse (hidden to visible) is not needed?

@aevyrie
Copy link
Member Author

aevyrie commented Dec 25, 2025

What do you mean by "verified"?

Do these docs help? (on WasVisibleNowHidden) https://github.com/bevyengine/bevy/pull/22226/changes?diff=split#diff-fec33d34072b7b4be336571ceecf2c10fc9bafd8366c1b29531089b7e3ef621cR527

@hukasu
Copy link
Contributor

hukasu commented Dec 25, 2025

maybe checked would be a better term

would having a N-M relationship (when available) that links entities to cameras replace that component?
i.e.

#[derive(Component)]
#[relationship(relationship_target = VisibleIn)]
struct Viewing(HashSet<Entity>);

#[derive(Component)]
#[relationship_target(relationship = Viewing)]
struct VisibleIn(HashSet<Entity>);

and checking if the entity has VisibleIn or if VisibleIn::len() == 0 be equivalent to WasVisibleNowHidden?

@aevyrie
Copy link
Member Author

aevyrie commented Dec 26, 2025

What does using a relationship provide here? I don't follow the purpose of your suggestion.

Revisiting your original question, I think I understand what you're asking now

Is the only transitions that needs to be verified is changing from visible to hidden?

No, we need to prevent all spurious change triggers, the ViewVisibility component should only be mutated when the value has changed globally. This is because rendering systems downstream use change detection to avoid excess updates to the GPU when nothing has changed.

This is challenging to do efficiently, and especially in a parallelizable way. The docs explain why we need to track this state in a scratch space, and doing so in a component makes parallel updates easy compared to a single global hash set. It's not immediately clear how using relations would help with that.

and checking if the entity has VisibleIn or if VisibleIn::len() == 0 be equivalent to WasVisibleNowHidden?

I don't think so, WasVisible_ is specifically covering the case of the visibility of the entity last update. This proposed relation is just covering the current state as far as I can tell.

Also, as far as I've seen in the past, relationship updates are extremely slow compared to dense parallel component updates.

@hukasu
Copy link
Contributor

hukasu commented Dec 26, 2025

Ok, that cleared my question

@aevyrie aevyrie mentioned this pull request Dec 29, 2025
@alice-i-cecile alice-i-cecile added this to the 0.18 milestone Dec 30, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 30, 2025
# Objective

- Add a stress test that exercises the 3d mesh pipeline for dynamic
scenes.
- Large static scenes like caldera hotel don't expose performance issues
when many meshes are moving.
- Give us a way to benchmark PRs like
   - #22297
   - #22281
   - #22226

## Solution

- Make a 3d version of `bevymark`, sticking to the existing patterns as
closely as possible.

## Testing

<img width="1072" height="684" alt="image"
src="https://github.com/user-attachments/assets/41214ba9-ffad-471d-a320-1f007490dead"
/>

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple enough, and those numbers are great. Thanks!

@alice-i-cecile alice-i-cecile added the P-Regression Functionality that used to work but no longer does. Add a test for this! label Dec 31, 2025
@alice-i-cecile alice-i-cecile modified the milestone: 0.18 Jan 1, 2026
Copy link
Contributor

@pcwalton pcwalton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, only one minor nit.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jan 1, 2026
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jan 1, 2026
Merged via the queue into bevyengine:main with commit e4eb916 Jan 1, 2026
40 checks passed
@github-project-automation github-project-automation bot moved this to Done in Rendering Jan 1, 2026
cart pushed a commit that referenced this pull request Jan 8, 2026
# Objective

- Make visibility systems less slow.
- Saves as much as 1.2ms of CPU time on a (GPU bound) 6.7ms frame,
rendering caldera hotel.
- Fixes #22256

## Solution

- Replace the `EntityHashSet` with a component added directly to
entities. This still allows for correct change detection triggers on
visibility, but avoids hashing. This also enables parallel updates.

## Summary

- This PR lead to a rabbit hole that uncovered #22256 
- This PR resolves the regression introduced since 0.17 released
- A more fair comparison is not to main (which has a regression), but to
0.17
- Compared to 0.17.3, this is a 27% speedup to the `PostUpdate` schedule
for caldera hotel on an M4 Max

<img width="558" height="251" alt="image"
src="https://github.com/user-attachments/assets/2d711e3c-b65e-4c3a-9d2a-a587f8f36aae"
/>

## Testing

- Many cubes, many foxes, caldera
- Yellow is new, red is old.
- Note the bimodality in the old traces, this was consistently
repeatable, and seems to have something to do with `EntityHashSet` and
`EntityHashMap`. Worth investigating that further, as I've seen that
bimodal behavior before, and blamed it on pcores vs ecores, but I
verified that is not happening.
- Summary: caldera hotel no longer exhibits bimodal performance with a
pathological mode that runs very slowly. Roughly comparing the ~90th
percentile performance, the optimized code is about 1.2ms faster. This
is particularly significant when you consider this is running on a
device that is already hitting 150fps (GPU bound), so it only has a
frame budget of 6.7ms.
- Notably, visibility checking was the last egregiously performing set
of systems in the common hot path of most bevy apps, and no longer stick
out as obviously slow in a frame profile:

A high level comparison of the change in CPU time by looking at just
PostUpdate between old(red) and new(yellow)

<img width="1738" height="512" alt="image"
src="https://github.com/user-attachments/assets/5252ff0f-d9f3-49d1-a3ce-309bd84d5485"
/>

### caldera hotel - all entities in view

For this test, I didn't touch the camera, so all entities were in view
at all times.

`check_visibility`

<img width="1262" height="397" alt="image"
src="https://github.com/user-attachments/assets/c80c470a-1548-42d2-838c-4e74525a6cb8"
/>

`reset_view_visibility`

<img width="1262" height="407" alt="image"
src="https://github.com/user-attachments/assets/7b550823-f832-41f1-9e42-20cab342b0f2"
/>

`mark_newly_hidden_entities_invisible`

This is 23us slower than the "fast mode" of the existing code on main,
but 180-250us *faster* than the weird "slow mode" on main. The new code
is significantly more stable, and does not exhibit the super slow mode.

<img width="1256" height="418" alt="image"
src="https://github.com/user-attachments/assets/b7cbf57d-a221-468a-9af8-3381981874b2"
/>

### caldera hotel - no entities in view

For this test, I immediately rotated the camera so the hotel was not in
view.

`check_visibility`

This is largely a wash. The old code is 2us faster, but this is likely
in the realm of noise.

<img width="1255" height="372" alt="image"
src="https://github.com/user-attachments/assets/2c795f72-ebea-4b35-ba19-a8eccd4c56fc"
/>

`reset_view_visibility`

This is a big win. The main peak is about 30us faster now, but the major
win is the worst case, which is nearly 500us faster.

<img width="1252" height="398" alt="image"
src="https://github.com/user-attachments/assets/f8a9d99f-a6e1-48db-8639-d52276dba9ab"
/>


`mark_newly_hidden_entities_invisible`

Same story as the other caldera comparison with everything in view, this
is a bit slower than the fast mode, but way faster than the slow mode on
main.

<img width="1248" height="400" alt="image"
src="https://github.com/user-attachments/assets/4564d94e-911a-4b3f-a584-c6e102ef2dd0"
/>
github-merge-queue bot pushed a commit that referenced this pull request Jan 8, 2026
# Objective

- Fix when an entity becomes visible, it latches and is impossible to
hide.
- Introduced in #22226 

## Solution

- Ensure we clear the current state bit when shifting
- Add regression test
cart pushed a commit that referenced this pull request Jan 9, 2026
# Objective

- Fix when an entity becomes visible, it latches and is impossible to
hide.
- Introduced in #22226 

## Solution

- Ensure we clear the current state bit when shifting
- Add regression test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times P-Regression Functionality that used to work but no longer does. Add a test for this! S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Major perf regression in d00240d

5 participants