Cold Specialization #17567

tychedelia · 2025-01-27T22:06:51Z

Cold Specialization

Objective

As an ongoing part of our quest to retain everything in the render world, cold specialization caches the results of pipeline specialization so that pipeline IDs are recomputed only when necessary, rather than every frame. This reduces redundant work in stable scenes while still handling cases in which materials, views, or visibility change, and it unlocks future optimization work such as retaining render bins.

Solution

Each queue system is split into a specialization system and a queue system. The specialization system runs only when a new pipeline ID needs to be computed. Cached pipelines are invalidated using a combination of change detection and ECS ticks.

The difficulty with change detection

Detecting “what changed” can be tricky because pipeline specialization depends not only on the entity’s components (e.g., mesh, material, etc.) but also on which view (camera) it is rendered in. In other words, the cache key for a given pipeline ID is a view entity/render entity pair. As such, it's not sufficient simply to react to change detection in order to specialize -- an entity could currently be out of view, or could be rendered in the future in a camera that is currently disabled or hasn't spawned yet.
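To make the view/entity keying concrete, here is a minimal sketch in plain Rust. The `ViewEntity`, `RenderEntity`, `PipelineId`, and `SpecializedPipelines` names are hypothetical stand-ins chosen for illustration and are not the types this PR adds. The sketch only shows why an unchanged entity can still be missing a pipeline for a given view.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the engine's entity and pipeline ID types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ViewEntity(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct RenderEntity(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PipelineId(pub u32);

/// Cache keyed by the (view, entity) pair rather than by the entity alone.
#[derive(Default)]
pub struct SpecializedPipelines {
    entries: HashMap<(ViewEntity, RenderEntity), PipelineId>,
}

impl SpecializedPipelines {
    /// Record the pipeline specialized for `entity` as seen from `view`.
    pub fn insert(&mut self, view: ViewEntity, entity: RenderEntity, id: PipelineId) {
        self.entries.insert((view, entity), id);
    }

    /// Look up the cached pipeline for this exact (view, entity) pair.
    pub fn get(&self, view: ViewEntity, entity: RenderEntity) -> Option<PipelineId> {
        self.entries.get(&(view, entity)).copied()
    }
}
```

With this shape, an entity specialized for one camera has no entry for a camera that spawns later, even though nothing on the entity itself changed. Change detection on the entity alone would not catch that case.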

Why ticks?

Ticks let us ensure correctness by comparing the tick at which a view or entity was last updated against the tick at which its pipeline ID was cached. This ensures that even if an entity was out of view, or has never been seen in a given camera before, we can still correctly determine whether it needs to be re-specialized.
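The tick comparison can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `Tick` is a plain counter (real ECS ticks wrap around, which is ignored here), views and entities are bare `u32` IDs, and `Specializer` and `specialize` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified tick: a plain counter. Wraparound is ignored in this sketch.
pub type Tick = u32;

/// Hypothetical cache: (view, entity) -> (tick when specialized, pipeline ID).
#[derive(Default)]
pub struct Specializer {
    cache: HashMap<(u32, u32), (Tick, u32)>,
    /// Counts how often the expensive path ran, for illustration only.
    pub specializations: usize,
}

/// Placeholder for the expensive specialization step.
fn specialize(view: u32, entity: u32) -> u32 {
    view * 1_000 + entity
}

impl Specializer {
    /// Return the pipeline ID for this pair, re-specializing only if the view
    /// or the entity changed after the cached entry was produced.
    pub fn pipeline_for(
        &mut self,
        view: u32,
        entity: u32,
        view_changed: Tick,
        entity_changed: Tick,
        now: Tick,
    ) -> u32 {
        let last_changed = view_changed.max(entity_changed);
        if let Some(&(specialized_at, id)) = self.cache.get(&(view, entity)) {
            if specialized_at >= last_changed {
                return id; // Cached entry is at least as new as the last change.
            }
        }
        // Either never seen for this view, or stale: take the slow path.
        self.specializations += 1;
        let id = specialize(view, entity);
        self.cache.insert((view, entity), (now, id));
        id
    }
}
```

Because the check compares stored ticks rather than reacting to a change event, it gives the right answer for an entity that changed while out of view and for a pair that has never been specialized.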

Testing

TODO: Tested a bunch of different examples, need to test more.

Migration Guide

TODO

- `AssetEvents` has been moved into the `PostUpdate` schedule.

pcwalton

A couple of comments to start with. I'll have more later when I take a closer look.

crates/bevy_pbr/src/material.rs

pcwalton · 2025-01-27T22:18:17Z

crates/bevy_pbr/src/render/mesh.rs

+            .render_lightmaps
+            .get(&entity)
+            .map(|lightmap| lightmap.slab_index);
+        self.shared.lightmap_slab_index = lightmap_slab_index;


Check pass ordering here. Are we sure that render_lightmaps is populated before this runs?

Okay, will check. I also think that Lightmap needs to be added to the invalidation logic.

Looks like we're good, as RenderLightmaps is populated during the extraction phase, and RenderMeshInstanceGpuBuilder::update is called during the collection subphase of the render phase.

pcwalton

This is a more thorough review. I mostly just had some simplification suggestions that you can do if you want. This is great stuff.

crates/bevy_pbr/src/material.rs

crates/bevy_pbr/src/prepass/mod.rs

crates/bevy_sprite/src/mesh2d/material.rs

crates/bevy_sprite/src/mesh2d/mesh.rs

pcwalton · 2025-01-27T23:14:02Z


Looks like we're good, as RenderLightmaps is populated during the extraction phase, and RenderMeshInstanceGpuBuilder::update is called during the collection subphase of the render phase.

IceSentry

Code LGTM. This is a very good direction.

This change is likely to cause unforeseen bugs, but at this point the only way to find them is to merge it and have more people run it. I can't see anything wrong in a review, and I didn't see any issues in a bunch of examples I ran locally.

alice-i-cecile · 2025-02-05T18:36:34Z

@tychedelia when you get a chance could you please finish up the migration guide?

This PR makes Bevy keep entities in bins from frame to frame if they haven't changed. This reduces the time spent in `queue_material_meshes` and related functions to near zero for static geometry. This patch uses the same change tick technique that bevyengine#17567 uses to detect when meshes have changed in such a way as to require re-binning.

In order to quickly find the relevant bin for an entity when that entity has changed, we introduce a new type of cache, the *bin key cache*. This cache stores a mapping from main world entity ID to cached bin key, as well as the tick of the most recent change to the entity. As we iterate through the visible entities in `queue_material_meshes`, we check the cache to see whether the entity needs to be re-binned. If it doesn't, then we mark it as clean in the `valid_cached_entity_bin_keys` bitset. At the end, all bin keys not marked as clean are removed from the bins.

This patch has a dramatic effect on the rendering performance of most benchmarks, as it effectively eliminates `queue_material_meshes` from the profile. Note, however, that it generally simultaneously regresses `batch_and_prepare_binned_render_phase` by a bit (not by enough to outweigh the win, however). I believe that's because, before this patch, `queue_material_meshes` put the bins in the CPU cache for `batch_and_prepare_binned_render_phase` to use, while with this patch, `batch_and_prepare_binned_render_phase` must load the batches into the CPU cache itself.


This PR makes Bevy keep entities in bins from frame to frame if they haven't changed. This reduces the time spent in `queue_material_meshes` and related functions to near zero for static geometry. This patch uses the same change tick technique that #17567 uses to detect when meshes have changed in such a way as to require re-binning.

In order to quickly find the relevant bin for an entity when that entity has changed, we introduce a new type of cache, the *bin key cache*. This cache stores a mapping from main world entity ID to cached bin key, as well as the tick of the most recent change to the entity. As we iterate through the visible entities in `queue_material_meshes`, we check the cache to see whether the entity needs to be re-binned. If it doesn't, then we mark it as clean in the `valid_cached_entity_bin_keys` bit set. If it does, then we insert it into the correct bin, and then mark the entity as clean. At the end, all entities not marked as clean are removed from the bins.

This patch has a dramatic effect on the rendering performance of most benchmarks, as it effectively eliminates `queue_material_meshes` from the profile. Note, however, that it generally simultaneously regresses `batch_and_prepare_binned_render_phase` by a bit (not by enough to outweigh the win, however). I believe that's because, before this patch, `queue_material_meshes` put the bins in the CPU cache for `batch_and_prepare_binned_render_phase` to use, while with this patch, `batch_and_prepare_binned_render_phase` must load the bins into the CPU cache itself.

On Caldera, this reduces the time spent in `queue_material_meshes` from 5+ ms to 0.2ms-0.3ms. Note that benchmarking on that scene is very noisy right now because of #17535.

![Screenshot 2025-02-05 153458](https://github.com/user-attachments/assets/e55f8134-b7e3-4b78-a5af-8d83e1e213b7)
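The bin key cache described above can be sketched roughly like this. It is an illustration under stated assumptions rather than the code from #17698: `RetainedBins`, `queue`, and `sweep` are hypothetical names, entities, bin keys, and ticks are plain `u32` values, and a `HashSet` stands in for the `valid_cached_entity_bin_keys` bit set.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet};

pub type Entity = u32;
pub type BinKey = u32;
pub type Tick = u32;

#[derive(Default)]
pub struct RetainedBins {
    /// Bins that persist from frame to frame.
    pub bins: BTreeMap<BinKey, BTreeSet<Entity>>,
    /// Bin key cache: entity -> (cached bin key, tick of its most recent change).
    cached: HashMap<Entity, (BinKey, Tick)>,
    /// Entities marked clean this frame (a stand-in for the bit set).
    valid: HashSet<Entity>,
}

impl RetainedBins {
    /// Called once per visible entity. `compute_key` runs only when the
    /// entity is new or has changed since its key was cached.
    pub fn queue(&mut self, entity: Entity, changed: Tick, compute_key: impl FnOnce() -> BinKey) {
        if let Some((old_key, cached_tick)) = self.cached.get(&entity).copied() {
            if cached_tick == changed {
                self.valid.insert(entity); // Unchanged: keep it where it is.
                return;
            }
            self.remove_from_bin(old_key, entity); // Changed: leave the old bin.
        }
        let key = compute_key();
        self.bins.entry(key).or_default().insert(entity);
        self.cached.insert(entity, (key, changed));
        self.valid.insert(entity);
    }

    /// Called at the end of the frame: drop every entity not marked clean.
    pub fn sweep(&mut self) {
        let valid = std::mem::take(&mut self.valid);
        let stale: Vec<Entity> = self.cached.keys().copied().filter(|e| !valid.contains(e)).collect();
        for entity in stale {
            if let Some((key, _)) = self.cached.remove(&entity) {
                self.remove_from_bin(key, entity);
            }
        }
    }

    fn remove_from_bin(&mut self, key: BinKey, entity: Entity) {
        let now_empty = match self.bins.get_mut(&key) {
            Some(bin) => {
                bin.remove(&entity);
                bin.is_empty()
            }
            None => false,
        };
        if now_empty {
            self.bins.remove(&key);
        }
    }
}
```

In this sketch a static entity costs one hash lookup and one set insertion per frame, and the bin key is never recomputed for it. That is the effect the description attributes to retained bins for static geometry.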

Co-authored-by: Patrick Walton <pcwalton@mimiga.net>


alice-i-cecile · 2025-03-25T20:08:21Z

Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1989 if you'd like to help out.

tychedelia added 19 commits January 27, 2025 14:03

Start cold-specialization.

1634310

Use ticks in render world.

62e01ac

Forward working.

a8331cb

Simplify extraction.

47aa01d

Move asset events to PostUpdate.

14fdd1f

Clean-up.

1e06fd0

Clean-up.

a4010b6

Start deferred.

dae15a8

Fix prepass.

54a530c

Start shadows.

f296a7b

Shadows.

c251c04

Fix shadows.

6757df7

Remove plugin.

7055630

Cargo fmt.

f83ee15

Fix prepass.

9da87b8

2d.

68fa3f0

Cleanup.

7129629

Remove dep.

f46864d

Updates from rebase.

da672a9

tychedelia added the A-Rendering (Drawing game state to the screen), D-Complex (Quite challenging from either a design or technical perspective. Ask for help!), and S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) labels Jan 27, 2025

github-actions bot mentioned this pull request Jan 27, 2025

17567 bevyengine/bevy-example-runner#102

Closed

pcwalton reviewed Jan 27, 2025


alice-i-cecile added the C-Performance (A change motivated by improving speed, memory usage or compile times) and M-Needs-Release-Note (Work that should be called out in the blog due to impact) labels Jan 27, 2025

alice-i-cecile mentioned this pull request Jan 27, 2025

Cold specialization #16420

Closed

github-actions bot mentioned this pull request Jan 27, 2025

17567 bevyengine/bevy-example-runner#103

Closed

pcwalton approved these changes Jan 27, 2025


github-actions bot mentioned this pull request Jan 27, 2025

17567 bevyengine/bevy-example-runner#105

Closed

tychedelia added 6 commits January 29, 2025 23:46

Clippy.

86bdb26

Ambiguity.

ad20a91

Ambiguity.

0be3cdd

Ambiguity.

5a00374

Temp fix ambiguity.

f2e6ebb

Fix ambiguity.

e3aff84

IceSentry approved these changes Feb 5, 2025


alice-i-cecile added the S-Ready-For-Final-Review (This PR has been approved by the community. It's ready for a maintainer to consider merging it) label and removed the S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) label Feb 5, 2025

alice-i-cecile added this pull request to the merge queue Feb 5, 2025

superdump approved these changes Feb 5, 2025


Merged via the queue into bevyengine:main with commit 2ea5e9b Feb 5, 2025
29 checks passed

pcwalton mentioned this pull request Feb 5, 2025

Retain bins from frame to frame. #17698

Merged

HackerFoo added a commit to HackerFoo/bevy that referenced this pull request Feb 6, 2025

Revert "Cold Specialization (bevyengine#17567)"

63b34dd

This reverts commit 2ea5e9b.

HackerFoo added a commit to HackerFoo/bevy that referenced this pull request Feb 7, 2025

Revert "Cold Specialization (bevyengine#17567)"

09c6bc0

This reverts commit 2ea5e9b.

rparrett mentioned this pull request Feb 7, 2025

animated_ui example is broken #17718

Closed

HackerFoo mentioned this pull request Feb 7, 2025

Move specialize_* to QueueMeshes. #17719

Merged

brianreavis mentioned this pull request Feb 15, 2025

Cold Specialization Memory Leaks #17872

Closed

alice-i-cecile mentioned this pull request Mar 25, 2025

Write release notes for PR #17567: Cold Specialization bevyengine/bevy-website#1989

Closed

ElliottjPierce mentioned this pull request Mar 26, 2025

Clean up incomplete migration guides tracking issue bevyengine/bevy-website#2014

Open

16 tasks
