Reorder render sets, refactor bevy_sprite to take advantage #9236

james-j-obrien · 2023-07-21T20:42:56Z

This is a continuation of this PR: #8062

Objective

Reorder render schedule sets to allow data preparation when phase item order is known to support improved batching
Part of the batching/instancing etc plan from here: GPU Instancing #89 (comment)
The original idea came from @inodentry and proved to be a good one. Thanks!
Refactor bevy_sprite and bevy_ui to take advantage of the new ordering

Solution

Move Prepare and PrepareFlush after PhaseSortFlush
Add a PrepareAssets set that runs in parallel with other systems and sets in the render schedule.
- Put prepare_assets systems in the PrepareAssets set
- If explicit dependencies are needed on Mesh or Material RenderAssets then depend on the appropriate system.
Add ManageViews and ManageViewsFlush sets between ExtractCommands and Queue
Move queue_mesh*_bind_group to the Prepare stage
- Rename them to prepare_
- Put systems that prepare resources (buffers, textures, etc.) into a PrepareResources set inside Prepare
- Put the prepare_..._bind_group systems into a PrepareBindGroups set after PrepareResources
Move prepare_lights to the ManageViews set
- prepare_lights creates views and this must happen before Queue
- This system needs refactoring to stop handling all responsibilities
- Gather lights, sort, and create shadow map views. Store sorted light entities in a resource
Remove BatchedPhaseItem
Replace batch_range with batch_size, which tells the render loop how far to advance after drawing an item: the item itself plus the items batched with it. An item whose batch_size is 0 is not drawn and is skipped.
queue_sprites has been split into queue_sprites for queueing phase items and prepare_sprites for batching after the PhaseSort
- PhaseItems are still inserted in queue_sprites
- After sorting, adjacent compatible sprite phase items are accumulated into SpriteBatch components on the first entity of each batch, each containing a range of vertex indices. The associated PhaseItem's batch_size is updated accordingly.
- SpriteBatch items are then drawn skipping over the other items in the batch based on the value in batch_size
A very similar refactor was performed on bevy_ui
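
To illustrate the batch_size behaviour described above, here is a minimal, self-contained sketch of such a render loop. It does not use Bevy; `PhaseItem`, `label`, and `render` are hypothetical stand-ins, and it assumes batch_size counts the first item of a batch plus the items merged into it.

```rust
/// Hypothetical stand-in for a sorted phase item (not Bevy's real type).
#[derive(Debug, Clone, PartialEq)]
struct PhaseItem {
    label: &'static str,
    /// Assumed meaning: 0 = do not draw, n > 0 = draw and advance by n items.
    batch_size: usize,
}

/// Walks the sorted items, "drawing" each batch head and skipping the rest.
fn render(items: &[PhaseItem]) -> Vec<&'static str> {
    let mut drawn = Vec::new();
    let mut index = 0;
    while index < items.len() {
        let item = &items[index];
        if item.batch_size > 0 {
            // One draw covers the whole batch.
            drawn.push(item.label);
            index += item.batch_size;
        } else {
            // The extra branch for the zero case: skip the item entirely.
            index += 1;
        }
    }
    drawn
}

fn main() {
    let items = vec![
        PhaseItem { label: "sprite-a", batch_size: 3 },
        PhaseItem { label: "sprite-b", batch_size: 0 },
        PhaseItem { label: "sprite-c", batch_size: 0 },
        PhaseItem { label: "mesh", batch_size: 1 },
        PhaseItem { label: "sprite-d", batch_size: 1 },
    ];
    println!("{:?}", render(&items));
}
```

In this sketch the three adjacent sprites cost a single draw, while the mesh and the trailing sprite are each drawn on their own.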

Changelog

Changed:

Reordered and reworked render app schedule sets. The main change is that data is extracted, queued, sorted, and then prepared when the order of data is known.
Refactor bevy_sprite and bevy_ui to take advantage of the reordering.

Migration Guide

Assets such as materials and meshes should now be created in PrepareAssets, e.g. prepare_assets<Mesh>
Queueing entities to RenderPhases continues to be done in Queue, e.g. queue_sprites
Preparing resources (textures, buffers, etc.) should now be done in PrepareResources, e.g. prepare_prepass_textures, prepare_mesh_uniforms
Preparing bind groups should now be done in PrepareBindGroups, e.g. prepare_mesh_bind_group
Any batching or instancing can now be done in Prepare, where the order of the phase items is known, e.g. prepare_sprites

Next Steps

Introduce some generic mechanism to ensure items that can be batched are grouped in the phase item order; currently you could easily have [sprite at z 0, mesh at z 0, sprite at z 0], which prevents batching.
Investigate improved orderings for building the MeshUniform buffer
Implement batching across the rest of Bevy
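
The interleaving problem in the first item can be seen in a small sketch. This is not Bevy code: `Kind` and `batch_sizes` are hypothetical, and "compatible" is reduced to "same kind" purely for illustration. Only adjacent compatible items in the sorted order are merged, so an item of another kind sorted in between splits the batch.

```rust
/// Hypothetical item kinds; real compatibility checks would compare
/// pipelines, textures, and so on.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Sprite,
    Mesh,
}

/// Returns one batch_size per item: the run length on the first item of
/// each run of equal kinds, and 0 on the remaining items of that run.
fn batch_sizes(sorted: &[Kind]) -> Vec<usize> {
    let mut sizes = vec![0; sorted.len()];
    let mut start = 0;
    for (i, kind) in sorted.iter().enumerate() {
        if *kind != sorted[start] {
            // A different kind ends the current run and starts a new batch.
            start = i;
        }
        sizes[start] += 1;
    }
    sizes
}

fn main() {
    use Kind::{Mesh, Sprite};
    // Interleaved: every item ends up in its own batch.
    println!("{:?}", batch_sizes(&[Sprite, Mesh, Sprite]));
    // Grouped: the two sprites share one batch.
    println!("{:?}", batch_sizes(&[Sprite, Sprite, Mesh]));
}
```

A grouping mechanism like the one proposed above would aim to produce the second ordering whenever the items' depths allow it.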

github-actions · 2023-07-21T20:55:20Z

Example contributors failed to run, please try running it locally and check the result.

github-actions · 2023-07-21T22:47:12Z

Example contributors failed to run, please try running it locally and check the result.

github-actions · 2023-07-21T22:58:09Z

Example contributors failed to run, please try running it locally and check the result.

james-j-obrien · 2023-07-21T23:10:19Z

Had to add an additional check in queue_sprites to ensure that the image asset is prepared. This isn't ideal as we then do the lookup again in prepare_sprites in order to create the bind group.

JMS55

Generally looks good to me, but I can really only review bevy_pbr/bevy_render. I'm not familiar with bevy_sprite, bevy_ui, etc.

Also, curious why the QueueMeshes subset exists.

crates/bevy_render/src/lib.rs

crates/bevy_render/src/render_phase/mod.rs

superdump

Mostly looks OK to me. At first glance, the approach of doing UI batching in Queue looks like it would break if other plugins queued anything to the TransparentUi render phase that should be sorted in between the UI items.

Other than that, I'd like to see some benchmarks. Then we can either decide that performance is good enough that further improvements can go into a follow-up PR, so as not to block merging this one, or that more work is needed.

crates/bevy_sprite/src/render/mod.rs

crates/bevy_ui/src/render/mod.rs

superdump · 2023-07-31T12:58:04Z

On my M1 Max, this PR drops bevymark -- 10000 16 performance from ~72fps to ~44fps, so I think we will have to do some work to optimise this before merging as it's too far away. I'll have a look at what I can do to help this along. :)

superdump · 2023-08-03T14:07:06Z

I first made a branch that:

Uses a SparseSet<Entity, ExtractedSprite> to extract sprites into, and then only sprite batches are spawned as entities after batching.
Avoids per-sprite GpuImage lookups, instead doing them once per batch (except for a case where the image is not loaded yet, that would still be per-sprite until a sprite is found whose image is loaded.)

superdump@4e05e65

After that prepare_sprites still needed some more optimisation but it's getting there. I tried some minor changes but didn't make much of a dent on the performance drop.

About 16% slower than main (based on medians, on an M1 Max running bevymark -- 10000 16 to spawn 16 waves of 10k sprites, and running it for 1500 frames as I usually do.)

Some time passed and I wanted to try changing the way the sprite rendering worked, to instead put per-instance data into a GpuArrayBuffer, and then look it up using an instance index calculated from a specially-crafted index buffer where the low 2 bits are the x and y offsets of the quad (00 is bottom left, 01 bottom right, 10 top left, 11 top right) and the high bits contain the instance index. I got this working but it was still quite a bit slower than main.
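
As a rough illustration of the index encoding described above, the following sketch builds such an index buffer on the CPU and decodes an index again. The function names and the triangle winding order are assumptions made for this example; only the bit layout (low 2 bits for the corner, high bits for the instance index) comes from the description.

```rust
// Corner codes in the low 2 bits: bit 0 is the x offset, bit 1 is the y offset.
const BOTTOM_LEFT: u32 = 0b00;
const BOTTOM_RIGHT: u32 = 0b01;
const TOP_LEFT: u32 = 0b10;
const TOP_RIGHT: u32 = 0b11;

/// Builds 6 indices (two triangles) per quad instance.
/// The winding order used here is an assumption for illustration.
fn build_indices(instances: u32) -> Vec<u32> {
    let mut indices = Vec::with_capacity(instances as usize * 6);
    for instance in 0..instances {
        let base = instance << 2; // instance index lives in the high bits
        for corner in [
            TOP_LEFT, BOTTOM_LEFT, BOTTOM_RIGHT,
            BOTTOM_RIGHT, TOP_RIGHT, TOP_LEFT,
        ] {
            indices.push(base | corner);
        }
    }
    indices
}

/// Splits an index into (instance index, x offset, y offset),
/// which is what a vertex shader would do with the vertex index.
fn decode(index: u32) -> (u32, u32, u32) {
    (index >> 2, index & 1, (index >> 1) & 1)
}

fn main() {
    let indices = build_indices(2);
    for index in &indices {
        println!("{index:#07b} -> {:?}", decode(*index));
    }
}
```

With this layout, the shader can fetch per-instance data by the decoded instance index and derive the quad corner from the two offset bits without any per-vertex attributes.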

About 12% slower than main.

In profiles in Xcode I thought I had seen a lot of cache misses, and so I had one more idea to try - get rid of the GpuArrayBuffer of per-instance data, put it in an instance-rate vertex buffer instead, and use an index buffer with just 6 indices. This had been claimed to be slower than generating a larger index buffer but on the M1 Max...

About 28% frame time reduction vs main!!

I tested on my 5900HS + mobile RTX 3080 as well, not as big a boost there:

About an 8%ish frame time reduction, but still good!

So, I propose that this PR be rebased, reviewed, and merged. And that I make a separate PR with this change to using an instance buffer. @james-j-obrien

Oh, one more thing I noted when moving to the instance-rate vertex buffer: it uses a BufferVec of a repr C struct so does not need to do any serialisation into std430 or std140 layouts using encase. This seems to cut out about 50% of the prepare_sprites execution time.
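
A minimal sketch of why a repr C struct avoids a serialisation step: its in-memory layout is already the byte layout that gets uploaded, so the slice of instances can be viewed as bytes directly. The `SpriteInstance` fields and the `as_bytes` helper below are hypothetical and written with plain `std`; real code would more likely rely on a crate such as bytemuck instead of hand-written `unsafe`.

```rust
/// Hypothetical per-instance data; the field set is an assumption.
/// Every field is made of f32s, so the struct has no padding.
#[repr(C)]
#[derive(Clone, Copy)]
struct SpriteInstance {
    transform: [[f32; 4]; 3], // transposed affine 3x4 matrix
    color: [f32; 4],
    uv_offset_scale: [f32; 4],
}

/// Views the instances as raw bytes without copying or re-encoding.
fn as_bytes(instances: &[SpriteInstance]) -> &[u8] {
    // SAFETY: SpriteInstance is repr(C), contains only f32 values (no
    // padding bytes), and the returned slice borrows from `instances`.
    unsafe {
        std::slice::from_raw_parts(
            instances.as_ptr() as *const u8,
            std::mem::size_of_val(instances),
        )
    }
}

fn main() {
    let instance = SpriteInstance {
        transform: [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
        ],
        color: [1.0; 4],
        uv_offset_scale: [0.0, 0.0, 1.0, 1.0],
    };
    let instances = vec![instance; 3];
    let bytes = as_bytes(&instances);
    println!("{} instances -> {} bytes", instances.len(), bytes.len());
}
```

By contrast, a std140 or std430 encoding pass has to walk each value and write it out with the layout's alignment rules, which is the per-sprite cost this approach skips.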

github-actions · 2023-08-04T07:17:10Z

Example contributors failed to run, please try running it locally and check the result.

james-j-obrien · 2023-08-04T07:43:20Z

Fixed the UI implementation to match that of bevy_sprite. The main caveat here is that I have set the z depth used for the phase sort in TransparentUi to the stack_index of the corresponding UI node, cast to an f32. This works and still allows for positioning arbitrary non-node TransparentUi items behind or in front of the UI, but the value is also arbitrary, so I'm open to suggestions if there is a cleaner method I'm not seeing. One alternative is to sort by the usize stack_index instead of using a float at all, but that would then also apply to non-node TransparentUi items for little upside.
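
The sort-key idea can be sketched as follows. `TransparentUiItem`, `node_item`, and `sort_phase` are hypothetical names for this example, not the real Bevy types; the point is only that a float key derived from stack_index leaves room for non-node items to be placed between, before, or after UI nodes.

```rust
/// Hypothetical stand-in for an item in the TransparentUi phase.
#[derive(Debug, Clone, PartialEq)]
struct TransparentUiItem {
    label: &'static str,
    sort_key: f32,
}

/// UI nodes take their sort key from their stack index.
/// f32 represents integers exactly up to 2^24, so realistic
/// stack indices keep their relative order after the cast.
fn node_item(label: &'static str, stack_index: usize) -> TransparentUiItem {
    TransparentUiItem { label, sort_key: stack_index as f32 }
}

/// Sorts back to front by the float key, using a total order on f32.
fn sort_phase(items: &mut [TransparentUiItem]) {
    items.sort_by(|a, b| a.sort_key.total_cmp(&b.sort_key));
}

fn main() {
    let mut items = vec![
        node_item("node 2", 2),
        // Non-node items may pick any key, including fractional ones.
        TransparentUiItem { label: "custom overlay", sort_key: 0.5 },
        node_item("node 0", 0),
        node_item("node 1", 1),
        TransparentUiItem { label: "custom backdrop", sort_key: -1.0 },
    ];
    sort_phase(&mut items);
    let order: Vec<_> = items.iter().map(|item| item.label).collect();
    println!("{order:?}");
}
```

Sorting by the usize stack_index directly would lose the fractional and negative keys that let the custom items above slot in around the nodes.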

JMS55 · 2023-08-22T01:25:43Z

Can we rename PrepareBuffers to PrepareResources or something more generic?

superdump

Looking pretty good to me. Huge thanks @robtfm for going through all the systems and sets!

crates/bevy_pbr/src/ssao/mod.rs

crates/bevy_render/src/lib.rs

superdump · 2023-08-22T09:34:33Z

crates/bevy_render/src/lib.rs

+    /// The copy of [`apply_deferred`] that runs between [`PrepareResources`](RenderSet::PrepareResources) and ['PrepareBindgroups'](RenderSet::PrepareBindgroups).
+    PrepareResourcesFlush,
+    /// A sub-set within Prepare for constructing bindgroups, or other data that relies on buffers.
+    PrepareBindgroups,


As the wgpu/bevy types are camelCased as BindGroup, I think this should do the same, everywhere. Just commenting this one time.

Suggested change

-    PrepareBindgroups,
+    PrepareBindGroups,

I've fixed this instance; the only other example in the repo is LineGizmoUniformBindgroup. I could update that as well, but it seems somewhat tangential to the purpose of this PR.

crates/bevy_render/src/mesh/mod.rs

crates/bevy_render/src/render_asset.rs

superdump · 2023-08-22T10:12:24Z

The new SVG is looking good! :) It feels like it makes more sense.

robtfm

looks good to me. i tested a number of examples and didn't see any issues. couple of minor points but happy to approve as is.

crates/bevy_render/src/lib.rs

crates/bevy_render/src/view/window/mod.rs

JMS55

An absolutely amazing and foundational PR :). This resolves my long standing complaint of "why the heck are we making bind groups in a stage (system set? is that the official term now?) called queue?". Thanks so much for doing this!

JMS55 · 2023-08-16T03:22:11Z

crates/bevy_core_pipeline/src/core_3d/mod.rs

-            self.per_object_binding_dynamic_offset,
-            Reverse(FloatOrd(self.distance)),
-        )
+        Reverse(FloatOrd(self.distance))


Shouldn't we sort by pipeline here? Unrelated to this PR really, though.

I'll do that separately. :)

JMS55 · 2023-08-23T02:02:27Z

crates/bevy_render/src/render_phase/mod.rs

-        for item in &self.items {
-            let draw_function = draw_functions.get_mut(item.draw_function()).unwrap();
-            draw_function.draw(world, render_pass, view, item);
+        let mut index = 0;


Curious what the assembly looks like for this block.

Here's some annotated output from just the main render loop: render.txt
And here's the full asm for the function: render_full.txt
(had to use txt files as github complains about .asm)

There is an extra branch for the zero case but we won't be taking that the majority of the time.

Thanks. Don't think it's a huge problem then. Branch predictor probably handles it fine.

superdump

@james-j-obrien - let's get this merged! Please update to main and ping me on Discord so I can click the button. :)

# Objective

- Supersedes #8872
- Improve sprite rendering performance after the regression in #9236

## Solution

- Use an instance-rate vertex buffer to store per-instance data.
- Store color, UV offset and scale, and a transform per instance.
- Convert Sprite rect, custom_size, anchor, and flip_x/_y to an affine 3x4 matrix and store the transpose of that in the per-instance data. This is similar to how MeshUniform uses transpose affine matrices.
- Use a special index buffer that has batches of 6 indices referencing 4 vertices. The lower 2 bits indicate the x and y of a quad such that the corners are:
  ```
  10 11
  00 01
  ```
  UVs are implicit but get modified by UV offset and scale. The remaining upper bits contain the instance index.

## Benchmarks

I will compare versus `main` before #9236 because the results should be as good as or faster than that. Running `bevymark -- 10000 16` on an M1 Max with `main` at `e8b38925` in yellow, this PR in red:

![Screenshot 2023-08-27 at 18 44 10](https://github.com/bevyengine/bevy/assets/302146/bdc5c929-d547-44bb-b519-20dce676a316)

Looking at the median frame times, that's a 37% reduction from before.

---

## Changelog

- Changed: Improved sprite rendering performance by leveraging an instance-rate vertex buffer.

---------

Co-authored-by: Giacomo Stevanato <giaco.stevanato@gmail.com>

JMS55 reviewed Jul 22, 2023

crates/bevy_render/src/lib.rs (outdated, resolved)

crates/bevy_render/src/render_phase/mod.rs (outdated, resolved)

crates/bevy_render/src/render_phase/mod.rs (resolved)

superdump self-requested a review July 22, 2023 10:40

mockersf added the A-Rendering (Drawing game state to the screen) label Jul 23, 2023

nicopap self-requested a review July 27, 2023 07:08

nicopap added the C-Code-Quality (A section of code that is hard to understand or change) label Jul 27, 2023

superdump requested a review from robtfm July 27, 2023 14:20

superdump requested changes Jul 29, 2023

crates/bevy_sprite/src/render/mod.rs (outdated, resolved)

crates/bevy_sprite/src/render/mod.rs (outdated, resolved)

crates/bevy_sprite/src/render/mod.rs (resolved)

crates/bevy_ui/src/render/mod.rs (resolved)

superdump mentioned this pull request Jul 29, 2023

GPU Instancing #89

Closed

james-j-obrien and others added 10 commits August 3, 2023 17:59

Reorder render sets, prepare now occurs after sort

e62be96

Co-authored-by: Robert Swain <robert.swain@gmail.com>

Cleanup after rebase

3ccb838

Refactor bevy_sprite to remove double sorting

1732ab0

Refactor prepare_sprites to be clearer and more idiomatic

cb66efe

Remove duplicated data in SpriteBatch

6c575d6

Satisfy clippy

8be84b7

Refactor UI rendering enough to restore functionality

2f59700

Satisfy clippy

4b00b96

Add additional check in queue_sprites to not queue sprites whose texture is not yet available

1cbaff0

Fix bevy_ui and improve bevy_sprite

ec0e7a7

james-j-obrien force-pushed the sprite-refactor branch from 5a614ae to ec0e7a7 Compare August 4, 2023 07:06

james-j-obrien added 2 commits August 4, 2023 00:31

Fix batching logic when clipping ui nodes

4a8c4ce

Fix formatting

6408611

Add back the elusive render_range

31de780

james-j-obrien added 3 commits August 21, 2023 19:15

Merge pull request #1 from robtfm/render-reorder

6cfac8d

Render reorder

Rename PrepareBuffers to PrepareResources

5f4984d

Make use of seen FixedBitSet

436d9b1

superdump requested changes Aug 22, 2023

james-j-obrien added 2 commits August 22, 2023 11:14

Improve docs

d6fc11a

Satiate clippy

8e71d83

robtfm approved these changes Aug 22, 2023

crates/bevy_render/src/lib.rs (outdated, resolved)

crates/bevy_render/src/view/window/mod.rs (outdated, resolved)

Add additional constraint to PrepareAssets, remove WindowSystem::Prepare

8691ed4

JMS55 approved these changes Aug 23, 2023

superdump approved these changes Aug 27, 2023

superdump added 2 commits August 27, 2023 16:08

Merge branch 'main' into sprite-refactor

3763790

Fix formatting

e8b5ea3

superdump enabled auto-merge August 27, 2023 14:30

superdump added this pull request to the merge queue Aug 27, 2023

Merged via the queue into bevyengine:main with commit 4f1d9a6 Aug 27, 2023

superdump mentioned this pull request Aug 27, 2023

Use instancing for sprites #9597

Merged

rparrett mentioned this pull request Sep 2, 2023

Text2d is broken #9676

Closed

rparrett mentioned this pull request Sep 20, 2023

Poor performance on basic rectangle benchmark #8100

Closed

rparrett mentioned this pull request Sep 29, 2023

Performance regression in many_buttons since "render-set reorder" #9971

Closed

cart mentioned this pull request Oct 13, 2023

News: Release 0.12 bevyengine/bevy-website#754

Merged

43 tasks

NiseVoid mentioned this pull request Nov 4, 2023

Bevy 0.12 StarArawn/kayak_ui#279

Closed

rparrett mentioned this pull request Nov 6, 2023

Update to Bevy 0.12.0 StarArawn/bevy_ecs_tilemap#488

Closed

re0312 mentioned this pull request Jan 7, 2024

update Outdated comment #11243

Merged

github-merge-queue bot pushed a commit that referenced this pull request Jan 7, 2024

update Outdated comment (#11243)

101037d

# Objective

- Since #9236, queue_mesh_bind_group has been renamed to prepare_mesh_bind_group, but the comment referring to it has not been updated.

joshuaoco mentioned this pull request Jan 14, 2024

Update to Bevy v0.13 yopox/bevy_text_mode#1

Closed
