Using a single BindGroup is feasible, as demonstrated in #28719.
We should revisit this feature, but I recommend first prioritizing the overall performance improvements for the WebGPURenderer as discussed in #29066 (comment). Once those enhancements are in place, we can focus on advanced optimizations like RenderBundles.
Let's keep this issue open as a reminder to complete the work on RenderBundles.
Description
webgpu_performance_renderbundle shows the benefit of render bundles, but instead of creating a single render bundle that encodes all draw commands, as done in https://webgpu.github.io/webgpu-samples/?sample=renderBundles, it creates one render bundle per object. This seems unnecessary.
Solution
I would propose that the sample build a single GPURenderBundle instance encoding all 4000 draws, rather than building 4000 GPURenderBundle instances and leaving it to the browser implementation to potentially coalesce the render bundle draws.
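A minimal sketch of the single-bundle approach, in TypeScript. To keep it runnable without a GPU, `RenderBundleEncoderLike` is a hypothetical interface mirroring only the subset of `GPURenderBundleEncoder` methods used here (`setBindGroup`, `draw`, `finish`), and `buildSingleBundle` is a hypothetical helper, not part of three.js. It still uses one bind group per object; the only change it illustrates is recording every draw into one encoder and calling `finish()` once.

```typescript
// Hypothetical structural subset of GPURenderBundleEncoder (assumption: only
// the methods used below; a real encoder also has setPipeline, setVertexBuffer, ...).
interface RenderBundleEncoderLike<BindGroup, Bundle> {
  setBindGroup(index: number, bindGroup: BindGroup): void;
  draw(vertexCount: number): void;
  finish(): Bundle;
}

// Record every object's draw into ONE encoder instead of creating one
// encoder (and one bundle) per object.
function buildSingleBundle<BindGroup, Bundle>(
  encoder: RenderBundleEncoderLike<BindGroup, Bundle>,
  objectBindGroups: readonly BindGroup[],
  vertexCount: number,
): Bundle {
  for (const bindGroup of objectBindGroups) {
    encoder.setBindGroup(0, bindGroup);
    encoder.draw(vertexCount);
  }
  // A single finish() call yields a single bundle that replays all draws.
  return encoder.finish();
}
```

With a real device, the encoder would come from `device.createRenderBundleEncoder({ colorFormats: [...] })`, with `setPipeline` and any `setVertexBuffer` calls issued before the loop, and the returned bundle would be replayed each frame via `passEncoder.executeBundles([bundle])`.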
Additionally, as shown in https://webgpu.github.io/webgpu-samples/?sample=animometer, it is not necessary to create 4000 GPUBindGroup instances and 4000 GPUBuffer instances.
Instead, 1 GPUBuffer and 1 GPUBindGroup can be created. The GPUBindGroup's layout would be created with `hasDynamicOffset: true`, and the offset into the larger GPUBuffer would then be passed to `setBindGroup` to draw object N of 4000.
This also reduces the number of `GPUQueue.writeBuffer` calls from 4000 to 1 per frame, a significant performance saving.
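A sketch of the single-buffer layout, assuming each object needs one 4x4 `f32` matrix (64 bytes) of uniform data. `packUniforms`, `alignTo`, `FLOATS_PER_OBJECT` and `BYTES_PER_OBJECT` are hypothetical names for illustration. Each object's slot is padded to 256 bytes because dynamic uniform offsets must be multiples of the device's `minUniformBufferOffsetAlignment` limit, whose default is 256.

```typescript
// Assumption: each object needs one 4x4 float matrix (64 bytes) of uniforms.
const FLOATS_PER_OBJECT = 16;
const BYTES_PER_OBJECT = FLOATS_PER_OBJECT * Float32Array.BYTES_PER_ELEMENT;

// Round value up to the next multiple of alignment.
function alignTo(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

interface PackedUniforms {
  data: Float32Array; // contents for ONE queue.writeBuffer call per frame
  offsets: number[];  // byte offset to pass as the dynamic offset for object k
  stride: number;     // bytes between consecutive object slots
}

// Pack all per-object matrices into one array, one aligned slot per object.
function packUniforms(matrices: readonly Float32Array[], alignment = 256): PackedUniforms {
  const stride = alignTo(BYTES_PER_OBJECT, alignment);
  const data = new Float32Array((stride * matrices.length) / Float32Array.BYTES_PER_ELEMENT);
  const offsets: number[] = [];
  matrices.forEach((m, k) => {
    const byteOffset = k * stride;
    // Copy this object's matrix to the start of its slot; the rest stays zero padding.
    data.set(m.subarray(0, FLOATS_PER_OBJECT), byteOffset / Float32Array.BYTES_PER_ELEMENT);
    offsets.push(byteOffset);
  });
  return { data, offsets, stride };
}
```

On the WebGPU side (not exercised here), the layout entry would use `buffer: { type: "uniform", hasDynamicOffset: true }`, the bind group entry would bind `{ buffer: uniformBuffer, size: 64 }` (one object's slice, not the whole buffer), the packed `data` would be uploaded with a single `device.queue.writeBuffer(uniformBuffer, 0, data)` per frame, and object k would be drawn after `setBindGroup(0, bindGroup, [offsets[k]])`. If the device reports a different alignment, `device.limits.minUniformBufferOffsetAlignment` can be passed instead of the default.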
Alternatives
A browser implementation could internally optimize these calls, but this requires some level of caching which may or may not be necessary depending on workload and is difficult to predict.
The web engine or website has more insight into how a GPUBuffer will be used, so it is better placed to perform the coalescing.
Additional context
See https://webgpu.github.io/webgpu-samples/?sample=animometer for an example of optimal usage of GPURenderBundles together with GPUBindGroup and GPUBindGroupLayout instances.