-
-
Notifications
You must be signed in to change notification settings - Fork 4k
Add a bindless mode to AsBindGroup
.
#16368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This adds the infrastructure necessary for Bevy to support *bindless resources*. Bindless resources significantly improve rendering performance by reducing `wgpu` and driver overhead considerably.
The generated |
This commit adds support for *multidraw*, which is a feature that allows multiple meshes to be drawn in a single drawcall. `wgpu` currently implements multidraw on Vulkan, so this feature is only enabled there. Multiple meshes can be drawn at once if they're in the same vertex and index buffers and are otherwise placed in the same bin. (Thus, for example, at present the materials and textures must be identical, but see bevyengine#16368.) Multidraw is a significant performance improvement during the draw phase because it reduces the number of rebindings, as well as the amount of driver overhead. This feature is currently only enabled when GPU culling is used: i.e. when `GpuCulling` is present on a camera. Therefore, if you run for example `scene_viewer`, you will not see any performance improvements, because `scene_viewer` doesn't add the `GpuCulling` component to its camera. Additionally, the multidraw feature is only implemented for opaque 3D meshes and not for shadows or 2D meshes. I plan to make GPU culling the default and to extend the feature to shadows in the future. Also, in the future I suspect that polyfilling multidraw on APIs that don't support it will be fruitful, as even without driver-level support use of multidraw allows us to avoid expensive `wgpu` rebindings.
This commit adds support for *multidraw*, which is a feature that allows multiple meshes to be drawn in a single drawcall. `wgpu` currently implements multidraw on Vulkan, so this feature is only enabled there. Multiple meshes can be drawn at once if they're in the same vertex and index buffers and are otherwise placed in the same bin. (Thus, for example, at present the materials and textures must be identical, but see bevyengine#16368.) Multidraw is a significant performance improvement during the draw phase because it reduces the number of rebindings, as well as the amount of driver overhead. This feature is currently only enabled when GPU culling is used: i.e. when `GpuCulling` is present on a camera. Therefore, if you run for example `scene_viewer`, you will not see any performance improvements, because `scene_viewer` doesn't add the `GpuCulling` component to its camera. Additionally, the multidraw feature is only implemented for opaque 3D meshes and not for shadows or 2D meshes. I plan to make GPU culling the default and to extend the feature to shadows in the future. Also, in the future I suspect that polyfilling multidraw on APIs that don't support it will be fruitful, as even without driver-level support use of multidraw allows us to avoid expensive `wgpu` rebindings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the example on Linux and WebGL2 in Firefox. Apart for the need to make yet another allocator this was much less invasive than I expected, and all the code makes sense to me.
Comments addressed. |
I think the CI failure is unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Very high quality PR as expected. It's a pretty big change, but it's nice that it doesn't affect that many user facing apis.
Once this is merged I'll make a PR to take advantage of it for the default Mesh2d material.
`custom_shader_instancing` examples after the bindless change. The bindless PR (bevyengine#16368) broke some examples: * `specialized_mesh_pipeline` and `custom_shader_instancing` failed because they expect to be able to render a mesh with no material, by overriding enough of the render pipeline to be able to do so. This PR fixes the issue by restoring the old behavior in which we extract meshes even if they have no material. * `texture_binding_array` broke because it doesn't implement `AsBindGroup::unprepared_bind_group`. This was tricky to fix because there's a very good reason why `texture_binding_array` doesn't implement that method: there's no sensible way to do so with `wgpu`'s current bindless API, due to its multiple levels of borrowed references. To fix the example, I split `MaterialBindGroup` into `MaterialBindlessBindGroup` and `MaterialNonBindlessBindGroup`, and allow direct custom implementations of `AsBindGroup::as_bind_group` for the latter type of bind groups. To opt in to the new behavior, return the `AsBindGroupError::CreateBindGroupDirectly` error from your `AsBindGroup::unprepared_bind_group` implementation, and Bevy will call your custom `AsBindGroup::as_bind_group` method as before.
This commit makes `StandardMaterial` use bindless textures, as implemented in PR bevyengine#16368. Non-bindless mode, as used for example in Metal and WebGL 2, remains fully supported via a plethora of `#ifdef BINDLESS` preprocessor definitions. Unfortunately, this PR introduces quite a bit of unsightliness into the PBR shaders. This is a result of the fact that WGSL supports neither passing binding arrays to functions nor passing individual *elements* of binding arrays to functions, except directly to texture sample functions. Thus we're unable to use the `sample_texture` abstraction that helped abstract over the meshlet and non-meshlet paths. I don't think there's anything we can do to help this other than to suggest improvements to upstream Naga.
…stom_shader_instancing` examples after the bindless change. (#16641) The bindless PR (#16368) broke some examples: * `specialized_mesh_pipeline` and `custom_shader_instancing` failed because they expect to be able to render a mesh with no material, by overriding enough of the render pipeline to be able to do so. This PR fixes the issue by restoring the old behavior in which we extract meshes even if they have no material. * `texture_binding_array` broke because it doesn't implement `AsBindGroup::unprepared_bind_group`. This was tricky to fix because there's a very good reason why `texture_binding_array` doesn't implement that method: there's no sensible way to do so with `wgpu`'s current bindless API, due to its multiple levels of borrowed references. To fix the example, I split `MaterialBindGroup` into `MaterialBindlessBindGroup` and `MaterialNonBindlessBindGroup`, and allow direct custom implementations of `AsBindGroup::as_bind_group` for the latter type of bind groups. To opt in to the new behavior, return the `AsBindGroupError::CreateBindGroupDirectly` error from your `AsBindGroup::unprepared_bind_group` implementation, and Bevy will call your custom `AsBindGroup::as_bind_group` method as before. ## Migration Guide * Bevy will now unconditionally call `AsBindGroup::unprepared_bind_group` for your materials, so you must no longer panic in that function. Instead, return the new `AsBindGroupError::CreateBindGroupDirectly` error, and Bevy will fall back to calling `AsBindGroup::as_bind_group` as before.
This commit adds support for *multidraw*, which is a feature that allows multiple meshes to be drawn in a single drawcall. `wgpu` currently implements multidraw on Vulkan, so this feature is only enabled there. Multiple meshes can be drawn at once if they're in the same vertex and index buffers and are otherwise placed in the same bin. (Thus, for example, at present the materials and textures must be identical, but see #16368.) Multidraw is a significant performance improvement during the draw phase because it reduces the number of rebindings, as well as the number of drawcalls. This feature is currently only enabled when GPU culling is used: i.e. when `GpuCulling` is present on a camera. Therefore, if you run for example `scene_viewer`, you will not see any performance improvements, because `scene_viewer` doesn't add the `GpuCulling` component to its camera. Additionally, the multidraw feature is only implemented for opaque 3D meshes and not for shadows or 2D meshes. I plan to make GPU culling the default and to extend the feature to shadows in the future. Also, in the future I suspect that polyfilling multidraw on APIs that don't support it will be fruitful, as even without driver-level support use of multidraw allows us to avoid expensive `wgpu` rebindings.
This commit makes `StandardMaterial` use bindless textures, as implemented in PR #16368. Non-bindless mode, as used for example in Metal and WebGL 2, remains fully supported via a plethora of `#ifdef BINDLESS` preprocessor definitions. Unfortunately, this PR introduces quite a bit of unsightliness into the PBR shaders. This is a result of the fact that WGSL supports neither passing binding arrays to functions nor passing individual *elements* of binding arrays to functions, except directly to texture sample functions. Thus we're unable to use the `sample_texture` abstraction that helped abstract over the meshlet and non-meshlet paths. I don't think there's anything we can do to help this other than to suggest improvements to upstream Naga.
This commit makes `StandardMaterial` use bindless textures, as implemented in PR bevyengine#16368. Non-bindless mode, as used for example in Metal and WebGL 2, remains fully supported via a plethora of `#ifdef BINDLESS` preprocessor definitions. Unfortunately, this PR introduces quite a bit of unsightliness into the PBR shaders. This is a result of the fact that WGSL supports neither passing binding arrays to functions nor passing individual *elements* of binding arrays to functions, except directly to texture sample functions. Thus we're unable to use the `sample_texture` abstraction that helped abstract over the meshlet and non-meshlet paths. I don't think there's anything we can do to help this other than to suggest improvements to upstream Naga.
This patch adds the infrastructure necessary for Bevy to support *bindless resources*, by adding a new `#[bindless]` attribute to `AsBindGroup`. Classically, only a single texture (or sampler, or buffer) can be attached to each shader binding. This means that switching materials requires breaking a batch and issuing a new drawcall, even if the mesh is otherwise identical. This adds significant overhead not only in the driver but also in `wgpu`, as switching bind groups increases the amount of validation work that `wgpu` must do. *Bindless resources* are the typical solution to this problem. Instead of switching bindings between each texture, the renderer instead supplies a large *array* of all textures in the scene up front, and the material contains an index into that array. This pattern is repeated for buffers and samplers as well. The renderer now no longer needs to switch binding descriptor sets while drawing the scene. Unfortunately, as things currently stand, this approach won't quite work for Bevy. Two aspects of `wgpu` conspire to make this ideal approach unacceptably slow: 1. In the DX12 backend, all binding arrays (bindless resources) must have a constant size declared in the shader, and all textures in an array must be bound to actual textures. Changing the size requires a recompile. 2. Changing even one texture incurs revalidation of all textures, a process that takes time that's linear in the total size of the binding array. This means that declaring a large array of textures big enough to encompass the entire scene is presently unacceptably slow. For example, if you declare 4096 textures, then `wgpu` will have to revalidate all 4096 textures if even a single one changes. This process can take multiple frames. To work around this problem, this PR groups bindless resources into small *slabs* and maintains a free list for each. The size of each slab for the bindless arrays associated with a material is specified via the `#[bindless(N)]` attribute. For instance, consider the following declaration: ```rust #[derive(AsBindGroup)] #[bindless(16)] struct MyMaterial { #[buffer(0)] color: Vec4, #[texture(1)] #[sampler(2)] diffuse: Handle<Image>, } ``` The `#[bindless(N)]` attribute specifies that, if bindless arrays are supported on the current platform, each resource becomes a binding array of N instances of that resource. So, for `MyMaterial` above, the `color` attribute is exposed to the shader as `binding_array<vec4<f32>, 16>`, the `diffuse` texture is exposed to the shader as `binding_array<texture_2d<f32>, 16>`, and the `diffuse` sampler is exposed to the shader as `binding_array<sampler, 16>`. Inside the material's vertex and fragment shaders, the applicable index is available via the `material_bind_group_slot` field of the `Mesh` structure. So, for instance, you can access the current color like so: ```wgsl // `uniform` binding arrays are a non-sequitur, so `uniform` is automatically promoted // to `storage` in bindless mode. @group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>; ... @Fragment fn fragment(in: VertexOutput) -> @location(0) vec4<f32> { let color = material_color[mesh[in.instance_index].material_bind_group_slot]; ... } ``` Note that portable shader code can't guarantee that the current platform supports bindless textures. Indeed, bindless mode is only available in Vulkan and DX12. The `BINDLESS` shader definition is available for your use to determine whether you're on a bindless platform or not. Thus a portable version of the shader above would look like: ```wgsl #ifdef BINDLESS @group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>; #else // BINDLESS @group(2) @binding(0) var<uniform> material_color: Color; #endif // BINDLESS ... @Fragment fn fragment(in: VertexOutput) -> @location(0) vec4<f32> { #ifdef BINDLESS let color = material_color[mesh[in.instance_index].material_bind_group_slot]; #else // BINDLESS let color = material_color; #endif // BINDLESS ... } ``` Importantly, this PR *doesn't* update `StandardMaterial` to be bindless. So, for example, `scene_viewer` will currently not run any faster. I intend to update `StandardMaterial` to use bindless mode in a follow-up patch. A new example, `shaders/shader_material_bindless`, has been added to demonstrate how to use this new feature. Here's a Tracy profile of `submit_graph_commands` of this patch and an additional patch (not submitted yet) that makes `StandardMaterial` use bindless. Red is those patches; yellow is `main`. The scene was Bistro Exterior with a hack that forces all textures to opaque. You can see a 1.47x mean speedup.  ## Migration Guide * `RenderAssets::prepare_asset` now takes an `AssetId` parameter. * Bin keys now have Bevy-specific material bind group indices instead of `wgpu` material bind group IDs, as part of the bindless change. Use the new `MaterialBindGroupAllocator` to map from bind group index to bind group ID.
…stom_shader_instancing` examples after the bindless change. (bevyengine#16641) The bindless PR (bevyengine#16368) broke some examples: * `specialized_mesh_pipeline` and `custom_shader_instancing` failed because they expect to be able to render a mesh with no material, by overriding enough of the render pipeline to be able to do so. This PR fixes the issue by restoring the old behavior in which we extract meshes even if they have no material. * `texture_binding_array` broke because it doesn't implement `AsBindGroup::unprepared_bind_group`. This was tricky to fix because there's a very good reason why `texture_binding_array` doesn't implement that method: there's no sensible way to do so with `wgpu`'s current bindless API, due to its multiple levels of borrowed references. To fix the example, I split `MaterialBindGroup` into `MaterialBindlessBindGroup` and `MaterialNonBindlessBindGroup`, and allow direct custom implementations of `AsBindGroup::as_bind_group` for the latter type of bind groups. To opt in to the new behavior, return the `AsBindGroupError::CreateBindGroupDirectly` error from your `AsBindGroup::unprepared_bind_group` implementation, and Bevy will call your custom `AsBindGroup::as_bind_group` method as before. ## Migration Guide * Bevy will now unconditionally call `AsBindGroup::unprepared_bind_group` for your materials, so you must no longer panic in that function. Instead, return the new `AsBindGroupError::CreateBindGroupDirectly` error, and Bevy will fall back to calling `AsBindGroup::as_bind_group` as before.
…ne#16427) This commit adds support for *multidraw*, which is a feature that allows multiple meshes to be drawn in a single drawcall. `wgpu` currently implements multidraw on Vulkan, so this feature is only enabled there. Multiple meshes can be drawn at once if they're in the same vertex and index buffers and are otherwise placed in the same bin. (Thus, for example, at present the materials and textures must be identical, but see bevyengine#16368.) Multidraw is a significant performance improvement during the draw phase because it reduces the number of rebindings, as well as the number of drawcalls. This feature is currently only enabled when GPU culling is used: i.e. when `GpuCulling` is present on a camera. Therefore, if you run for example `scene_viewer`, you will not see any performance improvements, because `scene_viewer` doesn't add the `GpuCulling` component to its camera. Additionally, the multidraw feature is only implemented for opaque 3D meshes and not for shadows or 2D meshes. I plan to make GPU culling the default and to extend the feature to shadows in the future. Also, in the future I suspect that polyfilling multidraw on APIs that don't support it will be fruitful, as even without driver-level support use of multidraw allows us to avoid expensive `wgpu` rebindings.
This commit makes `StandardMaterial` use bindless textures, as implemented in PR bevyengine#16368. Non-bindless mode, as used for example in Metal and WebGL 2, remains fully supported via a plethora of `#ifdef BINDLESS` preprocessor definitions. Unfortunately, this PR introduces quite a bit of unsightliness into the PBR shaders. This is a result of the fact that WGSL supports neither passing binding arrays to functions nor passing individual *elements* of binding arrays to functions, except directly to texture sample functions. Thus we're unable to use the `sample_texture` abstraction that helped abstract over the meshlet and non-meshlet paths. I don't think there's anything we can do to help this other than to suggest improvements to upstream Naga.
Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1960 if you'd like to help out. |
This patch adds the infrastructure necessary for Bevy to support bindless resources, by adding a new
#[bindless]
attribute toAsBindGroup
.Classically, only a single texture (or sampler, or buffer) can be attached to each shader binding. This means that switching materials requires breaking a batch and issuing a new drawcall, even if the mesh is otherwise identical. This adds significant overhead not only in the driver but also in
wgpu
, as switching bind groups increases the amount of validation work thatwgpu
must do.Bindless resources are the typical solution to this problem. Instead of switching bindings between each texture, the renderer instead supplies a large array of all textures in the scene up front, and the material contains an index into that array. This pattern is repeated for buffers and samplers as well. The renderer now no longer needs to switch binding descriptor sets while drawing the scene.
Unfortunately, as things currently stand, this approach won't quite work for Bevy. Two aspects of
wgpu
conspire to make this ideal approach unacceptably slow:In the DX12 backend, all binding arrays (bindless resources) must have a constant size declared in the shader, and all textures in an array must be bound to actual textures. Changing the size requires a recompile.
Changing even one texture incurs revalidation of all textures, a process that takes time that's linear in the total size of the binding array.
This means that declaring a large array of textures big enough to encompass the entire scene is presently unacceptably slow. For example, if you declare 4096 textures, then
wgpu
will have to revalidate all 4096 textures if even a single one changes. This process can take multiple frames.To work around this problem, this PR groups bindless resources into small slabs and maintains a free list for each. The size of each slab for the bindless arrays associated with a material is specified via the
#[bindless(N)]
attribute. For instance, consider the following declaration:The
#[bindless(N)]
attribute specifies that, if bindless arrays are supported on the current platform, each resource becomes a binding array of N instances of that resource. So, forMyMaterial
above, thecolor
attribute is exposed to the shader asbinding_array<vec4<f32>, 16>
, thediffuse
texture is exposed to the shader asbinding_array<texture_2d<f32>, 16>
, and thediffuse
sampler is exposed to the shader asbinding_array<sampler, 16>
. Inside the material's vertex and fragment shaders, the applicable index is available via thematerial_bind_group_slot
field of theMesh
structure. So, for instance, you can access the current color like so:Note that portable shader code can't guarantee that the current platform supports bindless textures. Indeed, bindless mode is only available in Vulkan and DX12. The
BINDLESS
shader definition is available for your use to determine whether you're on a bindless platform or not. Thus a portable version of the shader above would look like:Importantly, this PR doesn't update
StandardMaterial
to be bindless. So, for example,scene_viewer
will currently not run any faster. I intend to updateStandardMaterial
to use bindless mode in a follow-up patch.A new example,
shaders/shader_material_bindless
, has been added to demonstrate how to use this new feature.Here's a Tracy profile of

submit_graph_commands
of this patch and an additional patch (not submitted yet) that makesStandardMaterial
use bindless. Red is those patches; yellow ismain
. The scene was Bistro Exterior with a hack that forces all textures to opaque. You can see a 1.47x mean speedup.Migration Guide
RenderAssets::prepare_asset
now takes anAssetId
parameter.wgpu
material bind group IDs, as part of the bindless change. Use the newMaterialBindGroupAllocator
to map from bind group index to bind group ID.