Skip to content

Meshlet BVH Culling #19318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 25 commits into
base: main
Choose a base branch
from
Open

Meshlet BVH Culling #19318

wants to merge 25 commits into from

Conversation

atlv24
Copy link
Contributor

@atlv24 atlv24 commented May 21, 2025

Objective

  • Merge @SparkyPotato 's efforts to implement BVH-accelerated meshlet culling.

Solution

  • Add hot reloading support
  • Fix near-plane overculling
  • Fix hzb sampling
  • Fix orthographic error metric

Testing

  • Meshlet example, Nsight, hot-reloading and careful thinking

@JMS55 JMS55 self-requested a review May 21, 2025 14:53
@atlv24 atlv24 added C-Feature A new feature, making something new possible A-Rendering Drawing game state to the screen S-Needs-Review Needs reviewer attention (from anyone!) to move forward D-Shaders This code uses GPU shader languages labels May 22, 2025
Comment on lines 22 to 29
// let world_sphere_radius = lod_sphere.w * world_scale;
// let norm_error = simplification_error / world_sphere_radius * 0.25;
// return norm_error * view.viewport.w < 1.0;
let world_error = simplification_error * world_scale;
let proj = projection[1][1];
let height = 2.0 / proj;
let norm_error = world_error / height;
return norm_error * view.viewport.w < 1.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure how correct my change (from the commented code) is, though it seems to work.

@JMS55 JMS55 added this to the 0.17 milestone May 27, 2025
Copy link
Contributor

@JMS55 JMS55 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok did my first pass of this.

A lot of it I just skimmed because there's way too much code for me to verify line by line, so I'm just going to trust a lot of the BVH traversal stuff and resource handling is correct.

Here's what I still need to verify:

  1. from_mesh() code
  2. Occlusion culling changes
  3. LOD selection changes
  4. Try it out and test perf (my current computer doesn't support 64bit atomics unfortunately)

write_slice(&asset.meshlet_simplification_errors, &mut writer)?;
write_slice(&asset.meshlet_cull_data, &mut writer)?;
writer.write_all(bytemuck::bytes_of(&asset.aabb))?;
writer.write_all(bytemuck::bytes_of(&asset.bvh_depth))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the instance abbb + bvh_depth to before the FrameEncoder?

"remap_1d_to_2d_dispatch.wgsl",
Shader::from_wgsl
);
embedded_asset!(app, "clear_visibility_buffer.wgsl");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: move clear_visibility_buffer to above cull_instances.

let base_bvh_node_index = (buffer_offset / size as u64) as u32;
for (i, &node) in self.iter().enumerate() {
let bytes = bytemuck::cast::<_, [u8; size_of::<BvhNode>()]>(BvhNode {
aabbs: core::array::from_fn(|i| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pull some of this code out into a new function? It's hard to understand imo.

@@ -233,6 +233,7 @@ pub fn extract_meshlet_mesh_entities(
}

// Iterate over every instance
// TODO: Switch to change events to not upload every instance every frame.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah better instance handling on the CPU is a big todo. I know @tychedelia is also gonna look into caching material specialization for meshlet instances as well.

#import bevy_render::maths::affine3_to_square

// https://github.com/zeux/meshoptimizer/blob/1e48e96c7e8059321de492865165e9ef071bffba/demo/nanite.cpp#L115
fn lod_error_is_imperceptible(lod_sphere: vec4<f32>, simplification_error: f32, instance_id: u32) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: I need to review this carefully

}
};
match &mut resource_manager.cluster_meshlet_ids {
instance_manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test that this is equally as fast as upload_storage_buffer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, upload_storage_buffer did the same thing internally so I just got rid of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might've added that after writing upload_storage_buffer 😅

@@ -48,63 +74,128 @@ struct DrawIndirectArgs {
first_instance: u32,
}

struct InstancedOffset {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments here describing what InstancedOffset is, and show some example code of how you map it to cluster/meshlet/triangle/etc?

Comment on lines +161 to +163
if any((max_texel >> vec2(mip)) > (min_texel >> vec2(mip)) + 3) {
mip += 1u;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the code we can remove when we fix non-pot HiZ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would be the + 1u (and the clamping) above.

}

@compute
@workgroup_size(128, 1, 1) // 8 threads per node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 node is handled by 8 threads, so 16 nodes per WG?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup.

@@ -19,11 +23,13 @@ use meshopt::{
use metis::{option::Opt, Graph};
use smallvec::SmallVec;
use thiserror::Error;
use tracing::debug_span;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: I need to review this whole file, haven't looked at it yet

@JMS55
Copy link
Contributor

JMS55 commented Jun 11, 2025

Fix near-plane overculling
Fix hzb sampling
Fix orthographic error metric

What was wrong with each of these?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Feature A new feature, making something new possible D-Shaders This code uses GPU shader languages S-Needs-Review Needs reviewer attention (from anyone!) to move forward
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants