
Improve support for cluster simplification #704

Merged: zeux merged 13 commits into master from simplify on Jun 17, 2024
Conversation

zeux (Owner) commented Jun 14, 2024

When simplifying a small subset of the larger mesh, all computations
that go over the entire vertex buffer become expensive; this adds up
even when done once, and especially when done for every pass. This
is a critical part of some workflows that combine clusterization and
simplification, notably Nanite-style virtual geometry renderers.

This change introduces a sparse simplification mode that instructs the
simplifier to optimize based on the assumption that the subset of the
mesh that is being simplified is small. In that case it's worth spending
extra time to convert indices into a small 0..U subrange, do all
internal processing assuming we are working with a small vertex/index
buffer, and remap the indices at the end. While this processing could be
done externally, that is less efficient as it requires constant copying
of position/attribute data; in contrast, we can do it fairly cheaply.
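
For illustration, here is roughly what the external equivalent would look like (a minimal sketch; cluster_indices and positions are hypothetical application buffers - the sparse mode does the equivalent work internally without the position copy):

#include <cstring>
#include <unordered_map>
#include <vector>

// build a local 0..U index space for the mesh subset
std::vector<unsigned int> remap;                         // remap[local] = original index
std::unordered_map<unsigned int, unsigned int> to_local; // original -> local (0..U)

std::vector<unsigned int> local_indices(cluster_indices.size());
for (size_t i = 0; i < cluster_indices.size(); ++i)
{
    unsigned int v = cluster_indices[i];
    auto it = to_local.find(v);
    if (it == to_local.end())
    {
        it = to_local.emplace(v, unsigned(remap.size())).first;
        remap.push_back(v);
    }
    local_indices[i] = it->second;
}

// gather subset positions; this copy is the cost the sparse mode avoids
std::vector<float> local_positions(remap.size() * 3);
for (size_t i = 0; i < remap.size(); ++i)
    memcpy(&local_positions[i * 3], &positions[remap[i] * 3], 3 * sizeof(float));

// ... simplify local_indices against local_positions, then map the
// surviving indices back through remap[] ...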

When using sparse simplification, the error is treated as relative to
the mesh subset. This is a performance requirement as computing the full
mesh extents is too expensive when the subset is small relative to the
mesh, but it means that it can be difficult to rely on exact error
metrics.

There are also cases, even when not using sparse simplification, where
an absolute error is more convenient. This can be achieved today via
meshopt_simplifyScale, but that is an extra step that is not always
necessary.

The new features can be accessed by adding meshopt_SimplifySparse and
meshopt_SimplifyErrorAbsolute bit flags to simplification options.
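
A minimal usage sketch (buffer names are hypothetical; with meshopt_SimplifyErrorAbsolute, target_error and the reported error are in mesh coordinate units):

float result_error = 0.f;
size_t new_index_count = meshopt_simplify(
    out_indices, cluster_indices, cluster_index_count,
    positions, vertex_count, sizeof(float) * 3,
    /* target_index_count= */ cluster_index_count / 2,
    /* target_error= */ 0.01f, // mesh units due to meshopt_SimplifyErrorAbsolute
    meshopt_SimplifyLockBorder | meshopt_SimplifySparse | meshopt_SimplifyErrorAbsolute,
    &result_error);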

As an example of the performance delta: the newly added simplifyClusters
demo takes 17.7 seconds to simplify an 870K triangle mesh one cluster
at a time; with the new sparse mode it takes ~150 msec (~100x faster).

zeux added 11 commits June 13, 2024 21:26
This is an important building block for Nanite, as it requires a
repeated application of buildMeshlets => simplify (with additional
algorithms that meshoptimizer currently doesn't support, like grouping
meshlets based on proximity). This can be used to exercise future
improvements to simplification, like sparsity, as well as serve as a
basic example.

For simplicity we use the LockBorder flag instead of locking boundary
vertices manually via meshopt_simplifyWithAttributes; from a production
perspective both options are viable.
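
For reference, the manual alternative would look roughly like this (a sketch; boundary detection and buffer names are left to the application):

// vertex_lock[i] != 0 pins vertex i in place; mark vertices shared
// between meshlet groups so group boundaries stay consistent across LODs
std::vector<unsigned char> vertex_lock(vertex_count, 0);
// ... set vertex_lock[v] = 1 for shared boundary vertices ...

float result_error = 0.f;
size_t new_index_count = meshopt_simplifyWithAttributes(
    out_indices, group_indices, group_index_count,
    positions, vertex_count, sizeof(float) * 3,
    /* vertex_attributes= */ NULL, 0, /* attribute_weights= */ NULL, 0,
    vertex_lock.data(),
    group_index_count / 2, target_error, /* options= */ 0, &result_error);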
When simplifying a small subset of the larger mesh, all computations
that go over the entire vertex buffer become expensive; this adds up
even when done once, and especially when done for every pass.

This change introduces a sparse simplification mode that instructs the
simplifier to optimize based on the assumption that the subset of the
mesh that is being simplified is small. In that case it's worth spending
extra time to convert indices into a small 0..U subrange, do all
internal processing assuming we are working with a small vertex/index
buffer, and remap the indices at the end. While this processing could be
done externally, that is less efficient as it requires constant copying
of position/attribute data; in contrast, we can do it fairly cheaply.

We need to take sparse_remap into account in any code that indexes
input position/attribute data; buildPositionRemap needs this as well,
as it currently works off of the original (unscaled) data.

buildSparseRemap does need to perform O(dense) work if we want to avoid
using hash maps for deduplication/filtering; this change uses a small
zeroed array and a large sparsely-initialized array to reduce the cost.
This is subject to future improvements (although it is fairly performant
as it is when simplifying 500-triangle subsets of an 800K triangle mesh).
This reduces the high watermark and may allow reusing the deallocated
space for the actual simplifier state. Ideally we would reduce the space
consumed here further, but that requires a hash map so may not be ideal.
Two parts of buildSparseRemap still have complexity dependent on the
total number of vertices: filter clearing and revremap allocation. We
could replace revremap with a hash map if the large allocation proves
to be problematic, but it also might work fine in practice - whereas
filter[] clearing is a cost we must pay, and for small subsets of large
meshes it can be 20% of the entire simplification.

To fix this we now use a bit set, which is 8x cheaper to clear (the
actual addressing gets more expensive, but that is a fraction of the
cost due to the sparsity assumption).
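
The addressing change looks roughly like this (a sketch; the in-tree code may differ in details):

// byte-per-vertex filter: clearing touches vertex_count bytes;
// bit-per-vertex filter: clearing touches vertex_count / 8 bytes
unsigned char* filter = new unsigned char[(vertex_count + 7) / 8];
memset(filter, 0, (vertex_count + 7) / 8);

// each membership test now pays a shift and a mask, but only O(subset)
// accesses happen, versus O(vertex_count) bytes cleared up front
bool seen = (filter[index / 8] & (1 << (index % 8))) != 0;
filter[index / 8] |= 1 << (index % 8);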
classifyVertices references the input vertex_lock[], which is indexed
using dense vertex indices, so it needs to remap the access index on
the fly as well.
When using sparse simplification, the error is treated as relative to
the mesh subset. This is a performance requirement as computing the full
mesh extents is too expensive when the subset is small relative to the
mesh, but it means that it can be difficult to rely on exact error
metrics.

There are also cases, even when not using sparse simplification, where
an absolute error is more convenient. This can be achieved today via
meshopt_simplifyScale, but that is an extra step that is not always
necessary.

With this change, when meshopt_SimplifyErrorAbsolute flag is used, we
treat the error limit and the output error as an absolute distance (in
mesh coordinates), and convert it to/from relative using the internal
scale factor.
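
In other words (a sketch using the existing API; names are illustrative):

// scale factor relating relative errors to mesh coordinate units
float scale = meshopt_simplifyScale(positions, vertex_count, sizeof(float) * 3);

// without the flag: result_error is relative; convert manually if needed
float error_abs = result_error * scale;

// with meshopt_SimplifyErrorAbsolute: pass mesh units directly, e.g.
// allow up to 1% of the mesh extent as deviation
float target_error = 0.01f * scale;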
Previously we couldn't really guarantee a sensible error bound or
display the resulting deviation, but meshopt_SimplifyErrorAbsolute
together with meshopt_simplifyScale makes it easy, so we incorporate
that into simplifyClusters.
We collapse a center vertex, which introduces an error, and check that
the error is close to what we would expect. Note that the distance the
vertex travels here is 1.0f, not 0.85f, but errors are evaluated as
distances to triangles, which makes the measured error smaller.
Note that this change doesn't rebuild the Wasm bundle; the options
should just work once that is done.
The test needs to check that positions, attributes and lock flags are
all addressed using proper indexing. We do assume that collapses go in
a specific direction (the input data technically permits two collapse
directions for each test); if this becomes a problem due to
floating-point instabilities, we can tweak the input data.
zeux changed the title from "WIP: Nanite improvements" to "Improve support for cluster simplification" on Jun 14, 2024
zeux (Owner, Author) commented Jun 14, 2024

I've tested this in Bevy using bevyengine/bevy#13431 and the following diff (I probably could have left the 0.5 factor in, but I am not sure it's correct to apply it!). After that change, simplification is barely visible in the profile. The overall data preparation process is still not very fast because Bevy's meshlet connectivity analysis (find_connected_meshlets) is slow, but I'm sure that can be made faster separately.

patch
diff --git a/crates/bevy_pbr/src/meshlet/from_mesh.rs b/crates/bevy_pbr/src/meshlet/from_mesh.rs
index a5ff00fad..9d95978ee 100644
--- a/crates/bevy_pbr/src/meshlet/from_mesh.rs
+++ b/crates/bevy_pbr/src/meshlet/from_mesh.rs
@@ -58,6 +58,8 @@ impl MeshletMesh {
             .map(|m| m.triangle_count as u64)
             .sum();
 
+        let scale = simplify_scale(&vertices);
+
         // Build further LODs
         let mut simplification_queue = 0..meshlets.len();
         let mut lod_level = 1;
@@ -82,7 +84,7 @@ impl MeshletMesh {
             for group_meshlets in groups.values().filter(|group| group.len() > 1) {
                 // Simplify the group to ~50% triangle count
                 let Some((simplified_group_indices, mut group_error)) =
-                    simplify_meshlet_groups(group_meshlets, &meshlets, &vertices, lod_level)
+                    simplify_meshlet_groups(group_meshlets, &meshlets, &vertices, lod_level, scale)
                 else {
                     continue;
                 };
@@ -287,6 +289,7 @@ fn simplify_meshlet_groups(
     meshlets: &Meshlets,
     vertices: &VertexDataAdapter<'_>,
     lod_level: u32,
+    scale: f32,
 ) -> Option<(Vec<u32>, f32)> {
     // Build a new index buffer into the mesh vertex data by combining all meshlet data in the group
     let mut group_indices = Vec::new();
@@ -299,7 +302,8 @@ fn simplify_meshlet_groups(
 
     // Allow more deformation for high LOD levels (1% at LOD 1, 10% at LOD 20+)
     let t = (lod_level - 1) as f32 / 19.0;
-    let target_error = 0.1 * t + 0.01 * (1.0 - t);
+    let target_error_rel = 0.1 * t + 0.01 * (1.0 - t);
+    let target_error = target_error_rel * scale;
 
     // Simplify the group to ~50% triangle count
     // TODO: Use simplify_with_locks()
@@ -309,7 +313,7 @@ fn simplify_meshlet_groups(
         vertices,
         group_indices.len() / 2,
         target_error,
-        SimplifyOptions::LockBorder,
+        SimplifyOptions::LockBorder | SimplifyOptions::Sparse | SimplifyOptions::ErrorAbsolute,
         Some(&mut error),
     );
 
@@ -318,9 +322,6 @@ fn simplify_meshlet_groups(
         return None;
     }
 
-    // Convert error to object-space and convert from diameter to radius
-    error *= simplify_scale(vertices) * 0.5;
-
     Some((simplified_group_indices, error))
 }

For the above to work, meshopt-rs needs to get two extra enum entries and that's it.

zeux (Owner, Author) commented Jun 14, 2024

Going to mark this as ready to merge, although I want to look into using a hash map for the second part of buildSparseRemap, as on Windows the large allocation is not as fast as I'd like (buildSparseRemap accounts for ~30% of cluster simplification there, compared to ~2% on Linux).

zeux marked this pull request as ready for review on June 14, 2024 21:52
zeux added 2 commits June 14, 2024 20:17
This helps to test sparse simplification on a large variety of meshes.

Drive-by: fix simplifyPoints under address sanitizer; when the number
of points was below the threshold, &indices[0] would be out of bounds.
Instead of using a large uninitialized allocation, we now use a hash
table. Under the assumption that a mesh subset is much smaller than the
mesh, we were relying on the efficiency of repeatedly reallocating a
large uninitialized segment of memory, which worked well on some
systems (Linux) but less so on others (Windows). Instead, we now
minimize the allocation size.

This ends up a little slower when the sparsity assumption does not hold,
as we no longer use direct indexing. To minimize the impact, we use a
hash function that reduces the amount of avalanche so that sequential
indices have fewer hash conflicts; it might also make sense to use an
identity function here, or a small multiplicative constant.

To minimize the impact further, we don't store key/value pairs in the
map, just the new index; the hasher is constructed to compare entries
through the remap table, so we can look up by the original (dense)
index even though the map only stores the new (sparse) indices.
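
A minimal sketch of the idea (hypothetical helper, not the in-tree implementation; the actual hash and probing details differ):

// table[] holds new (subset) indices, with ~0u marking empty buckets;
// the key of an entry e is remap[e], the original index it was assigned
// for, so the map never stores key/value pairs
unsigned int hashIndex(unsigned int x)
{
    // weak multiplicative mix: an odd multiplier is a bijection modulo
    // the power-of-two table size, so consecutive original indices land
    // in distinct buckets, whereas a full-avalanche hash would scatter
    // them and collide more often
    return x * 0x5bd1e995;
}

unsigned int lookupOrInsert(unsigned int* table, size_t table_size, // power of two, pre-filled with ~0u
                            unsigned int* remap, unsigned int& unique,
                            unsigned int index) // original (dense) index
{
    for (size_t bucket = hashIndex(index) & (table_size - 1);;
         bucket = (bucket + 1) & (table_size - 1))
    {
        unsigned int entry = table[bucket];
        if (entry == ~0u)
        {
            // first occurrence of this original index: assign a new index
            remap[unique] = index;
            table[bucket] = unique;
            return unique++;
        }
        if (remap[entry] == index)
            return entry; // the entry itself is the new (remapped) index
    }
}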
zeux merged commit 8c1782a into master on Jun 17, 2024
12 checks passed
zeux deleted the simplify branch on June 17, 2024 17:19