WebGPURenderer: RenderBundle #28347

Merged: 39 commits, May 23, 2024

Collaborator

RenaudRohlinger commented May 12, 2024

Related issue: #26876 #26983

Description
WebGPU's RenderBundle offers performance benefits and introduces a new approach to batching the instructions for our scene, reducing the CPU time spent issuing repeated render commands.

This PR adds WebGPU support for RenderBundle and a new, faster renderer pipeline that reduces JS overhead. It is the first step towards three.js static scenes.

Using the WebGLBackend also works and will still benefit from a reduction of JS overhead by skipping most of the renderer pipeline code.

Example with 8000 meshes: 23.7 ms average in the default renderer versus 15.9 ms average in RenderBundle mode (a 32%+ reduction in frame time):
https://raw.githack.com/renaudrohlinger/three.js/utsubo/feat/render-bundles/examples/webgpu_renderbundle.html
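
For reference, the two averages quoted above can be turned into relative figures as follows. That the "32%+" refers to the reduction in frame time is an assumption; it is the reading that matches the numbers. The variable names are illustrative only.

```javascript
// Averages quoted in the description (ms per frame, 8000 meshes).
const defaultMs = 23.7;
const bundleMs = 15.9;

// Relative reduction in frame time.
const timeSaved = ( defaultMs - bundleMs ) / defaultMs;

// Equivalent increase in how many frames fit into the same time budget.
const throughputGain = defaultMs / bundleMs - 1;

console.log( `frame time reduced by ${ ( timeSaved * 100 ).toFixed( 1 ) }%` );
console.log( `throughput increased by ${ ( throughputGain * 100 ).toFixed( 1 ) }%` );
```

The same measurement therefore reads as roughly a third less time per frame, or roughly half again as many frames in the same budget.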

image

This contribution is funded by Utsubo and Plasticity


github-actions bot commented May 12, 2024

📦 Bundle size

Full ESM build, minified and gzipped.

Filesize dev | Filesize PR | Diff
678.9 kB (168.2 kB) | 678.9 kB (168.2 kB) | +21 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Filesize dev | Filesize PR | Diff
456.9 kB (110.3 kB) | 457 kB (110.3 kB) | +21 B

Collaborator

sunag commented May 12, 2024

I'm getting this error message in the example:

image

I was imagining something like renderer.renderBundle = true, and maybe hashing the frustum to make the update automatic.

Collaborator Author

RenaudRohlinger commented May 12, 2024

OK, I will look into this error today.

Regarding renderer.renderBundle = true, what do you think about renderer.recordBundles(), which returns an array of render bundles, and renderer.renderBundles(bundle)?

const bundle = renderer.recordBundles(scene, camera)

// your anim loop
function animate() {
  // your static scene gets drawn, for example in a video game it would be all the elements in the background
  renderer.renderBundles(bundle)
  // render your complex stuff that needs to be updated every frame (or you can have a postprocess pipeline here)
  renderer.render(sceneComplex, camera)
}

I understand that it might seem optimal to be able to record and always execute in one batch all the renderer commands as it sounds like a super optimization, but in reality, it will just end up becoming so restrictive that there is nothing you can do with it. Cherry-picking a specific part of your pipeline to freeze it as you like sounds a lot more useful, in my opinion.
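
The record-once, replay-every-frame split described above can be sketched without any GPU at all. Everything in this block is a hypothetical stand-in (`recordBundle`, `executeBundle`, the object shapes); none of it is three.js or WebGPU API. It only shows that the scene walk happens once while the replay happens per frame, next to content that is still rendered normally.

```javascript
// Hypothetical stand-ins: none of these names are three.js or WebGPU API.
let traversals = 0;

// "Recording" walks the scene once and stores a flat list of draw commands.
function recordBundle( objects ) {
  traversals ++;
  return objects.filter( ( o ) => o.visible ).map( ( o ) => ( { draw: o.name } ) );
}

// "Executing" replays the stored commands without touching the scene again.
function executeBundle( bundle, output ) {
  for ( const command of bundle ) output.push( command.draw );
}

const background = [
  { name: 'terrain', visible: true },
  { name: 'sky', visible: true },
  { name: 'debugGrid', visible: false }
];

const bundle = recordBundle( background ); // done once, outside the loop

const frames = [];
for ( let i = 0; i < 3; i ++ ) {
  const output = [];
  executeBundle( bundle, output ); // frozen part, replayed as-is
  output.push( 'character' ); // dynamic part, rendered the usual way
  frames.push( output );
}
```

In this sketch the visibility test runs only during recording, which is also why a frozen part cannot react to later changes unless it is recorded again.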

Also, I still much prefer handling this per scene, as I initially proposed, since it shouldn't change the renderer API much, unlike such a structure. On top of that, @mrdoob seems to like the idea of a static scene:
#26876 (comment)

By the way the example should be fixed! 😄
https://raw.githack.com/renaudrohlinger/three.js/utsubo/feat/render-bundles/examples/webgpu_renderbundle.html

/cc @sunag

Collaborator

sunag commented May 13, 2024

I found your second idea interesting... maybe something like?

const renderBundle = new RenderBundle( scene, camera );
renderBundle.transparent = false;
// renderBundle.needsUpdate = true;

renderer.renderBundle( renderBundle );

I think RenderBundle should not be seen as limited to static scenes, since any uniform can be updated before executeBundles using writeBuffer if necessary. Its basic optimization is to reduce CPU load by avoiding JS calls.
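
A minimal sketch of that point, assuming a simplified model in which a recorded bundle keeps a reference to a buffer rather than a copy of its contents. The typed array stands in for a GPU buffer, and the local `writeBuffer` and `executeBundle` functions are stand-ins for `device.queue.writeBuffer()` and `executeBundles()`; they are not the real calls.

```javascript
// Hypothetical model: a "bundle" stores references to buffers, not their contents.
const positionBuffer = new Float32Array( [ 0, 0, 0 ] ); // stands in for a GPU uniform buffer

let recordCount = 0;

function recordBundle( buffer ) {
  recordCount ++;
  return { buffer }; // recorded once: the command only knows which buffer to read
}

// Stand-in for device.queue.writeBuffer(): overwrite the contents in place.
function writeBuffer( buffer, data ) {
  buffer.set( data );
}

// Stand-in for executeBundles(): reads whatever the buffer holds at replay time.
function executeBundle( bundle ) {
  return Array.from( bundle.buffer );
}

const bundle = recordBundle( positionBuffer );

const first = executeBundle( bundle );
writeBuffer( positionBuffer, [ 1, 2, 3 ] ); // animate without re-recording
const second = executeBundle( bundle );
```

The second replay sees the new values although nothing was recorded again, which is the sense in which a bundle need not be static.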

@RenaudRohlinger
Collaborator Author

I like the concept of having a RenderBundle interface. I can update this PR to focus solely on the RenderBundle part for now.

In that case, I will propose another API for the static part in a dedicated pull request.
As mentioned in this PR, it would be great if you could guide me on how you envision the creation of a shared UBO dedicated to the scene (camera, fog), which will be shared among all the objects in the scene per render.

Collaborator

sunag commented May 13, 2024

As mentioned in this PR, it would be great if you could guide me on how you envision the creation of a shared UBO dedicated to the scene (camera, fog), which will be shared among all the objects in the scene per render.

My hypothesis would be: during renderer.renderBundle(), generate a renderBundleData from the RenderBundle class, register all RenderObjects (perhaps using _handleObjectFunction) and render "normally" without frustum culling. In the second rendering, try to use this._nodes.updateForRender( renderObject ) and this._bindings.updateForRender( renderObject ) for all previously registered objects and then executeBundles. This already saves a lot of JS from the second call onwards. It certainly won't be compatible with other features, like backdrop for example.

UBO optimization should be independent of RenderBundle, it is certainly a very important step for performance, some things I have in mind:

In this sense, the current status is still that #27134, sharedUniformGroup() partially works.

RenaudRohlinger marked this pull request as ready for review May 20, 2024 02:40
Owner

mrdoob commented May 22, 2024

@RenaudRohlinger

Also, I still much prefer handling this per scene, as I initially proposed, since it shouldn't change the renderer API much, unlike such a structure. On top of that, @mrdoob seems to like the idea of a static scene:
#26876 (comment)

100%

I'm not sure adding renderer.renderBundle() is the way to go...

I still prefer the "static graph" approach:

const group = new Group();
group.static = true;

const mesh1 = new Mesh( geometry, material );
group.add( mesh1 );

const instances1 = new InstancedMesh( geometry, material );
group.add( instances1 );

We can make it so the renderer only traverses a static group when group.needsUpdate is set to true.

That way the developer is able to update the bundle data when needed.
We can then use matrixWorldNeedsUpdate in the children to control what needs to be updated.
As well as material.needsUpdate = true.

group.needsUpdate = true; // forces the renderer to traverse the children and update internal bundle

mesh1.position.x = Math.random();
mesh1.matrixWorldNeedsUpdate = true; // recomputes child matrices

instances1.setMatrixAt( index, matrix );
instances1.instanceMatrix.needsUpdate = true;

API wise, it would be mostly adding the properties static and needsUpdate to Group.

We can continue serializing the scene graph as usual while letting the developer "flatten" parts of the graph at render time.
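
How a renderer might honor these two flags can be sketched in plain JavaScript. This is only an illustration of the proposal under stated assumptions: the `Group` class below is a toy, not THREE.Group, and `cachedList`, `render` and the traversal counter are hypothetical names introduced for the example.

```javascript
// Toy classes for illustration only; this is not THREE.Group.
class Group {
  constructor() {
    this.children = [];
    this.static = false;
    this.needsUpdate = true; // nothing cached yet
    this.cachedList = null; // hypothetical internal "bundle" data
  }
  add( child ) {
    this.children.push( child );
    return this;
  }
}

let traversals = 0;

function render( group ) {
  // Traverse only when the group is dynamic or has been flagged as stale.
  if ( group.static !== true || group.needsUpdate === true || group.cachedList === null ) {
    traversals ++;
    group.cachedList = group.children.map( ( child ) => child.name );
    group.needsUpdate = false;
  }
  return group.cachedList;
}

const group = new Group();
group.static = true;
group.add( { name: 'mesh1' } );
group.add( { name: 'instances1' } );

render( group ); // first frame: traverses and caches
render( group ); // second frame: cache reused, no traversal

group.add( { name: 'mesh2' } );
group.needsUpdate = true; // developer signals that the cached data is stale
const drawList = render( group );
```

Without the `needsUpdate` flag on the last step, the new child would simply not appear, which is the trade-off a static group makes explicit.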

@gkjohnson Maybe this approach could also be used for BatchedMesh? Even replace it? 🤔

Contributor

nkallen commented May 22, 2024

Re-using Group this way is a nice API approach. It may run into some tension with the way to get maximum performance...

I do think we would typically want to flatten transforms and other uniforms into one flat array for the whole Bundle. For the best performance, the client code would want to do something like

group.setMatrixAt(group.indexOf(mesh1), matrix);

Along those lines, my hope is to push nearly everything projectObject must do per-frame onto the GPU, including visibility testing, layer testing, frustum culling, etc. So I do think you want to do things like

group.setVisibleAt(group.indexOf(mesh1), false);

(Doing a full traverse just to update one matrix or visibility flag is probably undesirable)

Exposing this behavior into group could work but it also might make the API a bit bulky
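
A sketch of the flat-array idea, assuming a hypothetical container (`FlatBundleData` is not an existing three.js class). Each object gets a fixed slot, so changing one matrix or visibility flag is a constant-time write plus a note of which slot would need re-uploading, with no traversal of the other objects.

```javascript
// Hypothetical container, not an existing three.js class.
class FlatBundleData {
  constructor( objects ) {
    this.indices = new Map( objects.map( ( object, i ) => [ object, i ] ) );
    this.matrices = new Float32Array( objects.length * 16 ); // one 4x4 matrix per object
    this.visible = new Uint8Array( objects.length ).fill( 1 );
    this.dirty = new Set(); // slots that would need re-uploading
  }
  indexOf( object ) {
    return this.indices.get( object );
  }
  setMatrixAt( index, elements ) {
    this.matrices.set( elements, index * 16 );
    this.dirty.add( index );
  }
  setVisibleAt( index, value ) {
    this.visible[ index ] = value ? 1 : 0;
    this.dirty.add( index );
  }
}

const mesh1 = { name: 'mesh1' };
const mesh2 = { name: 'mesh2' };
const data = new FlatBundleData( [ mesh1, mesh2 ] );

const identity = [ 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1 ];
data.setMatrixAt( data.indexOf( mesh2 ), identity );
data.setVisibleAt( data.indexOf( mesh1 ), false );
```

The dirty set is one possible way to keep uploads proportional to the number of edits rather than to the size of the bundle.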

Collaborator

sunag commented May 22, 2024

I think it would be better to deal with group.static in another PR, since most of the work is related to the way the renderer deals with bindings. The implementation in this PR is currently about command optimization, and it could be modified to work internally on a similar principle, if ( object.static === true ) this._renderBundle( group, camera ), during rendering, once the bindings are updated to respect the correct binding groups, now including static ones.

Collaborator

gkjohnson commented May 23, 2024

@gkjohnson Maybe this approach could also be used for BatchedMesh? Even replace it? 🤔

My impression is that Batching and RenderBundles are different techniques and it's valuable to use both together. Here's a quick overview of my understanding:

  • Batching / instancing are useful techniques for reducing draw calls and changes to graphics context state.

  • RenderBundles are a concept unique to WebGPU, designed specifically to avoid the overhead of validating and marshaling values from JavaScript -> native. Basically, a series of state commands is "recorded" on the native side so it can be prevalidated and easily "replayed" without the JS -> native overhead.

RenderBundles won't save any draw calls, though. If you record a set of commands that issues 1000 draw calls then 1000 draw calls will still be issued via native code (though faster). So batching saves draw calls, render bundles save JS overhead, and they should be used together.

Sources:

  1. Don's comment from a previous bundle issue
  2. Toji's Mastodon comment on the topic
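
The distinction can be put into a toy cost model. It counts calls only and does no GPU work, and it makes two simplifying assumptions that are not from the thread: that all meshes can be merged into a single batch, and that a recorded bundle is replayed with a single call from JavaScript. `frameCost` is a hypothetical helper.

```javascript
// Hypothetical cost model: counts only, no real GPU work.
function frameCost( { meshes, batched, bundled } ) {
  // Batching merges the meshes into a single draw call (simplifying assumption).
  const drawCalls = batched ? 1 : meshes;
  // A recorded bundle is replayed with one call from JavaScript;
  // the draw calls inside it are still issued natively.
  const jsCalls = bundled ? 1 : drawCalls;
  return { drawCalls, jsCalls };
}

const plain = frameCost( { meshes: 1000, batched: false, bundled: false } );
const bundleOnly = frameCost( { meshes: 1000, batched: false, bundled: true } );
const both = frameCost( { meshes: 1000, batched: true, bundled: true } );
```

In this model the bundle alone leaves all 1000 draw calls in place while removing almost all of the calls crossing from JavaScript, and only the combination reduces both numbers.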

Since I'm here I'll say I think a more transparent API like Group.static or StaticGroup that implicitly uses RenderBundles would be best. Maybe something like a needsUpdate flag indicating that a render bundle update should happen is needed (ie matrices, materials, etc have changed in the children).


group.setMatrixAt(group.indexOf(mesh1), matrix);
group.setVisibleAt(group.indexOf(mesh1), false);

Explicit calls like this shouldn't be necessary I don't think. These things should be implicitly determined based on the hierarchy state when generating the render bundle, no?

Collaborator Author

RenaudRohlinger commented May 23, 2024

It seems that we're all in agreement regarding the concepts and direction that the bundle and static techniques should take.

the implementation of this PR currently is related to command optimization, and could be modified to work internally with the similar principle if ( object.static === true ) this._renderBundle( group, camera ) during renderer when the bindings are updated to respect the correct binding groups, now including static ones.

As @sunag explained, the newly introduced RenderBundle support is an internal feature, currently exposed to the user for advanced usage. It is on its way to being automatically handled internally in the render pipeline with the Group.static in the _renderScene() API, as mentioned by @mrdoob. I will be working on a new PR following this one to incorporate these changes.

group.setMatrixAt(group.indexOf(mesh1), matrix);
group.setVisibleAt(group.indexOf(mesh1), false);

Explicit calls like this shouldn't be necessary I don't think. These things should be implicitly determined based on the hierarchy state when generating the render bundle, no?

With this RenderBundle PR, it is still possible to update most buffers and uniforms per mesh dynamically, as demonstrated in the live example, where all matrices are dynamically updated.

Another performance optimization to consider is the writeBuffer cost involved with each uniform/buffer update, which will be partially addressed by the single uniform buffer update #27388 PR. /cc @nkallen @gkjohnson

Here's how I see the implementation unfolding:

Part 1 (basic bundle + static pipeline):

Part 2 (more advanced features):

  • Group.static v3: Add a global UBO pipeline (a UBO for the camera and scene elements such as fog) so that only a single UBO is updated per frame, and multiply camera matrices by model matrices directly on the GPU instead of pre-calculating the modelViewMatrix on the CPU per mesh per frame. (needs RFC: WebGPURenderer prototype single uniform buffer update / pass #27388)
  • RenderBundle v2 + Group.static v4: Support for drawCallIndirect ([WebGPU] drawIndirect and drawIndexedIndirect #28389) in order to handle frustum culling via a compute shader and to facilitate "dynamic" draw calls count while maintaining the bundle's static state.

Collaborator Author

RenaudRohlinger commented May 23, 2024

I added renderBundle.needsUpdate that will regenerate the bundle in the next render call and also renamed renderer.renderBundle() to renderer._renderBundle().

This PR should now be ready for review @sunag 😊

Contributor

nkallen commented May 23, 2024

group.setMatrixAt(group.indexOf(mesh1), matrix);
group.setVisibleAt(group.indexOf(mesh1), false);

Explicit calls like this shouldn't be necessary I don't think. These things should be implicitly determined based on the hierarchy state when generating the render bundle, no?

I am thinking ahead a bit to Part 2 ("more advanced features"), which is our use case.

We have a 3d modeling/editor program. The scene is primarily static (most objects have unchanging geometry and transforms). The camera is moving frequently as the user navigates the scene. But when the user initiates an editing operation, some very small subset of items have visibility flags and transforms change as a result of user edits. These are changing per frame (e.g., onpointermove).

So we do want to be able to change these attributes per frame (transforms and visibility), without rebuilding the render bundle, without traversing all items to update a very few matrixWorlds, and without re-uploading all buffers. The existing "live example" as mentioned above does allow transforms to be updated per frame, but at the cost of iterating through every item in the bundle and spending additional CPU and memory bandwidth, erasing a good portion of the gains from the optimization. (correct me if I'm mistaken?)

We additionally want to render the scene from multiple camera angles (multiple viewports). So for a given user edit, we will issue (for example) four render calls with four cameras. Thus our hope is to get this use case supported in such a way as to remove anything O(n) on the CPU from the per-frame part of the render pipeline.

I'm not sure if this clarifies anything, but we would want an API vaguely like the following.

const bundle = buildBundle();

onUserEdit(edit => {
    updateUniforms(bundle, edit);
    setNeedsRender(allViewports);
} );

for (const viewport of allViewports)
   viewport.orbitControls.onMove(() => setNeedsRender([viewport]));

Collaborator

sunag commented May 23, 2024

This PR should now be ready for review @sunag 😊...
Group.static v1: Automatic internal render bundling using the new RenderBundle interface.

@RenaudRohlinger I'm reviewing, and I did this second part, Group.static v1, before merging. I think it will be necessary for the example to be in compliance with the next steps.

Collaborator

sunag commented May 23, 2024

We can continue serializing the scene graph as usual while letting the developer "flatten" parts of the graph at render time.

@mrdoob Let's adopt this strategy, the current PR will be part of this as @RenaudRohlinger commented.

@RenaudRohlinger After this step, I think it would be interesting to include the management of bindings by group; today we are dealing with all bindings in the same group.

passEncoderGPU.setBindGroup( 0, bindGroupGPU );

sunag merged commit 82b78e7 into mrdoob:dev May 23, 2024
11 checks passed
Collaborator

sunag commented May 24, 2024

Group.static v2: Improved management of writeBuffer (related #27388 I believe /cc @aardgoose)

About Group.static v2 and binding groups: the idea of having multiple groups is exactly to separate the bind groups and share them between materials. In this case we could have a group just for the camera.

For example in this case https://webgpu.github.io/webgpu-samples/?sample=renderBundles

I think the implementation of #27388 should be after that.

Collaborator Author

RenaudRohlinger commented May 24, 2024

Awesome thanks @sunag!

the idea of having multiple groups is exactly to separate the bind groups and share them between materials. In this case we could have a group just for the camera.

If I understand correctly, there will be a shared frame buffer (passEncoderGPU.setBindGroup(0, bindingsData.frameBindGroup)) that is distinct from the buffers of the render list (passEncoderGPU.setBindGroup(1, bindingsData.group)). We could use this frameBindGroup for the camera matrices and the scene fog, while also allowing the user to add custom and vital extra data, such as a global float timer.

All the global scene-related data would be bound to index 0, and all the per-object level data and shaders would be bound to index 1 (global stuff bound to 0 in the shaders).

This way, we could render the scene from multiple camera angles in a split-view manner, for example, without having to update anything except that specific buffer.
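
A rough sketch of that split-view case, assuming a simplified model with one shared per-frame group and untouched per-object groups. `frameGroup`, `objectGroups` and `renderView` are hypothetical names, and a single number stands in for the camera matrices.

```javascript
// Hypothetical model of two bind group slots; buffers are plain values and typed arrays.
const frameGroup = { cameraX: new Float32Array( 1 ) }; // index 0: shared, rewritten per view
const objectGroups = [ { x: 1 }, { x: 2 }, { x: 3 } ]; // index 1: per-object data, never rewritten here

let frameWrites = 0;

function renderView( cameraX ) {
  frameGroup.cameraX[ 0 ] = cameraX; // the only write needed for this view
  frameWrites ++;
  // "GPU side": combine the shared camera value with each object's own data.
  return objectGroups.map( ( object ) => object.x - frameGroup.cameraX[ 0 ] );
}

// Two viewports looking at the same objects from different camera positions.
const views = [ 0, 10 ].map( ( cameraX ) => renderView( cameraX ) );
```

The number of writes grows with the number of views, not with the number of objects, which is the property the shared group is meant to provide.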

Or do you have something different in mind?

Collaborator

sunag commented May 24, 2024

I have this in mind:

  1. The first step is to make the UniformGroupNode generate the groups in NodeBuilder at buffer level, accessible through a function like renderObject.getBindingGroups(). Backend functions like createBindings() should also be group-oriented.

Each uniformGroup( 'name' ) will be an individual group, for example if we have:

// sharedUniformGroup( 'frame' ) // global timer .. let's ignore it for now

  1. sharedUniformGroup( 'camera' )
  2. sharedUniformGroup( 'render' ) // material, fog, toneMapping, etc
  3. uniformGroup( 'object' ) // default group, object matrices, etc
  4. uniformGroup( 'custom' ) // user defined group

Will be:

  1. setBindGroup( 0, bindingsData.groups.camera ) // bindingsData.groups[ 0 ]
  2. setBindGroup( 1, bindingsData.groups.render ) // bindingsData.groups[ 1 ]
  3. setBindGroup( 2, bindingsData.groups.object ) // bindingsData.groups[ 2 ]
  4. setBindGroup( 3, bindingsData.groups.custom ) // bindingsData.groups[ 3 ]

We'll probably have to sort them.

Now any Node can be stored in an individual group at buffer level, as the code part would already be ready here.

Defining a uniform in a group would be very simple, currently e.g:

import { uniform, renderGroup } ...

// the buffer will be updated only once per render call
const myGlobalPosition = uniform( new THREE.Vector3( 0, 100, 0 ) ).setGroup( renderGroup );

// or

const customGroup = sharedUniformGroup( 'myGroup' );
const myGlobalPosition = uniform( new THREE.Vector3( 0, 100, 0 ) ).setGroup( customGroup );

// custom groups must be updated using `.needsUpdate`
customGroup.needsUpdate = true;

--

  2. After this process we will be able to detect the patterns and share them between materials when shared is true. It would be better to do it in a separate PR. We can use a hash library for this, based on the input nodes per group.

In each draw(), the renderer will compare whether the previous group is the same as the current one and will only update if it is different. A material's bind group would be updated only once per render, for example.
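
The comparison step can be sketched with a small wrapper that remembers the last group bound at each index. `CachingPassEncoder` is a hypothetical class for illustration; only the `setBindGroup( index, group )` call shape mirrors WebGPU, and plain objects stand in for bind groups.

```javascript
// Hypothetical encoder wrapper; only the setBindGroup( index, group ) shape mirrors WebGPU.
class CachingPassEncoder {
  constructor() {
    this.current = []; // last group bound at each index
    this.calls = 0; // binds that were actually issued
  }
  setBindGroup( index, group ) {
    if ( this.current[ index ] === group ) return; // same group as the previous draw: skip
    this.current[ index ] = group;
    this.calls ++;
  }
}

const camera = { name: 'camera' };
const materialA = { name: 'materialA' };
const materialB = { name: 'materialB' };

// Draws ordered so that objects sharing a material are adjacent.
const draws = [
  { render: materialA, object: { name: 'o1' } },
  { render: materialA, object: { name: 'o2' } },
  { render: materialB, object: { name: 'o3' } }
];

const encoder = new CachingPassEncoder();
for ( const draw of draws ) {
  encoder.setBindGroup( 0, camera );
  encoder.setBindGroup( 1, draw.render );
  encoder.setBindGroup( 2, draw.object );
}
```

Here three draws issue six binds instead of nine: the camera group is bound once, each material group once, and only the per-object group changes on every draw. The saving depends on the draw order, which is presumably why sorting comes up above.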
