-
-
Notifications
You must be signed in to change notification settings - Fork 35.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Matrices update optimization (Introducing .needsMatrixUpdate) #14138
Conversation
More precisely, even now we can do this optimization in user code like
With this PR, this optimization will be automatically done. |
My team and I are super grateful that you're working on this. Thanks. This change makes me wonder what is the intended use case of the Is there any reason for callers to pass |
I suggest a different approach similar to Babylon.js: Caching World Matrices. With such a system, even |
Would you please explain a bit more? I think introducing |
I like the idea of the mentioned |
Like this?
|
@mquander I'm thinking the same thing. Probably we should consider whether we really need |
It is great you are interested in this issue.
For example, object.parent.rotation.x += 1;
object.updateMatrixWorld(); This is a long-standing issue. See #6448 (comment). It may be beneficial to have a look at the history of this topic. |
Hey @takahirox! It's great to see that someone is working on this. I was thinking about looking into matrix update optimizations, as well! I'd be happy to help if you're looking for someone to contribute. I'm sure you already have some of this in mind (and have a good portion of it in, already), but here's how I was thinking about the optimization: Assumptions
Implementation
It means that you should only have to call "updateWorldMatrix" on a child and know immediately whether or not it's up to date. This also means that the renderer will only have to call "updateWorldMatrix" on the meshes it wants to update rather than updating the full tree (or nodes that don't even need to be updated). @Mugen87 I didn't know Babylon did that. Sounds like it would optimize for the case where you move something and then it moves back, but wouldn't you still have to check everything to see if it has needs to be updated? This approach would let you queue up only the things that need to be changed (rather than iterating over everything). I may be misunderstanding, though. A couple other related thoughts:
|
Could we maybe consider some kind of an API where we would have I did not dive too deep into this, but it seems like it would make sense. Before would be a time where one sets Is there a pattern here that could be established? |
I've actually wrote something like this into my game engine but had to disable it as it was too buggy. I never really went back to debug it (huge pita). So yea, SUPER happy to see this potentially being adapted into three.js! |
I tried a cache approach. Much less changes but less performance gain. Roughly the performance gain is half or more less from this PR. I think 10 times comparing
Which one do you prefer? I have two options in my mind. First is this PR because I wanna improve the performance of the hot-spot function as much as possible. Second one is the cache approach. If user want further optimization user can set |
I vote for the cache approach. |
Thanks for sharing your idea. I wanna focus on the current behavior of
as @WestLangley explained. I'd like to discuss in another thread for calculating world matrix of a node. |
How about adopting the caching approach now because the change is very small, and we'll consider further optimization like this PR idea when users request further improvement? Any thoughts? Or any better solutions? /cc @mrdoob |
Out of curiosity, besides less changes, what is the advantage of the 2nd PR vs the 1st one? |
I ran the benchmark test again. I ran each test five times and took the average. (Of course we need more samples for more precise numbers.) .updateMatrixWorld( true ) Simulating the case where only a few nodes near the top are updated.
Chrome
(num) indicates the performance gain/loss, where (needsMatrixUpdate or cache) / current. FireFox Nightly
.updateMatrixWorld( undefined ) Simulating the case where no node is updated.
Chrome
FireFox Nightly
overhead Simulating the case where all nodes are updated.
The both two ideas have overhead on update. The Chrome
FireFox Nightly
|
@titansoftime One of the disadvantages of this PR against the cache approach is every update of
this code invokes callback call 10 times though the second or later lines ideally don't need to because the first line sets |
@takahirox Out of curiosity: Can you please explain how you determine the mentioned performance metric ([ops/sec])? A common approach for benchmarking would be useful for the further development process since the results between users and PRs are more comparable. |
I just ran |
Oh, okay 😊 |
@takahirox This is great! I think my vote is for adding the dirty flag and "change" callback on vectors if only because I think it opens the door to future optimizations and cleaner (external) development experience. I do see that the code is a bit more complicated, though. In my opinion it seems worth it. It seems like it would benefit all kinds of use cases with THREE. I've also had to implement this in the past and side-stepped a lot of the THREE.js matrix update logic was too heavy I don't think cache checking would have been quite enough for that case. If you've moved a bunch of stuff in the scene, for example, and you want to know where one specific item is in world space, you'd still have to always climb the parents to ensure their world matrices have been updated before you can update the child you care about, which means a lot of matrix cache checks. It would be great if this could happen seamlessly with THREE rather than having to manually traverse the scene, which a dirty flag could quickly enable.
That does seem like the big downside of this approach. The documentation could call out that using |
I'm surprised by the size of the performance hit in the Vector-change-callback case where all nodes are updated. Fundamentally, the extra work is just flipping one flag on update and checking it later. That doesn't seem like it should add up to 10%. I don't know enough about IonMonkey or V8 JIT to comment on whether another approach to the same thing (e.g. where the function call is eliminated, or where the flag is stored in the |
I tried to add dirty flag to
I ran benchmark test and compared the results. As I expected, dirty flag is faster than the cache approach and less overhead than the callback and cache approach. I wrote only performance gain/loss (= new idea / current) the below. .updateMatrixWorld( true ) Simulating the case where only a few nodes near the top are updated.
Chrome
FireFox Nightly
.updateMatrixWorld( undefined ) Simulating the case where no node is updated.
Chrome
FireFox Nightly
overhead Simulating the case where all nodes are updated.
Chrome
FireFox Nightly
|
@takahirox that's good to know, thanks for sharing! |
SIMD.js is deprecated but it will be implemented on WASM though, but imo that should be a different problem to discuss than the current approach. I like the idea of introducing the I also like the proposal from @Mugen87 on introducing cache/isSync similar to the babylon's approach, I think it has low impact on the code, it's easy to follow and understand and could help getting the work done and avoiding a lot of callback when an attribute is changed on the pos/sca/rot |
I think we should move it forward, too. I'll write and post the summary of my performance evaluation soon because the discussion here is a bit too long. |
I think we have three options now. Refer to #14138 (comment) for my benchmark test. Performance numbers below are of
Renaming Pros:
Cons:
objectMatrix: function () {
if ( ! this.positionCache.equals( this.position ) ||
! this.quaternionCache.equals( this.quaternion ) ||
! this.scaleCache.equals( this.scale ) ) {
this.matrix.compose( this.position, this.quaternioin, this.scale );
this.matrixWorldNeedsUpdate = true;
this.positionCache.copy( this.position );
this.quaternionCache.copy( this.quaternion );
this.scaleCache.copy( this.scale );
}
} Pros:
Cons:
objectMatrix: function () {
if ( this.position.dirty || this.quaternion.dirty || this.scale.dirty ) {
this.matrix.compose( this.position, this.quaternioin, this.scale );
this.matrixWorldNeedsUpdate = true;
this.position.dirty = false;
this.quaternion.dirty = false;
this.scale.dirty = false;
}
} // Vector3/Quaternion extension
function Vector3WithDirtyFlag( x, y, z ) {
this._x = 0;
this._y = 0;
this._z = 0;
Vector3.call( this, x, y, z );
this.dirty = false;
}
Vector3WithDirtyFlag.prototype = Object.assign( Object.create( Vector3.prototype ), {
constructor: Vector3WithDirtyFlag,
isVector3WithDirtyFlag: true
} );
Object.defineProperties( Vector3WithDirtyFlag.prototype, {
x: {
get: function () { return this._x; },
set: function ( value ) {
this._x = value;
this.dirty = true;
}
},
....
} );
function Object3D() {
....
this.position = new Vector3WithDirtyFlag();
....
}
// or Object.defineProperties per vector3/quaternion instance.
// We don't need to make sub class
function Object3D() {
....
this.position = new Vector3();
this.position.dirty = false;
Object.defineProperties( this.position, {
x: {
....
},
....
}
....
} Pros:
Cons:
Appendix: We can break
https://github.com/mrdoob/three.js/blob/dev/src/math/Matrix4.js#L741-L775 With cache or dirty flag, we can introduce partial local matrix update optimization. I need to test to know if it has performance benefit tho. |
My current thought...
|
I made |
I made performance test, where In the test a scene has FPS numbers gain/loss on my Windows chrome where
FPS numbers gain/loss on my Windows FireFox Nightly where
Off topics.
|
I implemented a variant of your dirty flag method into my build about a year ago or so. Couldn't be happier with the results. I noticed a similar drag in projectObject, glad someone is looking into it =] |
What do you folks think of my comment #14138 (comment) especially
? |
@titansoftime What do you think of the cache approach? Is the performance gain not good enough or is performance penalty big in your case? |
I honestly haven't tried it. Dirty flag worked better than I anticipated so I never looked back. I have at least 50 static and 10-20 dynamic objects on the screen at any given time, so other users may have different results. |
Thanks. I'd be happy if you try the cache approach #15706. I'd like to see how it looks in an actual application. |
I unfortunately cannot use the latest three build, since they killed JSONLoader (LegacyJSONLoader does not work for me as initBones is removed). Working on converting all my models to gltf, but there are some serious issues in the blender exporter. Anyway that's a whole different thing, sigh. Do you have a build with the cache approach from pre... ehh I think r100 is where they killed the JSONLoader. |
I think you can locally build like this (there may be easier way). $ git checkout r99
$ git remote add upstream https://github.com/mrdoob/three.js.git
$ git fetch upstream '+refs/pull/15706/head:refs/remotes/pr/15706'
$ git checkout -b r99withCache
$ git cherry-pick d535ef16fe49f5de20144c5906cffdddfce9461f
$ npm run build |
Ah, nice. Will try it out. Note - Haven't forgotten this. My PC is still connected to my main t.v. for Resident Evil 2. Should beat that tonight and go back into dev mode this weekend =] |
As we discussed in #15706 (comment) #14543 (comment), both cache and dirty flag approach seem to have functionality problem, b.updateMatrix(); // Clean
b.matrix.copy( a.matrix ); // manually update local matrix
b.updateMatrix(); // Local matrix should be updated from position/quaternion/scale
// but no update because position/quaternion/scale are clean This is from our flexible API that user can directly access local matrix. So we also need to detect local matrix update. But I don't think we can do simply and performantly. In long term, we might want to change the API, local matrix is read only or hidden from users and allowing users to update objects transform only via But in short term, I don't think it's doable because it can break a lot of core/example/user code. So keep engaging https://threejs.org/docs/#manual/en/introduction/Matrix-transformations Using user defined matrices update function may be another option for users #15706 (comment) |
Correct. |
Thus, keeping this PR open makes no sense since it can't be merged because of the current status of |
#14044 (comment)
This PR is for improving
.updateMatrixWorld()
performance.The idea is
.needsMatrixUpdate
..needsMatrixUpdate
true
when.position/quaternion/scale
is updated..needsMatrixUpdate
istrue
,.updateMatrixWorld()
calls.updateMatrix()
.In short, we don't call
.updateMatrix()
if we don't need to.(I haven't tried lazy update yet)
The changes are
.onChange
toVector3
..needsMatrixUpdate
true inonChange
callback ofVector3/Quaternion/Euler
..needsMatrixUpdate
istrue
,.updateMatrixWorld()
calls.updateMatrix()
.The disadvantage of this idea is the code could be a bit complex. The purpose of this PR is to see how big/small changes are and how hard/easy to maintain, and plus, to see the benefit.
UpdateMatrix benchmark Performance numbers. (
.updateMatrixWorld( true )
)On my Windows + Chrome
On my Windows + FireFox Nightly
UpdateMatrix benchmark Performance numbers. (
.updateMatrixWorld( undefined )
)Additional benefit is
.updataMatrixWorld( false or undefined )
can be much faster if local matrix and parent's world matrix aren't updated. Simply we can skip multiplication for world matrix calculation because both.needsMatrixUpdate
and.needsMatrixWorldUpdate
arefalse
.On my Windows + Chrome
On my Windows + FireFox Nightly