[WIP] Matrices update optimization (Introducing .needsMatrixUpdate) #14138

takahirox · 2018-05-24T21:23:28Z

#14044 (comment)

This PR is for improving .updateMatrixWorld() performance.

The idea is

Introduce .needsMatrixUpdate.
Set .needsMatrixUpdate true when .position/quaternion/scale is updated.
Only when .needsMatrixUpdate is true, .updateMatrixWorld() calls .updateMatrix().

In short, we don't call .updateMatrix() if we don't need to.

(I haven't tried lazy update yet)

The changes are

Add .onChange to Vector3.
Set .needsMatrixUpdate true in onChange callback of Vector3/Quaternion/Euler.
Only when .needsMatrixUpdate is true, .updateMatrixWorld() calls .updateMatrix().

The disadvantage of this idea is the code could be a bit complex. The purpose of this PR is to see how big/small changes are and how hard/easy to maintain, and plus, to see the benefit.

UpdateMatrix benchmark Performance numbers. (.updateMatrixWorld( true ) )

On my Windows + Chrome

graph	old [ops/sec]	new [ops/sec]	new / old
depth=100, breadth=1	108746	136512	1.26
depth=3, breadth=10	9784	11619	1.19
depth=9, breadth=3	125	141	1.13

On my Windows + FireFox Nightly

graph	old [ops/sec]	new [ops/sec]	new / old
depth=100, breadth=1	50817	68536	1.35
depth=3, breadth=10	4633	7285	1.57
depth=9, breadth=3	124	167	1.35

UpdateMatrix benchmark Performance numbers. (.updateMatrixWorld( undefined ) )

Additional benefit is .updataMatrixWorld( false or undefined ) can be much faster if local matrix and parent's world matrix aren't updated. Simply we can skip multiplication for world matrix calculation because both .needsMatrixUpdate and .needsMatrixWorldUpdate are false.

On my Windows + Chrome

graph	old [ops/sec]	new [ops/sec]	new / old
depth=100, breadth=1	108746	542296	4.99
depth=3, breadth=10	9784	53359	5.45
depth=9, breadth=3	125	441	3.53

On my Windows + FireFox Nightly

graph	old [ops/sec]	new [ops/sec]	new / old
depth=100, breadth=1	50817	345471	6.80
depth=3, breadth=10	4633	48427	10.45
depth=9, breadth=3	124	528	4.26

takahirox · 2018-05-24T21:49:41Z

More precisely, even now we can do this optimization in user code like

// object creatation
var object = new THREE.Object3D();
object.matrixAutoUpdate = false;

// animation
object.position.x = newX;
object.updateMatrix();

With this PR, this optimization will be automatically done.

mqp · 2018-05-25T00:03:24Z

My team and I are super grateful that you're working on this. Thanks.

This change makes me wonder what is the intended use case of the force parameter to updateMatrixWorld. There are a number of places in the three source which pass true where I would not have thought to pass it -- for example, in Object3D.getWorldPosition and Object3D.getWorldQuaternion. Changes like this would make a bigger and bigger difference between the performance of passing true unnecessarily (which would be the ordinary speed) and not doing so (potentially very fast.)

Is there any reason for callers to pass true, unless they have manually manipulated world matrices? If not, should places like this be changed?

Mugen87 · 2018-05-25T08:46:37Z

I suggest a different approach similar to Babylon.js: Caching World Matrices. With such a system, even matrixAutoUpdate would be redundant.

https://books.google.de/books?id=6crECQAAQBAJ&pg=PA151&lpg=PA151&dq=babylons.js+cache+matrix&source=bl&ots=Jr6k7XRdMz&sig=8tOF53JDoT1OiK9BK9xOlrsaWJQ&hl=de&sa=X&ved=0ahUKEwilzayVvKDbAhVhEpoKHUnpCT0Q6AEITTAE#v=onepage&q=babylons.js%20cache%20matrix&f=false

takahirox · 2018-05-25T16:44:04Z

Would you please explain a bit more?

I think introducing .needsMatrixUpdate is almost matrix/matrixWorld cache because they're calculated only when necessary.

Mugen87 · 2018-05-25T19:42:23Z

I like the idea of the mentioned isSynchronized() method. So you keep information like position, rotation and scale in an internal cache and only return true if the cached data are equal with the real properties. In this case, you know that the cached matrix represents the actual transformation so you don't need a new calculation. In this way, you don't need to modify Vector3.

takahirox · 2018-05-25T20:11:15Z

Like this?

Object.prototype.isSynchronized = function () {
    return (
        this.position.x === this.cachedPosition.x &&
        this.position.y === this.cachedPosition.y &&
        this.position.z === this.cachedPosition.z &&
        this.quaternion.x === this.cachedQuaternion.x &&
        this.quaternion.y === this.cachedQuaternion.y &&
        this.quaternion.z === this.cachedQuaternion.z &&
        this.quaternion.w === this.cachedQuaternion.w &&
        this.scale.x === this.cachedScale.x &&
        this.scale.y === this.cachedScale.y &&
        this.scale.z === this.cachedScale.z
    )
}

Object.prototype.updateMatrixWorld = function () {

    if ( ! this.isSynchronized() ) this.updateMatrix();

    ....

}

takahirox · 2018-05-25T20:24:13Z

@mquander I'm thinking the same thing. Probably we should consider whether we really need force true for each .updateMatrixWorld() call.

WestLangley · 2018-05-25T23:13:41Z

@takahirox

It is great you are interested in this issue.

updateMatrixWorld() was written to be applied to a scene, and traverses the scene graph in an efficient manner. It is not the correct method to be used in the case of scene node.

For example,

object.parent.rotation.x += 1;
object.updateMatrixWorld();

This is a long-standing issue. See #6448 (comment).

It may be beneficial to have a look at the history of this topic.

gkjohnson · 2018-05-26T00:51:05Z

Hey @takahirox!

It's great to see that someone is working on this. I was thinking about looking into matrix update optimizations, as well! I'd be happy to help if you're looking for someone to contribute.

I'm sure you already have some of this in mind (and have a good portion of it in, already), but here's how I was thinking about the optimization:

Assumptions

If an objects localMatrix changes, then the worldMatrix needs updating.
If an objects worldMatrix changes, then all of the childrens (and childrens childrens) world matrices needs updating.
If an objects worldMatrix is being updated, then its parents worldMatrix needs to be made up to date.

Implementation

If an objects localMatrix needs updating, then the worldMatrix should be marked as needing an update.
If the worldMatrix is marked as needing an update, then mark all children as needing their world matrices updated.
If an objects worldMatrix is already marked as needing an update, then all of its children should already be marked.
If a call to "updateWorldMatrix" is made, then it should call its parent's "updateWorldMatrix" function to.
The "updateWorldMatrix" function doesn't need to run if the world matrix does not need an update.

It means that you should only have to call "updateWorldMatrix" on a child and know immediately whether or not it's up to date. This also means that the renderer will only have to call "updateWorldMatrix" on the meshes it wants to update rather than updating the full tree (or nodes that don't even need to be updated).

@Mugen87 I didn't know Babylon did that. Sounds like it would optimize for the case where you move something and then it moves back, but wouldn't you still have to check everything to see if it has needs to be updated? This approach would let you queue up only the things that need to be changed (rather than iterating over everything). I may be misunderstanding, though.

A couple other related thoughts:

Adding the ability to register for when something has had its transform changed or marked as changed would be nice (and would enable some nice new extensions to THREE)
A similar approach could be used to more optimally create and update bounding boxes on internal tree nodes, which would be nice to have, as well.

pailhead · 2018-05-29T19:45:51Z

Could we maybe consider some kind of an API where we would have onBeforeUpdate and onAfterUpdate. I suggested this back when onBeforeRender got accepted.

I did not dive too deep into this, but it seems like it would make sense. Before would be a time where one sets myMesh.position, after would be where you have the matrices computed. These would be triggered when scene updates the graph.

Is there a pattern here that could be established?

titansoftime · 2018-06-04T19:24:15Z

I've actually wrote something like this into my game engine but had to disable it as it was too buggy. I never really went back to debug it (huge pita).

So yea, SUPER happy to see this potentially being adapted into three.js!

takahirox · 2018-06-07T11:04:21Z

I tried a cache approach. Much less changes but less performance gain. Roughly the performance gain is half or more less from this PR. I think 10 times comparing positioin.xyz, quaternion.xyzw, scale.xyz isn't so fast.

updateMatrix: function () {

    if ( ! this.position.equals( this.positionCache ) ||
        ! this.quaternion.equals( this.quaternionCache ) ||
        ! this.scale.equals( this.scaleCache ) ) {

        this.matrix.compose( this.position, this.quaternion, this.scale );

        this.positionCache.copy( this.position );
        this.quaternionCache.copy( this.quaternion );
        this.scaleCache.copy( this.scale );

        this.matrixWorldNeedsUpdate = true;

    }

},

Which one do you prefer? I have two options in my mind. First is this PR because I wanna improve the performance of the hot-spot function as much as possible. Second one is the cache approach. If user want further optimization user can set object.matrixAutoUpdate = false; and call object.updateMatrix(); when necessary by themselves.

Mugen87 · 2018-06-07T11:09:54Z

I vote for the cache approach.

takahirox · 2018-06-07T11:27:01Z

@gkjohnson

Thanks for sharing your idea. I wanna focus on the current behavior of .updateMatrixWorls() so far which is

to be applied to a scene, and traverses the scene graph in an efficient manner.

as @WestLangley explained.

I'd like to discuss in another thread for calculating world matrix of a node.

takahirox · 2018-06-07T11:31:38Z

I vote for the cache approach.

How about adopting the caching approach now because the change is very small, and we'll consider further optimization like this PR idea when users request further improvement?

Any thoughts? Or any better solutions?

/cc @mrdoob

titansoftime · 2018-06-07T16:23:51Z

Out of curiosity, besides less changes, what is the advantage of the 2nd PR vs the 1st one?

takahirox · 2018-06-07T16:45:07Z

I ran the benchmark test again. I ran each test five times and took the average. (Of course we need more samples for more precise numbers.)

.updateMatrixWorld( true )

Simulating the case where only a few nodes near the top are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph	current [ops/sec]	needsMatrixUpdate [ops/sec]	cache [ops/sec]
depth=100, breadth=1	109580.8	134058.0 (1.22)	113460.8 (1.04)
depth=3, breadth=10	9609.8	11606.8 (1.21)	9476.2 (0.99)
depth=9, breadth=3	132.4	146.0 (1.10)	122.8 (0.93)

(num) indicates the performance gain/loss, where (needsMatrixUpdate or cache) / current.

FireFox Nightly

graph	current [ops/sec]	needsMatrixUpdate [ops/sec]	cache [ops/sec]
depth=100, breadth=1	51301.8	70755.4 (1.38)	57574.6 (1.12)
depth=3, breadth=10	4778.0	6956.2 (1.46)	5234.8 (1.10)
depth=9, breadth=3	104.8	146.0 (1.39)	109.4 (1.04)

.updateMatrixWorld( undefined )

Simulating the case where no node is updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph	current [ops/sec]	needsMatrixUpdate [ops/sec]	cache [ops/sec]
depth=100, breadth=1	109580.8	523200.2 (4.78)	322480.8 (2.94)
depth=3, breadth=10	9609.8	51019.6 (5.31)	27298.8 (2.84)
depth=9, breadth=3	132.4	434.6 (3.28)	238.2 (1.80)

FireFox Nightly

graph	current [ops/sec]	needsMatrixUpdate [ops/sec]	cache [ops/sec]
depth=100, breadth=1	51301.8	349046.6 (6.80)	152019.8 (2.96)
depth=3, breadth=10	4778.0	48701.2 (10.19)	14671.8 (3.07)
depth=9, breadth=3	104.8	527.0 (5.03)	177.6 (1.69)

overhead

Simulating the case where all nodes are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.traverse(function(obj) {
        obj.position.x += 0.1;
    });
    rootA.updateMatrixWorld(true);
});

The both two ideas have overhead on update. The .needsMatrixUpdate one calls callback when position/quaternion/scale is updated. The cache one copies position/quaternion/scale on .updateMatrix().

Chrome

graph	current [ops/sec]	needsMatrixUpdate [ops/sec]	cache [ops/sec]
depth=100, breadth=1	88573.2	74439.8 (0.84)	85957.6 (0.97)
depth=3, breadth=10	7726.6	5821.4 (0.75)	7363.0 (0.95)
depth=9, breadth=3	93.0	66.0 (0.71)	90.2 (0.97)

FireFox Nightly

graph	current [ops/sec]	needsMatrixUpdate [ops/sec]	cache [ops/sec]
depth=100, breadth=1	44721.6	40382.0 (0.90)	36318.2 (0.81)
depth=3, breadth=10	4344.8	3902.2 (0.90)	3445.4 (0.79)
depth=9, breadth=3	77.0	66.0 (0.86)	69.2 (0.90)

takahirox · 2018-06-07T17:57:11Z

@titansoftime One of the disadvantages of this PR against the cache approach is every update of .position/quaternion/scale calls callback. For example,

object.position.x = 0;
object.position.y = 0;
object.position.z = 0;
object.quaternion.x = 0;
object.quaternion.y = 0;
object.quaternion.z = 0;
object.quaternion.w = 0;
object.scale.x = 0;
object.scale.y = 0;
object.scale.z = 0;

this code invokes callback call 10 times though the second or later lines ideally don't need to because the first line sets object.needsMatrixUpdate = true;.

Mugen87 · 2018-06-07T17:57:50Z

@takahirox Out of curiosity: Can you please explain how you determine the mentioned performance metric ([ops/sec])? A common approach for benchmarking would be useful for the further development process since the results between users and PRs are more comparable.

takahirox · 2018-06-07T18:02:07Z

I just ran test/benchmark/benchmarks.html.

Mugen87 · 2018-06-07T18:03:20Z

Oh, okay 😊

gkjohnson · 2018-06-07T18:06:58Z

@takahirox This is great!

I think my vote is for adding the dirty flag and "change" callback on vectors if only because I think it opens the door to future optimizations and cleaner (external) development experience. I do see that the code is a bit more complicated, though. In my opinion it seems worth it. It seems like it would benefit all kinds of use cases with THREE. I've also had to implement this in the past and side-stepped a lot of the THREE.js matrix update logic was too heavy I don't think cache checking would have been quite enough for that case.

If you've moved a bunch of stuff in the scene, for example, and you want to know where one specific item is in world space, you'd still have to always climb the parents to ensure their world matrices have been updated before you can update the child you care about, which means a lot of matrix cache checks. It would be great if this could happen seamlessly with THREE rather than having to manually traverse the scene, which a dirty flag could quickly enable.

One of the disadvantages of this PR against the cache approach is every update of .position/quaternion/scale calls callback. For example,

That does seem like the big downside of this approach. The documentation could call out that using set(x,y,z) is better than individually setting fields when it comes to performance.

mqp · 2018-06-07T22:47:16Z

I'm surprised by the size of the performance hit in the Vector-change-callback case where all nodes are updated. Fundamentally, the extra work is just flipping one flag on update and checking it later. That doesn't seem like it should add up to 10%. I don't know enough about IonMonkey or V8 JIT to comment on whether another approach to the same thing (e.g. where the function call is eliminated, or where the flag is stored in the Vector3 instead of the Object3D) might be faster, though.

takahirox · 2018-06-08T11:42:49Z

I tried to add dirty flag to Vector3. Setting dirty flag should be faster than calling callback on update and also checking dirty flags should be faster than checking cache values. But the change is still as big as this PR.

// Vector3
set: function ( x, y, z ) {

    this._x = x;
    this._y = y;
    this._z = z;

    this.dirty = true;

    return this;

},

....

// Object3D
updateMatrix: function () {

    if ( this.position.dirty === true ||
        this.rotation.dirty === true ||
        this.scale.dirty === true ) {

        this.matrix.compose( this.position, this.quaternion, this.scale );

        this.position.dirty = false;
        this.rotation.dirty = false;
        this.scale.dirty = false;

        this.matrixWorldNeedsUpdate = true;

    }

},

I ran benchmark test and compared the results. As I expected, dirty flag is faster than the cache approach and less overhead than the callback and cache approach. I wrote only performance gain/loss (= new idea / current) the below.

.updateMatrixWorld( true )

Simulating the case where only a few nodes near the top are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph	needsMatrixUpdate	cache	dirty flag
depth=100, breadth=1	1.22	1.04	1.25
depth=3, breadth=10	1.21	0.99	1.14
depth=9, breadth=3	1.10	0.93	1.08

FireFox Nightly

graph	needsMatrixUpdate	cache	dirty flag
depth=100, breadth=1	1.38	1.12	1.41
depth=3, breadth=10	1.46	1.10	1.46
depth=9, breadth=3	1.39	1.04	1.20

.updateMatrixWorld( undefined )

Simulating the case where no node is updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.updateMatrixWorld(true);
});

Chrome

graph	needsMatrixUpdate	cache	dirty flag
depth=100, breadth=1	4.78	2.94	4.19
depth=3, breadth=10	5.31	2.84	3.84
depth=9, breadth=3	3.28	1.80	2.11

FireFox Nightly

graph	needsMatrixUpdate	cache	dirty flag
depth=100, breadth=1	6.80	2.96	5.25
depth=3, breadth=10	10.19	3.07	5.72
depth=9, breadth=3	5.03	1.69	2.35

overhead

Simulating the case where all nodes are updated.

s.add('Update graph depth=100, breadth=1 (' + nodeCount(rootA) + ' nodes)', function() {
    rootA.traverse(function(obj) {
        obj.position.x += 0.1;
    });
    rootA.updateMatrixWorld(true);
});

Chrome

graph	needsMatrixUpdate	cache	dirty flag
depth=100, breadth=1	0.84	0.97	1.02
depth=3, breadth=10	0.75	0.95	1.00
depth=9, breadth=3	0.71	0.97	1.04

FireFox Nightly

graph	needsMatrixUpdate	cache	dirty flag
depth=100, breadth=1	0.90	0.81	0.96
depth=3, breadth=10	0.90	0.79	0.97
depth=9, breadth=3	0.86	0.90	0.99

mrdoob · 2018-12-18T16:40:59Z

@takahirox that's good to know, thanks for sharing!

fernandojsg · 2019-01-21T12:50:17Z

SIMD.js is deprecated but it will be implemented on WASM though, but imo that should be a different problem to discuss than the current approach.

I like the idea of introducing the static=true as it's quite understandable for newcomers and it's a common practise in most engines. Even with this updated I'd still leave it just renaming it from matrixAutoUpdate, we could bypass flags or cache checks with it, it's add extra info about the object behaviour, it could be used by physics...

I also like the proposal from @Mugen87 on introducing cache/isSync similar to the babylon's approach, I think it has low impact on the code, it's easy to follow and understand and could help getting the work done and avoiding a lot of callback when an attribute is changed on the pos/sca/rot

takahirox · 2019-01-22T14:39:34Z

I think we should move it forward, too. I'll write and post the summary of my performance evaluation soon because the discussion here is a bit too long.

takahirox · 2019-01-31T18:21:30Z

I think we have three options now. Refer to #14138 (comment) for my benchmark test. Performance numbers below are of scene.updateMatrixWorld() unit test, not of entire app.

Renaming to .static property

Renaming .matrixAutoUpdate to .static (.matrixAutoUpdate. = ! .static)

Pros:

User friendly name more than .matrixAutoUpdate
No performance penalty

Cons:

Situation doesn't change
- User still needs to manually set flag
- Unnecessarily updating matrix of dynamic object which doesn't move since last frame

cache position/quaternion/scale ([WIP] Add position/quaternion/scale cache into Object3D #15706)

objectMatrix: function () {

    if ( ! this.positionCache.equals( this.position ) ||
        ! this.quaternionCache.equals( this.quaternion ) ||
        ! this.scaleCache.equals( this.scale ) ) {

        this.matrix.compose( this.position, this.quaternioin, this.scale );
        this.matrixWorldNeedsUpdate = true;
        this.positionCache.copy( this.position );
        this.quaternionCache.copy( this.quaternion );
        this.scaleCache.copy( this.scale );

    }

}

Pros:

Easy to understand, implement and maintain
Automatically skipping unnecessary matrix update
Good performance gain for static objects
- 50-200 % gain in case all objects are static in a scene graph

Cons:

Performance penalty for dynamic objects
- Up to 10 times number comparison per an object
- 5 % (on Chrome), 20 % (FireFox) loss in case all objects are dynamic in a graph
- No or small performance benefit even if most of objects are static, if an object near the top of graph is dynamic. (Probably 10 times comparison is as slow as update local matrix)
Performance gain is smaller than dirty flag approach

Dirty flag ([WIP] Dirty Flag for updateMatrix optimization #14543)

objectMatrix: function () {

    if ( this.position.dirty || this.quaternion.dirty || this.scale.dirty ) {

        this.matrix.compose( this.position, this.quaternioin, this.scale );
        this.matrixWorldNeedsUpdate = true;

        this.position.dirty = false;
        this.quaternion.dirty = false;
        this.scale.dirty = false;

    }

}

// Vector3/Quaternion extension
function Vector3WithDirtyFlag( x, y, z ) {

    this._x = 0;
    this._y = 0;
    this._z = 0;

    Vector3.call( this, x, y, z );

    this.dirty = false;

}

Vector3WithDirtyFlag.prototype = Object.assign( Object.create( Vector3.prototype ), {

    constructor: Vector3WithDirtyFlag,
    isVector3WithDirtyFlag: true

} );

Object.defineProperties( Vector3WithDirtyFlag.prototype, {

    x: {

        get: function () { return this._x; },
        set: function ( value ) {

            this._x = value;
            this.dirty = true;

        }

    },

    ....

} );

function Object3D() {

    ....

    this.position = new Vector3WithDirtyFlag();

    ....

}

// or Object.defineProperties per vector3/quaternion instance.
// We don't need to make sub class

function Object3D() {

    ....

    this.position = new Vector3();
    this.position.dirty = false;

    Object.defineProperties( this.position, {

        x: {

            ....

        },

        ....

    }

    ....

}

Pros:

Automatically skipping unnecessary matrix update
Bigger performance gain for static objects than cache approach
- 50-100% gain from cache approach in case all objects are static in a scene
Less performance penalty for dynamic objects than cache approach

Cons:

Kinda hard to maintain because of class extension and/or Object.defineProperties().
Performance penalty for dynamic objects (smaller than cache approach tho)
- getter/setter is invoked when position/quaternion/scale properties are accessed

Appendix:

We can break Matrix.compose() into two parts

updating matrix from quaternion and scale
updating matrix from position

https://github.com/mrdoob/three.js/blob/dev/src/math/Matrix4.js#L741-L775

With cache or dirty flag, we can introduce partial local matrix update optimization. I need to test to know if it has performance benefit tho.

takahirox · 2019-01-31T18:44:53Z

My current thought...

Updating all objects' matrices in a scene each frame, even if they don't move since last frame, as default is very inefficient
Automatically skipping unnecessary matrices update is very useful to users so we should introduce cache or dirty flag approach
Probably performance gain with cache approach is still good tho it's smaller than the one with dirty flag
Probably some or most of objects in a scene are usually don't move so performance penalty of Cache and dirty flag approach practically looks small?
Going with cache approach so far because easy to maintain, and if we realize we need further optimization or performance penalty isn't small, we think of the other options; dirty flag, renaming to .static from matrixAutoUpdate, or reverting?

takahirox · 2019-02-04T13:21:44Z

I made position/quaternion/scale cache approach PR #15706. You can see how small the change is. And you can test the performance.

takahirox · 2019-02-06T15:27:16Z

I made performance test, where scene.updateMatrixWorld() can have performance impact, based on webgl_performance example to see not only unit test performance but also practical application performance.

https://raw.githack.com/takahirox/three.js/UpdateMatrixTest/examples/webgl_performance_matrices_update_cache.html

In the test a scene has num objects and they're randomly placed between +-distance / 2. (Maybe I should've removed Math.random() for the consistent performance comparison tho.) All the objects are dynamic if animation is true. Otherwise they're static.

FPS numbers gain/loss on my Windows chrome where num = 50000 and distance = 50000.

	cache	dirty flag
Performance gain (static)	+14%	+22%
Performance loss (dynamic)	-10%	-5%

FPS numbers gain/loss on my Windows FireFox Nightly where num = 25000 and distance = 50000. (FPS numbers are somehow kinda inconsistent a bit so hard to precisely say tho.)

	cache	dirty flag
Performance gain (static)	+10%	+26%
Performance loss (dynamic)	-8%	-4%

Off topics.

projectObject() (more precisely Frustum.intersectsObject()?) also seems CPU hot spot function. I'll look into it after we've done .updateMatrixWorld() optimization.
In the example, in dynamic objects case onChange of rotation also seems CPU hot spot function. Probably introducing lazy Euler update Replace bi-directional syncing with lazy Euler syncing #14910 with new API can solve.

titansoftime · 2019-02-06T16:01:41Z

I implemented a variant of your dirty flag method into my build about a year ago or so. Couldn't be happier with the results.

I noticed a similar drag in projectObject, glad someone is looking into it =]

takahirox · 2019-02-06T16:35:05Z

What do you folks think of my comment #14138 (comment) especially

Going with cache approach so far because easy to maintain, and if we realize we need further optimization or performance penalty isn't small, we think of the other options; dirty flag, renaming to .static from matrixAutoUpdate, or reverting?

?

takahirox · 2019-02-06T16:36:45Z

@titansoftime What do you think of the cache approach? Is the performance gain not good enough or is performance penalty big in your case?

titansoftime · 2019-02-06T16:53:16Z

I honestly haven't tried it. Dirty flag worked better than I anticipated so I never looked back. I have at least 50 static and 10-20 dynamic objects on the screen at any given time, so other users may have different results.

takahirox · 2019-02-06T17:08:54Z

Thanks. I'd be happy if you try the cache approach #15706. I'd like to see how it looks in an actual application.

titansoftime · 2019-02-06T17:14:39Z

I unfortunately cannot use the latest three build, since they killed JSONLoader (LegacyJSONLoader does not work for me as initBones is removed). Working on converting all my models to gltf, but there are some serious issues in the blender exporter. Anyway that's a whole different thing, sigh.

Do you have a build with the cache approach from pre... ehh I think r100 is where they killed the JSONLoader.

takahirox · 2019-02-06T19:21:40Z

I think you can locally build like this (there may be easier way).

$ git checkout r99
$ git remote add upstream https://github.com/mrdoob/three.js.git
$ git fetch upstream '+refs/pull/15706/head:refs/remotes/pr/15706'
$ git checkout -b r99withCache
$ git cherry-pick d535ef16fe49f5de20144c5906cffdddfce9461f
$ npm run build

titansoftime · 2019-02-06T20:38:57Z

I think you can locally build like this (there may be easier way

Ah, nice. Will try it out.

Note - Haven't forgotten this. My PC is still connected to my main t.v. for Resident Evil 2. Should beat that tonight and go back into dev mode this weekend =]

takahirox · 2019-02-19T16:51:55Z

As we discussed in #15706 (comment) #14543 (comment), both cache and dirty flag approach seem to have functionality problem,

b.updateMatrix();  // Clean
b.matrix.copy( a.matrix ); // manually update local matrix
b.updateMatrix();  // Local matrix should be updated from position/quaternion/scale
                   // but no update because position/quaternion/scale are clean

This is from our flexible API that user can directly access local matrix. So we also need to detect local matrix update. But I don't think we can do simply and performantly.

In long term, we might want to change the API, local matrix is read only or hidden from users and allowing users to update objects transform only via position/quaternion/scale. It may be clearer API.

But in short term, I don't think it's doable because it can break a lot of core/example/user code. So keep engaging .matrixAutoUpdate = false for static or rarely moving objects and manual .updateMatrix() call when their position/quaternion/scale are updated instead so far?

https://threejs.org/docs/#manual/en/introduction/Matrix-transformations

Using user defined matrices update function may be another option for users #15706 (comment)

WestLangley · 2019-02-19T17:04:12Z

So keep engaging .matrixAutoUpdate = false for static or rarely moving objects and manual .updateMatrix() call when their position/quaternion/scale are updated instead so far?

Correct.

Mugen87 · 2019-09-17T11:20:28Z

As we discussed in #15706 (comment) #14543 (comment), both cache and dirty flag approach seem to have functionality problem,

Thus, keeping this PR open makes no sense since it can't be merged because of the current status of three.jss API. Let's reconsider matrix update optimizations iff Object3D.matrix becomes private.

takahirox added 6 commits May 24, 2018 20:28

Add onChange to Vector3

925a887

Add .matrixNeedsUpdate to Object3D

7cf5ac1

Update UpdateMatrixWorld benchmark

47ca0fb

Update .copy() and .toJSON() of Object3D for matrixNeedsUpdate

89fe7dd

Update ObjectLoader for .matrixNeedsUpdate

1422422

Update Object3D test for Vector3 onChangeCallback

d9ef3d7

takahirox mentioned this pull request Feb 4, 2019

[WIP] Add position/quaternion/scale cache into Object3D #15706

Closed

takahirox changed the title ~~[WIP] Introducing .needsMatrixUpdate~~ [WIP] Matrices update optimization (Introducing .needsMatrixUpdate) Feb 6, 2019

takahirox mentioned this pull request Mar 1, 2019

Skip unnecessary matrices update in performance_static example #15862

Merged

Mugen87 closed this Sep 17, 2019

Mugen87 mentioned this pull request Jan 9, 2020

Add 'setMatrix' method to Object3D #18344

Open

Mugen87 mentioned this pull request Jul 30, 2020

Object3D.getWorld*() should call updateWorldMatrix(true) instead of updateMatrixWorld(true) ? #19962

Closed

3 tasks

Mugen87 mentioned this pull request Mar 2, 2021

Rework updateWorldMatrix / scene.autoUpdate #21387

Open

diarmidmackenzie mentioned this pull request Dec 11, 2022

Potential improvement of "real-world" performance of updateMatrixWorld() #25115

Closed

diarmidmackenzie mentioned this pull request Dec 20, 2022

Object3d matrix data new api #25159

Open

Mugen87 mentioned this pull request Dec 10, 2023

Remove force flag and make Object3D.matrixWorldAutoUpdate work properly again #27261

Closed

jiayihu mentioned this pull request May 19, 2024

perf(calcTransformMatrix): replace cache key string with array fabricjs/fabric.js#9887

Merged

CodyJasonBennett mentioned this pull request Jun 1, 2024

Object3D: matrixNeedsUpdate and matrix/orientation optimizations #28534

Draft

[WIP] Matrices update optimization (Introducing .needsMatrixUpdate) #14138

[WIP] Matrices update optimization (Introducing .needsMatrixUpdate) #14138

Conversation

takahirox commented May 24, 2018 • edited Loading

takahirox commented May 24, 2018 • edited Loading

mqp commented May 25, 2018 • edited Loading

Mugen87 commented May 25, 2018 • edited Loading

takahirox commented May 25, 2018 • edited Loading

Mugen87 commented May 25, 2018 • edited Loading

takahirox commented May 25, 2018 • edited Loading

takahirox commented May 25, 2018

WestLangley commented May 25, 2018

gkjohnson commented May 26, 2018 • edited Loading

Assumptions

Implementation

pailhead commented May 29, 2018 • edited Loading

titansoftime commented Jun 4, 2018

takahirox commented Jun 7, 2018

Mugen87 commented Jun 7, 2018

takahirox commented Jun 7, 2018

takahirox commented Jun 7, 2018

titansoftime commented Jun 7, 2018

takahirox commented Jun 7, 2018 • edited Loading

takahirox commented Jun 7, 2018 • edited Loading

Mugen87 commented Jun 7, 2018

takahirox commented Jun 7, 2018

Mugen87 commented Jun 7, 2018

gkjohnson commented Jun 7, 2018

mqp commented Jun 7, 2018 • edited Loading

takahirox commented Jun 8, 2018 • edited Loading

mrdoob commented Dec 18, 2018

fernandojsg commented Jan 21, 2019

takahirox commented Jan 22, 2019

takahirox commented Jan 31, 2019 • edited Loading

takahirox commented Jan 31, 2019 • edited Loading

takahirox commented Feb 4, 2019

takahirox commented Feb 6, 2019 • edited Loading

titansoftime commented Feb 6, 2019

takahirox commented Feb 6, 2019

takahirox commented Feb 6, 2019

titansoftime commented Feb 6, 2019

takahirox commented Feb 6, 2019

titansoftime commented Feb 6, 2019

takahirox commented Feb 6, 2019 • edited Loading

titansoftime commented Feb 6, 2019 • edited Loading

takahirox commented Feb 19, 2019 • edited Loading

WestLangley commented Feb 19, 2019

Mugen87 commented Sep 17, 2019 • edited Loading

takahirox commented May 24, 2018 •

edited

Loading

takahirox commented May 24, 2018 •

edited

Loading

mqp commented May 25, 2018 •

edited

Loading

Mugen87 commented May 25, 2018 •

edited

Loading

takahirox commented May 25, 2018 •

edited

Loading

Mugen87 commented May 25, 2018 •

edited

Loading

takahirox commented May 25, 2018 •

edited

Loading

gkjohnson commented May 26, 2018 •

edited

Loading

pailhead commented May 29, 2018 •

edited

Loading

takahirox commented Jun 7, 2018 •

edited

Loading

takahirox commented Jun 7, 2018 •

edited

Loading

mqp commented Jun 7, 2018 •

edited

Loading

takahirox commented Jun 8, 2018 •

edited

Loading

takahirox commented Jan 31, 2019 •

edited

Loading

takahirox commented Jan 31, 2019 •

edited

Loading

takahirox commented Feb 6, 2019 •

edited

Loading

takahirox commented Feb 6, 2019 •

edited

Loading

titansoftime commented Feb 6, 2019 •

edited

Loading

takahirox commented Feb 19, 2019 •

edited

Loading

Mugen87 commented Sep 17, 2019 •

edited

Loading