@lawnjelly lawnjelly commented Apr 24, 2025

This takes the SceneTreeFTI and optimizes it up to eleven.
Benchmarking shows this to be approx 2-10x faster for the physics interpolation.

Introduction

The original implementation for scene tree traversal in SceneTreeFTI was naive but foolproof. I had always intended to write a more optimized version, but didn't want to make the original PR too hard to review / understand.

Optimization is a trade-off: it offers better performance at the cost of readability and added complexity, so we normally reserve it for bottleneck areas. The SceneTreeFTI is such a bottleneck area (like rendering or physics), and so some trade-offs are made here for performance. Fair warning: it is not for the faint of heart.

How it works

Instead of naively traversing the entire scene tree, as nodes are moved we record them in frame xform lists for later processing. We ideally want to sort these nodes by depth, as traversing down from the higher nodes (low depth, close to root) will prevent duplication of branch processing from lower nodes.

Instead of sorting with e.g. quicksort, we maintain a fixed number of depth layers, and just place each node in the layer matching its depth. Then as we process the nodes on the frame, we can do it in depth order with no sorting required. If a node's depth blows past the max layers, this is not a problem: the result is still correct, it just might do a little more processing.
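As a rough illustration of the depth layer idea, here is a minimal sketch. It is not the code from this PR: `SketchNode`, `DepthLayers`, `process_branch`, `flush` and the `MAX_DEPTH_LAYERS` value are all hypothetical names, and placing over-deep nodes in the last layer is an assumption made for the example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scene node; not Godot's actual Node class.
struct SketchNode {
	std::size_t depth = 0; // 0 = root
	bool queued = false; // already on a frame list this frame
	bool processed = false; // branch already updated (would be reset each frame)
	std::vector<SketchNode *> children;
};

// Assumed layer count, kept small for illustration.
constexpr std::size_t MAX_DEPTH_LAYERS = 4;

struct DepthLayers {
	std::array<std::vector<SketchNode *>, MAX_DEPTH_LAYERS> layers;

	// O(1) insertion: the depth picks the bucket, so nothing is ever sorted.
	void add(SketchNode *p_node) {
		if (p_node->queued) {
			return;
		}
		p_node->queued = true;
		// Assumption: nodes deeper than the last layer share that last layer.
		std::size_t layer = std::min(p_node->depth, MAX_DEPTH_LAYERS - 1);
		layers[layer].push_back(p_node);
	}
};

// Updates a node and everything below it; returns the number of nodes visited.
int process_branch(SketchNode *p_node) {
	p_node->processed = true;
	int visited = 1;
	for (SketchNode *child : p_node->children) {
		visited += process_branch(child);
	}
	return visited;
}

// Walks the layers from shallow to deep. A node whose branch was already
// covered by a shallower ancestor is skipped instead of being redone.
int flush(DepthLayers &r_layers) {
	int visited = 0;
	for (std::vector<SketchNode *> &layer : r_layers.layers) {
		for (SketchNode *node : layer) {
			node->queued = false;
			if (!node->processed) {
				visited += process_branch(node);
			}
		}
		layer.clear();
	}
	return visited;
}
```

In this sketch, nodes sharing the last layer are in no particular order, so a child can be handled before its ancestor and its branch then gets walked twice. That is the "little more processing" case, and the end result is unchanged.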

Further optimizations done in this PR

  • Nodes that are on the tick list are currently being interpolated, but nodes that are not on the tick list do not require interpolation (which is expensive), so we can just directly substitute their current local xform.
  • Concatenating parent xforms is also expensive. We can reduce the cost of this by noting that a lot of nodes have an identity xform. We test at opportune places for identity xform, and mark a flag on the node. If a node is using identity, we don't need to concat: we can directly copy the parent global xform to be the child node's global xform.
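The two shortcuts above could look something like the following sketch. It uses a simplified 2D transform rather than Godot's Transform, and every name in it (`Xform2`, `SketchNode`, `local_for_frame`, `global_for_frame`) is hypothetical. It also assumes the identity flag is only set while both the previous and current local xforms are identity.

```cpp
// Simplified 2D affine transform; a stand-in, not Godot's Transform.
struct Xform2 {
	float a = 1, b = 0, c = 0, d = 1; // 2x2 basis, row-major
	float ox = 0, oy = 0; // origin

	bool is_identity() const {
		return a == 1 && b == 0 && c == 0 && d == 1 && ox == 0 && oy == 0;
	}
};

// Returns parent * local (the expensive step we would like to avoid).
Xform2 concat(const Xform2 &p, const Xform2 &l) {
	Xform2 r;
	r.a = p.a * l.a + p.b * l.c;
	r.b = p.a * l.b + p.b * l.d;
	r.c = p.c * l.a + p.d * l.c;
	r.d = p.c * l.b + p.d * l.d;
	r.ox = p.a * l.ox + p.b * l.oy + p.ox;
	r.oy = p.c * l.ox + p.d * l.oy + p.oy;
	return r;
}

// Component-wise blend. A deliberate simplification: interpolating
// rotations properly is more involved than this.
Xform2 blend(const Xform2 &from, const Xform2 &to, float f) {
	Xform2 r;
	r.a = from.a + (to.a - from.a) * f;
	r.b = from.b + (to.b - from.b) * f;
	r.c = from.c + (to.c - from.c) * f;
	r.d = from.d + (to.d - from.d) * f;
	r.ox = from.ox + (to.ox - from.ox) * f;
	r.oy = from.oy + (to.oy - from.oy) * f;
	return r;
}

struct SketchNode {
	Xform2 local_prev; // local xform at the previous physics tick
	Xform2 local_curr; // local xform at the current physics tick
	bool on_tick_list = false; // true while the node is being interpolated
	bool local_is_identity = false; // cached result of an earlier identity test
};

// Shortcut 1: only nodes on the tick list pay for interpolation.
Xform2 local_for_frame(const SketchNode &n, float fraction) {
	if (!n.on_tick_list) {
		return n.local_curr;
	}
	return blend(n.local_prev, n.local_curr, fraction);
}

// Shortcut 2: an identity local xform means the global is the parent's global.
Xform2 global_for_frame(const Xform2 &parent_global, const SketchNode &n, float fraction) {
	if (n.local_is_identity) {
		return parent_global;
	}
	return concat(parent_global, local_for_frame(n, fraction));
}
```

The flag is cached rather than tested on every frame because the test itself has a cost; the sketch leaves open where exactly the flag gets refreshed.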

Debugging / Testing

Three modes are offered via a (temporary?) project setting:

  • Default (optimized)
  • Legacy (naive whole tree traversal)
  • Debug (alternates between the two methods, and prints stats)

Additionally there are new debugging compile defines:

  • GODOT_SCENE_TREE_FTI_PRINT_TREE - prints the nodes processed.
  • GODOT_SCENE_TREE_FTI_VERIFY - kind of like a unit test, it uses both methods, and tests that the optimized result is the same as the naive full tree result.
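To give a feel for the kind of check the verify define enables, here is a small sketch of a comparison that only exists when a define is set. It is not the verification code from this PR: `SKETCH_FTI_VERIFY` and `first_mismatch` are hypothetical names, the results are flattened to plain floats, and comparing with an epsilon is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical define for this sketch; in a real build it would come from
// the compiler command line rather than being set in the source.
#define SKETCH_FTI_VERIFY

#ifdef SKETCH_FTI_VERIFY
// Compares the optimized results against the reference (full traversal)
// results. Returns -1 if they agree within p_epsilon, otherwise the index
// of the first element that differs.
long first_mismatch(const std::vector<float> &p_optimized, const std::vector<float> &p_reference, float p_epsilon = 1e-5f) {
	std::size_t count = std::min(p_optimized.size(), p_reference.size());
	for (std::size_t i = 0; i < count; i++) {
		if (std::fabs(p_optimized[i] - p_reference[i]) > p_epsilon) {
			return static_cast<long>(i);
		}
	}
	// A length difference counts as a mismatch at the first missing element.
	if (p_optimized.size() != p_reference.size()) {
		return static_cast<long>(count);
	}
	return -1;
}
#endif // SKETCH_FTI_VERIFY
```

Without the define, the function is compiled out entirely, so regular builds pay nothing for it.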

Testing / verification code

Verifying that the results of the optimized path are the same as those of the reference path is itself rather complex, and is implemented here as a separate file, which will be compiled out in regular builds.

The tests file contains a duplicate of the traversal code. This makes both easier to read and understand, although it does mean the test would need to be kept in sync with any changes to the regular path if it is to still do its job.

I did start by keeping both paths in the same function using #ifdefs, but it was becoming unreadable (for me, let alone reviewers), so on balance I have gone for a separate file. Another alternative would be to remove the testing code from the main Godot repo and do this independently, but it is kind of nice that anyone can easily run the testing just by defining GODOT_SCENE_TREE_FTI_VERIFY.

Example debug logging when DEBUG is set in the project setting

FTI reference nodes traversed : 949, processed : 0, took 35 usec (start)
FTI reference nodes traversed : 949, processed : 40, took 33 usec (end)
FTI optimized nodes traversed : 0, processed : 0, took 0 usec (start)
FTI optimized nodes traversed : 44, skipped 2, processed : 40, took 5 usec (end)

9 nodes moved during frame:
	Armature
	VAimer
	VAimer
	VAimer
	HairAttachment
	BoneAttachment
	WeaponPivotPoint
	WeaponPivotPoint
	WeaponPivotPoint

This shows how many nodes were touched, how many processed, and timings.

Additionally, in what should be a very useful feature, it lists all nodes that were moved during the frame. When using physics interpolation most nodes should be moved during the physics tick, and moving during the frame is normally a user error unless that branch has been switched to physics_interpolation_mode OFF.

This allows users to quickly track down which nodes in their scene might be causing problems with physics interpolation. Particularly useful when converting existing game projects, especially ones you have not authored.

Notes

  • Same applies in 4.x.
  • It's just possible there is a better folder for SceneTreeFTITests, but I wasn't sure whether putting it in main/tests was a good idea if that folder uses auto-generation of tests (as this is nothing like the unit tests).

@lawnjelly lawnjelly force-pushed the fti_optimize_scene_tree branch from 11a38ed to 65eb3a2 Compare May 24, 2025 17:26
@lawnjelly

@Calinou if you get a moment would you be able to look at this too?

It's a duplicate of the 4.x PR (aside from naming changes Spatial / Node3D).

@Calinou Calinou left a comment

Tested locally on various 3D demos (at 10 tps), it works as expected.

Code looks good to me.

@lawnjelly lawnjelly merged commit c548b21 into godotengine:3.x May 26, 2025
14 checks passed
@lawnjelly

Thanks!
Let's get this in for some more regular testing before 3.7 dev 1. 👍

@lawnjelly lawnjelly deleted the fti_optimize_scene_tree branch May 26, 2025 05:14