Skip to content

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented May 10, 2025

This takes the SceneTreeFTI and optimizes it up to eleven.
Benchmarking shows this to be approx 2-10x faster for the physics interpolation.

Fixes #106185
Forward port of #105728

N.B. existing un-optimized method is still available, making this PR very safe

It is possible to switch between this new optimized implementation and the old reference implementation via project setting physics/3d/physics_interpolation/scene_traversal, which makes this far less risky to introduce as users can always change to the existing method if there are any regressions.

MRP

For the MRP in #106185, fps with physics interpolation ON is now 360fps, matching fps with physics interpolation OFF.

The physics interpolation itself I've timed as > 2500x faster. 👍 😎
That's fairly extreme, but represents the performance improvement in static parts of the scene.

These are the debug log results for the MRP:

FTI reference nodes traversed : 40012, processed : 3, took 4225 usec (start)
FTI reference nodes traversed : 40012, processed : 0, took 3543 usec (end)
FTI optimized nodes traversed : 3, processed : 3, took 3 usec (start)
FTI optimized nodes traversed : 0, processed : 0, took 0 usec (end)

Introduction

The original implementation for scene tree traversal in SceneTreeFTI was naive but foolproof, I had always intended to write a more optimized version, but didn't want to make the original PR too hard to review / understand.

Optimization is a trade off - it offers better performance, at a cost of readability and complexity, so we normally reserve it for bottleneck areas. The SceneTreeFTI is such a bottleneck area (like rendering or physics), and so some trade offs are made here for performance. Fair warning it is not for the faint of heart.

How it works

Instead of naively traversing the entire scene tree, we only traverse the branches that have changed.

As nodes are moved we record them in frame xform lists for later processing. We ideally want to sort these nodes by depth, as traversing down from the higher nodes (low depth, close to root) will prevent duplication of branch processing from lower nodes.

Instead of sorting with e.g. quicksort, we maintain a fixed number of depth layers, and just place the node in the corresponding depth layer. Then as we process the nodes on the frame, we can do it in depth order with no sorting required. If we blow past the max layers, this is not a problem, it just might do a little more processing.

Further optimizations done in this PR

  • Nodes that are on the tick list are currently being interpolated, but nodes that are not on the tick list are not requiring interpolation (expensive) so we can just directly substitute their current local xform.
  • Concatenating parent xforms is also expensive. We can reduce the cost of this by noting that a lot of nodes have an identity xform. We test at opportune places for identity xform, and mark a flag on the node. If a node is using identity, we don't need to concat, we can directly copy the parent global xform to be the child nodes' global xform.

Debugging / Testing

3 modes are offered via a (temporary?) project setting:

  • Default (optimized)
  • Legacy (naive whole tree traversal)
  • Debug (alternates between the two methods, and prints stats)

Additionally there are new debugging compile defines:

  • GODOT_SCENE_TREE_FTI_PRINT_TREE - prints the nodes processed.
  • GODOT_SCENE_TREE_FTI_VERIFY - kind of like a unit test, it uses both methods, and tests that the optimized result is the same as the naive full tree result.

Testing / verification code

Verifying the results of the optimized path are the same as the reference path is itself rather complex, and is implemented here as a separate file, which will be compiled out in regular builds.

The tests file contains a duplicate of the traversal code. This makes both easier to read and understand, although it does mean the test would need to be kept in sync with any changes to the regular path if it to still do its job.

I did start by keeping both paths in the same function using #ifdefs, but it was becoming unreadable (for me, let alone reviewers), so on balance I have gone for a separate file. Another alternative would be to remove the testing code from the main Godot repo and do this independently, but it is kind of nice that anyone can easily run the testing just by defining GODOT_SCENE_TREE_FTI_VERIFY.

Example debug logging when DEBUG is set in the project setting

FTI reference nodes traversed : 949, processed : 0, took 35 usec (start)
FTI reference nodes traversed : 949, processed : 40, took 33 usec (end)
FTI optimized nodes traversed : 0, processed : 0, took 0 usec (start)
FTI optimized nodes traversed : 44, skipped 2, processed : 40, took 5 usec (end)

9 nodes moved during frame:
	Armature
	VAimer
	VAimer
	VAimer
	HairAttachment
	BoneAttachment
	WeaponPivotPoint
	WeaponPivotPoint
	WeaponPivotPoint

This shows how many nodes were touched, how many processed, and timings.

Additionally, what should be a very useful function, it lists all nodes that were moved during the frame. When using physics interpolation most nodes should be moved during the physics tick, and moving during the frame is normally a user error unless that branch has been switched to physics_interpolation_mode OFF.

This allows users to quickly track down which nodes in their scene might be causing problems with physics interpolation. Particularly useful when converting existing game projects, especially ones you have not authored.

Notes

  • It's just possible there is a better folder for SceneTreeFTITests, but I wasn't sure whether putting it in main/tests was a good idea if that folder uses auto-generation of tests (as this is nothing like the unit tests).

@lawnjelly lawnjelly added this to the 4.5 milestone May 10, 2025
@lawnjelly lawnjelly requested review from a team as code owners May 10, 2025 14:00
@lawnjelly lawnjelly requested a review from rburing May 11, 2025 13:37
@lawnjelly lawnjelly force-pushed the fti_optimize_scene_tree4 branch from 3397f0f to 2392382 Compare May 22, 2025 17:56
@lawnjelly
Copy link
Member Author

Rebased after #106224 .

@lawnjelly lawnjelly requested review from Calinou and clayjohn May 24, 2025 08:24
@lawnjelly
Copy link
Member Author

Just wildly throwing out some more names for review now as @rburing may not be available, and we should really look at getting this merged before feature freeze for 4.5. Anyone welcome to review / test.

Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally on all 3D demos that use physics, it works as expected (at 20 tps). Code looks good to me.

We should probably remove the Legacy codepath before 4.5 is released unless there's a good reason to keep it selectable as an option. The change should be 100% transparent for users anyway (nonwithstanding bugs).

@lawnjelly lawnjelly force-pushed the fti_optimize_scene_tree4 branch from 2392382 to c7764ef Compare May 24, 2025 16:38
@lawnjelly
Copy link
Member Author

We should probably remove the Legacy codepath before 4.5 is released unless there's a good reason to keep it selectable as an option. The change should be 100% transparent for users anyway (nonwithstanding bugs).

Yup we should be able to, it's just there as a temporary hedge for regressions really.

Copy link
Member

@rburing rburing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I couldn't review earlier, I was away due to family circumstances.

This optimization strategy makes sense and the code looks reasonable to me.

Thanks @Calinou for the extra testing.

@Repiteo Repiteo merged commit f984e15 into godotengine:master May 26, 2025
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented May 26, 2025

Thanks!

@lawnjelly lawnjelly deleted the fti_optimize_scene_tree4 branch May 26, 2025 17:35
@TranquilMarmot
Copy link

TranquilMarmot commented May 27, 2025

@lawnjelly do you know if this would affect how physics interpolation would interact with a VehicleBody3D?

Building the latest master, I have a VehicleBody3D that does not move properly after this PR was merged. Turning off physics interpolation it works, and checking out the commit before this was merged it works. I also tried setting physics/3d/physics_interpolation/scene_traversal to Legacy / Debug and both of those work, only with DEFAULT is it broken.

It seems like the RigidBody3D that represents the VehicleBody3D is updating its position properly, but the visual components of it are not updating.

@lawnjelly
Copy link
Member Author

lawnjelly commented May 28, 2025

@TranquilMarmot Yes I'm fully expecting one or two regressions that I've missed, that's party why I added the legacy option.

Do feel free to create a separate issue for this with a minimum reproduction project (MRP) rather than tagging onto the PR, as it's easier to manage and other people will see it who might have the same problem. I'll start investigating when I get your MRP. 👍

EDIT: Trucktown as I thought works fine (it was used quite a bit in testing) and that has VehicleBody3D, so it might need something particular to your project to reproduce, will wait for your MRP.

@stevenlgreen00
Copy link

I wasn't sure where to ask, but I'm running into the "scene_tree_depth_limit". Is this value exposed someplace?

I was able to run my scene in Legacy mode, and also when I changed the default from 32 to 64 in C++ it runs correctly.

Without this change the engine crashes at line 314 in "scene_tree_fti.cpp", "dest_list.push_back(s);".

@lawnjelly
Copy link
Member Author

lawnjelly commented Oct 9, 2025

I'll take a look, there could be a bug in the handling of this situation (as it's pretty rare).
Ah yes I think I may have spotted it, there may be an off by one error here:

depth = MIN(depth, (int32_t)data.scene_tree_depth_limit);

I'll see if I can fix this up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Severe FPS drop when physics_interpolation is enabled on scenes with many static bodies

6 participants