Skip to content

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented May 10, 2025

There is no need to update the children cache when including internal nodes in the count, we can grab the size directly from the HashMap.

Notes

  • This should be an easy win, as calling get_child() ensures the cache is updated anyway, and get_child_count() may be used in tight loops. (The optimizer likely can't optimize this out, because in the case of zero children, get_child() will never be called.)
  • It is possible (but seemingly unlikely) that some logic would require this function to update the cache, but this should show quickly on testing.
  • It is generally far better to store get_child_count() into a local variable prior to a loop (unless you have some race condition where the children are being altered from another thread), but this seems a free win, and effectively should optimize down to a single getter when p_include_internal is true.

Copy link
Member

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a great idea.

I'm not sure if we have to move it to the header though. I don't think the thread guards can be optimized out, and the if check should be fairly fast anyway (since it's predictable after the first call). And having all those generically named defines in the header suddenly sounds risky!

@lawnjelly
Copy link
Member Author

lawnjelly commented May 10, 2025

Yeah I was in two minds .. however if you look at the #defines for the thread guards they are noops without DEBUG_ENABLED, so the code will reduce to simply returning size().

The defines are in the header either way I think, they were previously at the bottom of the header (rather than in the cpp). 🤔

Alternatively, we could say that most calling sites that are bottlenecks could move over to using get_cached_children().size(), or storing the count in a local variable, so such optimization here may no longer be necessary. 🤔

I do recognise there's probably a few accessors that are using these thread guards from the cpp that could be moved to the header in a similar manner. I'm not sure the history of where they were originally defined.

If the idea of having the thread guards only in cpp is not to pollute the global namespace, we could move the defines to a separate file and include it from node.cpp and node_3d.cpp and a few other places where they are used.

What having the thread guards in the cpp does maybe do is discourage inlining (maybe to save on bloat or make error messages better? 🤔 ) although there's no guarantee with link time optimization I guess.

I'm fine to move it though if we'd rather not have in the header, as we should have other options available after #106224 .

There is no need to update the children cache when including internal nodes in the count.
@lawnjelly lawnjelly force-pushed the faster_get_child_count branch from 1a71931 to 795ed8f Compare May 10, 2025 10:36
Copy link
Member

@Ivorforce Ivorforce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Jordyfel
Copy link
Contributor

Doesn't it make sense to put a note in the documentation that the function is significantly faster if include_internal is true?

@lawnjelly
Copy link
Member Author

Doesn't it make sense to put a note in the documentation that the function is significantly faster if include_internal is true?

That's an implementation detail, and I'm not sure it helps anyone, as you either need the children including internal or you don't, it's kind of the tail wagging the dog.

In both cases it is probably advisable to get ahead of time into a local variable, but that's kind of obvious when profiling, and not what we typically would try to document. If you meant comment in the source code though, sometimes it's worth a comment, but I think we have to establish this a bit more in practice before making recommendations.

This is relevant mainly for c++ BTW. If you meant from e.g. binding for gdscript, it's less relevant as the binding will be so slow anyway it will dwarf these kinds of micro-optimizations.

@akien-mga akien-mga merged commit a377f50 into godotengine:master May 12, 2025
20 checks passed
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants