-
-
Notifications
You must be signed in to change notification settings - Fork 23.4k
Faster Node::get_child_count()
#106226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Node::get_child_count()
#106226
Conversation
3463b63
to
1a71931
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a great idea.
I'm not sure if we have to move it to the header though. I don't think the thread guards can be optimized out, and the if
check should be fairly fast anyway (since it's predictable after the first call). And having all those generically named defines in the header suddenly sounds risky!
Yeah I was in two minds .. however if you look at the The defines are in the header either way I think, they were previously at the bottom of the header (rather than in the cpp). 🤔 Alternatively, we could say that most calling sites that are bottlenecks could move over to using I do recognise there's probably a few accessors that are using these thread guards from the cpp that could be moved to the header in a similar manner. I'm not sure the history of where they were originally defined. If the idea of having the thread guards only in cpp is not to pollute the global namespace, we could move the defines to a separate file and include it from What having the thread guards in the cpp does maybe do is discourage inlining (maybe to save on bloat or make error messages better? 🤔 ) although there's no guarantee with link time optimization I guess. I'm fine to move it though if we'd rather not have in the header, as we should have other options available after #106224 . |
There is no need to update the children cache when including internal nodes in the count.
1a71931
to
795ed8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Doesn't it make sense to put a note in the documentation that the function is significantly faster if |
That's an implementation detail, and I'm not sure it helps anyone, as you either need the children including internal or you don't, it's kind of the tail wagging the dog. In both cases it is probably advisable to get ahead of time into a local variable, but that's kind of obvious when profiling, and not what we typically would try to document. If you meant comment in the source code though, sometimes it's worth a comment, but I think we have to establish this a bit more in practice before making recommendations. This is relevant mainly for c++ BTW. If you meant from e.g. binding for gdscript, it's less relevant as the binding will be so slow anyway it will dwarf these kinds of micro-optimizations. |
Thanks! |
There is no need to update the children cache when including internal nodes in the count, we can grab the size directly from the
HashMap
.Notes
get_child()
ensures the cache is updated anyway, andget_child_count()
may be used in tight loops. (The optimizer likely can't optimize this out, because in the case of zero children,get_child()
will never be called.)get_child_count()
into a local variable prior to a loop (unless you have some race condition where the children are being altered from another thread), but this seems a free win, and effectively should optimize down to a singlegetter
whenp_include_internal
is true.