-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Yeet for_each
#4060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Yeet for_each
#4060
Conversation
Was it confirmed by the benchmarks?
|
Reran the full suite between the two. Both the busy_systems and contrived benchmarks seem to be consistently a bit faster or at least not regressing, even when switched to
|
The benchmarks in the Bevy repo don't cover some of the cases that motivated for_each: I re-ran the benchmarks in my ecs_bench_suite fork (I ported to bevy main, but havent pushed those changes yet): Note the stark differences in frag_iter performance. However once we add black_box to the benches (which ecs_bench_suite doesn't currently do as a matter of policy) the gap does close: However its hard to say how often we'd get the "non-black-boxed" perf in the real world. And even if black_box is the "real world perf", there is still a measurable difference for fragmented iterators. |
Some clarifying points:
|
Whether or not we should treat black_box benches as our source of truth is an open debate in my mind. My understanding of black_box is that it is supposed to force the benchmark to behave like a "real world" program by preventing the compiler from "optimizing out" things that it wouldn't in the real world |
This was brought up in Discord, but the compiler will also optimize out any expression where the output is not being used too. https://gendignoux.com/blog/2022/01/31/rust-benchmarks.html. Not sure if the ecs_bench suite change you made also includes this. If the mutable reference was black_box'ed then dereferenced, the compiler has to assume that there's some alteration to the referenced value, instead of a typical
I agree, |
Is this still a goal? |
@CGMossa this is indeed still a goal and contingent on improving the base iterator's perf across the board until it matches for_each. That said quite a few PRs have been merged since this was first opened. As of #6461, on my machine, the two are identical in all but fragmented iteration. Of which, wider queries are identical in perf, and the smaller queries differ by less than 100ns. It might be due time to revisit this and consider if such a perf benefit is worth the maintenance cost of keeping these separate and the mental burden of choosing one over the other for end users. |
Closing this as code has diverged a lot and a new set of benches would be required to justify the change. Definitely open to the idea if the numbers are compelling! |
Going to leave it as is, since #6547 introduces a much wider performance gap between for_each and iter due to the former more easily autovectorizing. Will come back to this when we can make iter do the same. |
Objective
Fixes #4059. Remove
for_each/for_each_mut
from both Query and QueryState as it no longer is notably faster than the correspondingiter/iter_mut
functions.Solution
Yeet it!
Migration Guide
Query/QueryState::for_each(_mut)
has been removed as it is no longer notably faster thaniter/iter_mut
. Use theiter
implementation instead.