-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[libc++] Optimize ranges::{for_each, for_each_n} for segmented iterators #132896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
49011aa
to
ba1d5d4
Compare
@llvm/pr-subscribers-libcxx Author: Peng Liu (winner245) ChangesThis patch extends segmented iterator optimizations, previously applied to Addresses a subtask of #102817.
|
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/libcxx/algorithms/ranges_robust_against_copying_comparators.pass.cpp
Outdated
Show resolved
Hide resolved
ba1d5d4
to
c113266
Compare
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
a7041cc
to
a2e451d
Compare
16438be
to
047acfd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch! I left some comments but I think this is going to be a nice optimization.
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
0aad396
to
5a7b6eb
Compare
libcxx/test/benchmarks/algorithms/nonmodifying/for_each_join_view.bench.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/benchmarks/algorithms/nonmodifying/for_each_n.bench.cpp
Outdated
Show resolved
Hide resolved
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each_n.pass.cpp
Outdated
Show resolved
Hide resolved
198fe3b
to
f5d13ab
Compare
d14bde4
to
8a5bcdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the scope of this patch is getting a bit out of hand. The title says that you're optimizing ranges::for_each{,_n}
, but you're also back-porting the std::for_each
optimization to C++03, adding and adding an optimization to std::for_each_n
. Could we split this up to make it clear what changes are required for what optimizations? Also, why do we want to back-port the std::for_each
optimization now? Do we think the extra complexity is worth the improved performance?
Thank you for your feedback! I agree that the scope of the patch has expanded beyond its original intent. Initially, the goal was simple: only to extend the optimization for However, I agree that this made the patch diverge from its original purpose and may complicate the review process. Following your suggestion, I will work on splitting it to make it clear what this patch focuses on. -------------- Update --------------
This separation allows the current PR to focus exclusively on the optimization of the ranges algorithms. I will rebase my current patch on the above split pieces once they are landed. |
8a5bcdc
to
5a225dd
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
5a225dd
to
b366e93
Compare
b366e93
to
216b957
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once comments are addressed. Thanks a lot for this series of refactorings / optimizations!
resulting in performance improvements of up to 21.3x for ``std::deque::iterator`` and 24.9x for ``join_view`` of | ||
``vector<vector<char>>``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should report this optimization on the same line as the std::for_each
optimization above -- I don't think there is much to be gained from having nearly-duplicate release notes since these algorithms are very similar. While we aim for a good level of completeness in our release notes, we also want to make them as useful to users as possible.
for (; __first != __last; ++__first) | ||
std::invoke(__func, std::invoke(__proj, *__first)); | ||
return {std::move(__first), std::move(__func)}; | ||
if constexpr (!std::assignable_from<_Iter&, _Sent> && sized_sentinel_for<_Sent, _Iter>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if constexpr (!std::assignable_from<_Iter&, _Sent> && sized_sentinel_for<_Sent, _Iter>) { | |
// In the case where we have different iterator and sentinel types, the segmented iterator optimization | |
// in std::for_each will not kick in. Therefore, we prefer std::for_each_n in that case (whenever we can | |
// obtain the `n`). | |
if constexpr (!std::assignable_from<_Iter&, _Sent> && sized_sentinel_for<_Sent, _Iter>) { |
->Arg(1024) | ||
->Arg(4096) | ||
->Arg(8192) | ||
->Arg(1 << 20); | ||
->Arg(1 << 14) | ||
->Arg(1 << 16) | ||
->Arg(1 << 18); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it would be better to leave the old benchmark values in place. They are less comprehensive but we need to achieve a tradeoff between comprehensiveness and the time it takes to run these benchmarks.
->Arg(8) | ||
->Arg(32) | ||
->Arg(50) // non power-of-two | ||
->Arg(1024) | ||
->Arg(4096) | ||
->Arg(8192) | ||
->Arg(1 << 14) | ||
->Arg(1 << 16) | ||
->Arg(1 << 18); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here for the benchmark sizes.
bm.operator()<std::list<int>>("std::for_each_n(list<int>)", std_for_each_n); | ||
bm.operator()<std::vector<int>>("rng::for_each_n(vector<int>)", std::ranges::for_each_n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the same numbers as for the std::for_each
benchmarks.
Previously, the segmented iterator optimization was limited to
std::{for_each, for_each_n}
. This patch aims to extend the optimization tostd::ranges::for_each
andstd::ranges::for_each_n
, ensuring consistent optimizations across these algorithms. This patch first generalizes thestd
algorithms by introducing aProjection
parameter, which is set to__identity
for thestd
algorithms. Then we let theranges
algorithms to directly call theirstd
counterparts with a general__proj
argument. Benchmarks demonstrate performance improvements of up to 21.3x forstd::deque::iterator
and 24.9x forjoin_view
ofvector<vector<char>>
.Addresses a subtask of #102817.
Summary of speedups for
deque
iteratorsSummary of speedups for
join_view
iteratorsBenchmarks:
{std, ranges}::for_each_n
withdeque
iterators{std, ranges}::for_each
withdeque
iterators{std, ranges}::for_each_{, n}
withjoin_view
iterators