
[libc++][ranges] Optimize the performance of ranges::starts_with #84570


Open
wants to merge 6 commits into main

Conversation

Member

@xiaoyang-sde xiaoyang-sde commented Mar 8, 2024

Abstract

This pull request attempts to optimize the performance of the ranges::starts_with algorithm.

template<input_iterator I1, sentinel_for<I1> S1, input_iterator I2, sentinel_for<I2> S2,
         class Pred = ranges::equal_to, class Proj1 = identity, class Proj2 = identity>
  requires indirectly_comparable<I1, I2, Pred, Proj1, Proj2>
constexpr bool ranges::starts_with(I1 first1, S1 last1, I2 first2, S2 last2, Pred pred = {},
                                   Proj1 proj1 = {}, Proj2 proj2 = {});
template<input_range R1, input_range R2, class Pred = ranges::equal_to, class Proj1 = identity,
         class Proj2 = identity>
  requires indirectly_comparable<iterator_t<R1>, iterator_t<R2>, Pred, Proj1, Proj2>
constexpr bool ranges::starts_with(R1&& r1, R2&& r2, Pred pred = {},
                                   Proj1 proj1 = {}, Proj2 proj2 = {});
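
For context, here is a minimal usage example of the algorithm (illustrative only, not part of this patch):

#include <algorithm>
#include <string_view>
#include <vector>

int main() {
  std::vector<int> haystack{1, 2, 3, 4, 5};
  std::vector<int> prefix{1, 2, 3};
  bool b1 = std::ranges::starts_with(haystack, prefix);  // true
  bool b2 = std::ranges::starts_with(std::string_view{"hello"},
                                     std::string_view{"help"});  // false
  return (b1 && !b2) ? 0 : 1;
}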

Implementation

  • If both I1, S1 and I2, S2 model sized_sentinel_for, ranges::starts_with checks whether the first range has fewer elements than the second before accessing any elements.
  • If, in addition, both iterators are random-access, ranges::starts_with forwards to ranges::equal, which uses __builtin_memcmp for element comparison under specific conditions. (A simplified sketch of this dispatch follows the list.)
  • The __desugars_to template gains additional specializations that handle predicates wrapped in std::reference_wrapper. (In the existing implementation, ranges::ends_with forwards to ranges::equal but wraps the predicate with std::ref, so the call fails to satisfy the constraints of the optimized overload of __equal_impl. This change fixes that issue.)
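
The following is a simplified sketch of that dispatch (illustrative; the exact implementation is in the diff below, which uses the usual libc++ underscore-prefixed names):

#include <algorithm>   // ranges::equal, ranges::mismatch
#include <functional>  // std::identity, std::ref
#include <iterator>    // ranges::distance, ranges::next, iterator concepts

template <class I1, class S1, class I2, class S2,
          class Pred = std::ranges::equal_to, class Proj1 = std::identity, class Proj2 = std::identity>
constexpr bool starts_with_sketch(I1 first1, S1 last1, I2 first2, S2 last2,
                                  Pred pred = {}, Proj1 proj1 = {}, Proj2 proj2 = {}) {
  if constexpr (std::sized_sentinel_for<S1, I1> && std::sized_sentinel_for<S2, I2>) {
    const auto n1 = std::ranges::distance(first1, last1);
    const auto n2 = std::ranges::distance(first2, last2);
    if (n2 == 0)
      return true;   // an empty prefix always matches
    if (n2 > n1)
      return false;  // the prefix cannot be longer than the range
    if constexpr (std::random_access_iterator<I1> && std::random_access_iterator<I2>)
      // compare exactly n2 elements; ranges::equal may lower this to memcmp
      return std::ranges::equal(first1, std::ranges::next(first1, n2), first2, last2,
                                std::ref(pred), std::ref(proj1), std::ref(proj2));
  }
  // general fallback: stop at the first mismatch and check the whole prefix was consumed
  return std::ranges::mismatch(std::move(first1), std::move(last1), std::move(first2), last2,
                               std::ref(pred), std::ref(proj1), std::ref(proj2))
             .in2 == last2;
}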

Benchmarking

The results were obtained on Arch Linux running on WSL 2, with an i5-12400 processor and 32 GB of RAM. They indicate that the memcmp optimization provides a significant performance improvement.

2024-03-08T23:40:18-08:00
Running ./ranges_starts_with.libcxx.out
Run on (12 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1280 KiB (x6)
  L3 Unified 18432 KiB (x1)
Load Average: 0.79, 1.93, 2.27
-----------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_memcmp_optimization/16             1.50 ns         1.50 ns    460164239
bm_starts_with_contiguous_iter_with_memcmp_optimization/256            12.5 ns         12.5 ns     55148246
bm_starts_with_contiguous_iter_with_memcmp_optimization/4096            189 ns          189 ns      4030953
bm_starts_with_contiguous_iter_with_memcmp_optimization/65536          4889 ns         4889 ns       157791
bm_starts_with_contiguous_iter_with_memcmp_optimization/1048576      138974 ns       138964 ns         5810
bm_starts_with_contiguous_iter_with_memcmp_optimization/16777216    3874632 ns      3874633 ns          180
bm_starts_with_contiguous_iter/16                                      6.55 ns         6.55 ns     90161221
bm_starts_with_contiguous_iter/256                                     99.8 ns         99.8 ns      7093729
bm_starts_with_contiguous_iter/4096                                    1505 ns         1488 ns       420866
bm_starts_with_contiguous_iter/65536                                  24836 ns        24836 ns        29657
bm_starts_with_contiguous_iter/1048576                               400852 ns       400847 ns         1700
bm_starts_with_contiguous_iter/16777216                             7017180 ns      7017118 ns          101
bm_starts_with_random_access_iter/16                                   6.86 ns         6.86 ns     99695785
bm_starts_with_random_access_iter/256                                   102 ns          102 ns      6721592
bm_starts_with_random_access_iter/4096                                 1537 ns         1537 ns       478604
bm_starts_with_random_access_iter/65536                               23564 ns        23564 ns        30089
bm_starts_with_random_access_iter/1048576                            382977 ns       382985 ns         1850
bm_starts_with_random_access_iter/16777216                          6952078 ns      6952032 ns           99
bm_starts_with_bidirectional_iter/16                                   6.23 ns         6.23 ns    111062808
bm_starts_with_bidirectional_iter/256                                   103 ns          103 ns      6981875
bm_starts_with_bidirectional_iter/4096                                 1498 ns         1468 ns       451407
bm_starts_with_bidirectional_iter/65536                               23994 ns        23994 ns        30011
bm_starts_with_bidirectional_iter/1048576                            381783 ns       381781 ns         1838
bm_starts_with_bidirectional_iter/16777216                          7538205 ns      7538213 ns          103
bm_starts_with_forward_iter/16                                         6.37 ns         6.37 ns    103694333
bm_starts_with_forward_iter/256                                         103 ns          103 ns      7017987
bm_starts_with_forward_iter/4096                                       1535 ns         1535 ns       483272
bm_starts_with_forward_iter/65536                                     23208 ns        23208 ns        28449
bm_starts_with_forward_iter/1048576                                  370772 ns       370773 ns         1812
bm_starts_with_forward_iter/16777216                                6758327 ns      6758279 ns          101

Reference

This pull request is related to issue #83880.

github-actions bot commented Mar 8, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@xiaoyang-sde xiaoyang-sde marked this pull request as ready for review March 9, 2024 08:14
@xiaoyang-sde xiaoyang-sde requested a review from a team as a code owner March 9, 2024 08:14
@llvmbot llvmbot added the libc++ label Mar 9, 2024
@llvmbot
Member

llvmbot commented Mar 9, 2024

@llvm/pr-subscribers-libcxx

Author: Xiaoyang Liu (xiaoyang-sde)

Changes

Abstract

This pull request attempts to optimize the performance of ranges::starts_with.

This pull request is related to issue #83880.

Benchmarking

(The benchmark results are identical to those shown in the PR description above.)

Full diff: https://github.com/llvm/llvm-project/pull/84570.diff

5 Files Affected:

  • (modified) libcxx/benchmarks/CMakeLists.txt (+1)
  • (added) libcxx/benchmarks/algorithms/ranges_starts_with.bench.cpp (+107)
  • (modified) libcxx/include/__algorithm/ranges_starts_with.h (+59-23)
  • (modified) libcxx/include/__functional/operations.h (+9)
  • (modified) libcxx/include/__functional/ranges_operations.h (+5)
diff --git a/libcxx/benchmarks/CMakeLists.txt b/libcxx/benchmarks/CMakeLists.txt
index b436e96f178b70..621b295a92c8d2 100644
--- a/libcxx/benchmarks/CMakeLists.txt
+++ b/libcxx/benchmarks/CMakeLists.txt
@@ -186,6 +186,7 @@ set(BENCHMARK_TESTS
     algorithms/pstl.stable_sort.bench.cpp
     algorithms/push_heap.bench.cpp
     algorithms/ranges_contains.bench.cpp
+    algorithms/ranges_starts_with.bench.cpp
     algorithms/ranges_ends_with.bench.cpp
     algorithms/ranges_make_heap.bench.cpp
     algorithms/ranges_make_heap_then_sort_heap.bench.cpp
diff --git a/libcxx/benchmarks/algorithms/ranges_starts_with.bench.cpp b/libcxx/benchmarks/algorithms/ranges_starts_with.bench.cpp
new file mode 100644
index 00000000000000..9986fdb1684fc7
--- /dev/null
+++ b/libcxx/benchmarks/algorithms/ranges_starts_with.bench.cpp
@@ -0,0 +1,107 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include <algorithm>
+#include <benchmark/benchmark.h>
+#include <functional>
+#include <vector>
+
+#include "test_iterators.h"
+
+static void bm_starts_with_contiguous_iter_with_memcmp_optimization(benchmark::State& state) {
+  std::vector<int> a(state.range(), 1);
+  std::vector<int> p(state.range(), 1);
+
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(a);
+    benchmark::DoNotOptimize(p);
+
+    auto begin1 = contiguous_iterator(a.data());
+    auto end1   = contiguous_iterator(a.data() + a.size());
+    auto begin2 = contiguous_iterator(p.data());
+    auto end2   = contiguous_iterator(p.data() + p.size());
+
+    benchmark::DoNotOptimize(std::ranges::starts_with(begin1, end1, begin2, end2));
+  }
+}
+BENCHMARK(bm_starts_with_contiguous_iter_with_memcmp_optimization)->RangeMultiplier(16)->Range(16, 16 << 20);
+
+static void bm_starts_with_contiguous_iter(benchmark::State& state) {
+  std::vector<int> a(state.range(), 1);
+  std::vector<int> p(state.range(), 1);
+
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(a);
+    benchmark::DoNotOptimize(p);
+
+    auto begin1 = contiguous_iterator(a.data());
+    auto end1   = contiguous_iterator(a.data() + a.size());
+    auto begin2 = contiguous_iterator(p.data());
+    auto end2   = contiguous_iterator(p.data() + p.size());
+
+    benchmark::DoNotOptimize(
+        std::ranges::starts_with(begin1, end1, begin2, end2, [](const int a, const int b) { return a == b; }));
+  }
+}
+BENCHMARK(bm_starts_with_contiguous_iter)->RangeMultiplier(16)->Range(16, 16 << 20);
+
+static void bm_starts_with_random_access_iter(benchmark::State& state) {
+  std::vector<int> a(state.range(), 1);
+  std::vector<int> p(state.range(), 1);
+
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(a);
+    benchmark::DoNotOptimize(p);
+
+    auto begin1 = random_access_iterator(a.data());
+    auto end1   = random_access_iterator(a.data() + a.size());
+    auto begin2 = random_access_iterator(p.data());
+    auto end2   = random_access_iterator(p.data() + p.size());
+
+    benchmark::DoNotOptimize(std::ranges::starts_with(begin1, end1, begin2, end2));
+  }
+}
+BENCHMARK(bm_starts_with_random_access_iter)->RangeMultiplier(16)->Range(16, 16 << 20);
+
+static void bm_starts_with_bidirectional_iter(benchmark::State& state) {
+  std::vector<int> a(state.range(), 1);
+  std::vector<int> p(state.range(), 1);
+
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(a);
+    benchmark::DoNotOptimize(p);
+
+    auto begin1 = bidirectional_iterator(a.data());
+    auto end1   = bidirectional_iterator(a.data() + a.size());
+    auto begin2 = bidirectional_iterator(p.data());
+    auto end2   = bidirectional_iterator(p.data() + p.size());
+
+    benchmark::DoNotOptimize(std::ranges::starts_with(begin1, end1, begin2, end2));
+  }
+}
+BENCHMARK(bm_starts_with_bidirectional_iter)->RangeMultiplier(16)->Range(16, 16 << 20);
+
+static void bm_starts_with_forward_iter(benchmark::State& state) {
+  std::vector<int> a(state.range(), 1);
+  std::vector<int> p(state.range(), 1);
+
+  for (auto _ : state) {
+    benchmark::DoNotOptimize(a);
+    benchmark::DoNotOptimize(p);
+
+    auto begin1 = forward_iterator(a.data());
+    auto end1   = forward_iterator(a.data() + a.size());
+    auto begin2 = forward_iterator(p.data());
+    auto end2   = forward_iterator(p.data() + p.size());
+
+    benchmark::DoNotOptimize(std::ranges::starts_with(begin1, end1, begin2, end2));
+  }
+}
+BENCHMARK(bm_starts_with_forward_iter)->RangeMultiplier(16)->Range(16, 16 << 20);
+
+BENCHMARK_MAIN();
diff --git a/libcxx/include/__algorithm/ranges_starts_with.h b/libcxx/include/__algorithm/ranges_starts_with.h
index 90e184aa9bccc2..ebcc15555f5544 100644
--- a/libcxx/include/__algorithm/ranges_starts_with.h
+++ b/libcxx/include/__algorithm/ranges_starts_with.h
@@ -9,12 +9,14 @@
 #ifndef _LIBCPP___ALGORITHM_RANGES_STARTS_WITH_H
 #define _LIBCPP___ALGORITHM_RANGES_STARTS_WITH_H
 
-#include <__algorithm/in_in_result.h>
+#include <__algorithm/ranges_equal.h>
 #include <__algorithm/ranges_mismatch.h>
 #include <__config>
 #include <__functional/identity.h>
 #include <__functional/ranges_operations.h>
+#include <__functional/reference_wrapper.h>
 #include <__iterator/concepts.h>
+#include <__iterator/distance.h>
 #include <__iterator/indirectly_comparable.h>
 #include <__ranges/access.h>
 #include <__ranges/concepts.h>
@@ -34,6 +36,48 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 namespace ranges {
 namespace __starts_with {
 struct __fn {
+  template <class _Iter1, class _Sent1, class _Iter2, class _Sent2, class _Pred, class _Proj1, class _Proj2>
+  _LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI static constexpr bool __starts_with_impl(
+      _Iter1 __first1,
+      _Sent1 __last1,
+      _Iter2 __first2,
+      _Sent2 __last2,
+      _Pred& __pred,
+      _Proj1& __proj1,
+      _Proj2& __proj2) {
+    if constexpr (sized_sentinel_for<_Sent1, _Iter1> && sized_sentinel_for<_Sent2, _Iter2>) {
+      auto __n1 = ranges::distance(__first1, __last1);
+      auto __n2 = ranges::distance(__first2, __last2);
+      if (__n2 == 0) {
+        return true;
+      }
+      if (__n2 > __n1) {
+        return false;
+      }
+
+      if constexpr (random_access_iterator<_Iter1> && random_access_iterator<_Iter2>) {
+        return ranges::equal(
+            std::move(__first1),
+            ranges::next(__first1, __n2),
+            std::move(__first2),
+            std::move(__last2),
+            std::ref(__pred),
+            std::ref(__proj1),
+            std::ref(__proj2));
+      }
+    }
+
+    return ranges::mismatch(
+               std::move(__first1),
+               std::move(__last1),
+               std::move(__first2),
+               std::move(__last2),
+               std::ref(__pred),
+               std::ref(__proj1),
+               std::ref(__proj2))
+               .in2 == __last2;
+  }
+
   template <input_iterator _Iter1,
             sentinel_for<_Iter1> _Sent1,
             input_iterator _Iter2,
@@ -42,23 +86,16 @@ struct __fn {
             class _Proj1 = identity,
             class _Proj2 = identity>
     requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
-  _LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI constexpr bool operator()(
+  _LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI static constexpr bool operator()(
       _Iter1 __first1,
       _Sent1 __last1,
       _Iter2 __first2,
       _Sent2 __last2,
       _Pred __pred   = {},
       _Proj1 __proj1 = {},
-      _Proj2 __proj2 = {}) const {
-    return __mismatch::__fn::__go(
-               std::move(__first1),
-               std::move(__last1),
-               std::move(__first2),
-               std::move(__last2),
-               __pred,
-               __proj1,
-               __proj2)
-               .in2 == __last2;
+      _Proj2 __proj2 = {}) {
+    return __starts_with_impl(
+        std::move(__first1), std::move(__last1), std::move(__first2), std::move(__last2), __pred, __proj1, __proj2);
   }
 
   template <input_range _Range1,
@@ -67,17 +104,16 @@ struct __fn {
             class _Proj1 = identity,
             class _Proj2 = identity>
     requires indirectly_comparable<iterator_t<_Range1>, iterator_t<_Range2>, _Pred, _Proj1, _Proj2>
-  _LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI constexpr bool operator()(
-      _Range1&& __range1, _Range2&& __range2, _Pred __pred = {}, _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) const {
-    return __mismatch::__fn::__go(
-               ranges::begin(__range1),
-               ranges::end(__range1),
-               ranges::begin(__range2),
-               ranges::end(__range2),
-               __pred,
-               __proj1,
-               __proj2)
-               .in2 == ranges::end(__range2);
+  _LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI static constexpr bool
+  operator()(_Range1&& __range1, _Range2&& __range2, _Pred __pred = {}, _Proj1 __proj1 = {}, _Proj2 __proj2 = {}) {
+    return __starts_with_impl(
+        ranges::begin(__range1),
+        ranges::end(__range1),
+        ranges::begin(__range2),
+        ranges::end(__range2),
+        __pred,
+        __proj1,
+        __proj2);
   }
 };
 } // namespace __starts_with
diff --git a/libcxx/include/__functional/operations.h b/libcxx/include/__functional/operations.h
index 7ddc00650f162f..4565e1415a6e02 100644
--- a/libcxx/include/__functional/operations.h
+++ b/libcxx/include/__functional/operations.h
@@ -13,6 +13,7 @@
 #include <__config>
 #include <__functional/binary_function.h>
 #include <__functional/unary_function.h>
+#include <__fwd/functional.h>
 #include <__type_traits/integral_constant.h>
 #include <__type_traits/operation_traits.h>
 #include <__utility/forward.h>
@@ -316,10 +317,18 @@ struct _LIBCPP_TEMPLATE_VIS equal_to<void> {
 // comparison when we don't perform an implicit conversion when calling it.
 template <class _Tp>
 struct __desugars_to<__equal_tag, equal_to<_Tp>, _Tp, _Tp> : true_type {};
+template <class _Tp>
+struct __desugars_to<__equal_tag, reference_wrapper<equal_to<_Tp>>, _Tp, _Tp> : true_type {};
+template <class _Tp>
+struct __desugars_to<__equal_tag, reference_wrapper<const equal_to<_Tp>>, _Tp, _Tp> : true_type {};
 
 // In the transparent case, we do not enforce that
 template <class _Tp, class _Up>
 struct __desugars_to<__equal_tag, equal_to<void>, _Tp, _Up> : true_type {};
+template <class _Tp, class _Up>
+struct __desugars_to<__equal_tag, reference_wrapper<equal_to<void>>, _Tp, _Up> : true_type {};
+template <class _Tp, class _Up>
+struct __desugars_to<__equal_tag, reference_wrapper<const equal_to<void>>, _Tp, _Up> : true_type {};
 
 #if _LIBCPP_STD_VER >= 14
 template <class _Tp = void>
diff --git a/libcxx/include/__functional/ranges_operations.h b/libcxx/include/__functional/ranges_operations.h
index 38b28018049eb7..be400c42a65790 100644
--- a/libcxx/include/__functional/ranges_operations.h
+++ b/libcxx/include/__functional/ranges_operations.h
@@ -13,6 +13,7 @@
 #include <__concepts/equality_comparable.h>
 #include <__concepts/totally_ordered.h>
 #include <__config>
+#include <__fwd/functional.h>
 #include <__type_traits/integral_constant.h>
 #include <__type_traits/operation_traits.h>
 #include <__utility/forward.h>
@@ -99,6 +100,10 @@ struct greater_equal {
 // operator are of the same type
 template <class _Tp, class _Up>
 struct __desugars_to<__equal_tag, ranges::equal_to, _Tp, _Up> : true_type {};
+template <class _Tp, class _Up>
+struct __desugars_to<__equal_tag, reference_wrapper<ranges::equal_to>, _Tp, _Up> : true_type {};
+template <class _Tp, class _Up>
+struct __desugars_to<__equal_tag, reference_wrapper<const ranges::equal_to>, _Tp, _Up> : true_type {};
 
 #endif // _LIBCPP_STD_VER >= 20
 

Comment on lines 103 to 106
template <class _Tp, class _Up>
struct __desugars_to<__equal_tag, reference_wrapper<ranges::equal_to>, _Tp, _Up> : true_type {};
template <class _Tp, class _Up>
struct __desugars_to<__equal_tag, reference_wrapper<const ranges::equal_to>, _Tp, _Up> : true_type {};
Member Author

@xiaoyang-sde xiaoyang-sde Mar 9, 2024


The memcmp-optimized overload of __equal_impl requires that the predicate satisfies __desugars_to<__equal_tag> and that the projections satisfy __is_identity.

template <class _Tp,
          class _Up,
          class _Pred,
          class _Proj1,
          class _Proj2,
          __enable_if_t<__desugars_to<__equal_tag, _Pred, _Tp, _Up>::value && __is_identity<_Proj1>::value &&
                            __is_identity<_Proj2>::value && !is_volatile<_Tp>::value && !is_volatile<_Up>::value &&
                            __libcpp_is_trivially_equality_comparable<_Tp, _Up>::value,
                        int> = 0>
_LIBCPP_NODISCARD inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 bool
__equal_impl(_Tp* __first1, _Tp* __last1, _Up* __first2, _Up*, _Pred&, _Proj1&, _Proj2&) {
  return std::__constexpr_memcmp_equal(__first1, __first2, __element_count(__last1 - __first1));
}

Without these changes, this overload won't be selected when the predicate is wrapped in std::reference_wrapper.
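
For illustration only (this snippet is not from the patch), the wrapped-predicate shape that previously missed the memcmp path can be reproduced directly with ranges::equal:

#include <algorithm>
#include <functional>
#include <vector>

int main() {
  std::vector<int> a{1, 2, 3};
  std::vector<int> b{1, 2, 3};
  auto eq = std::ranges::equal_to{};
  // The predicate type deduced here is reference_wrapper<ranges::equal_to>; without
  // the new __desugars_to specializations, the memcmp-optimized __equal_impl overload
  // is not selected for such a call.
  return std::ranges::equal(a, b, std::ref(eq)) ? 0 : 1;
}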

Contributor

I think we should move this into its own PR, assuming that we want it.

@var-const var-const added the ranges and performance labels Mar 10, 2024
@var-const var-const self-assigned this Mar 10, 2024
Contributor

@philnik777 philnik777 left a comment


With #73255 I don't think there will be any significant performance improvement anymore, so I'm not sure whether we want to continue pursuing this. It could probably reduce the amount of generated code a bit, but I'm not sure it's worth the additional amount of complexity. @ldionne do you have any thoughts?


@var-const
Member

With #73255 I don't think there will be any significant performance improvement anymore, so I'm not sure whether we want to continue pursuing this. It could probably reduce the amount of generated code a bit, but I'm not sure it's worth the additional amount of complexity. @ldionne do you have any thoughts?

@xiaoyang-sde Can you please update your branch to make sure #73255 is included and rerun the benchmarks? This might give us some indication on whether this patch is worth pursuing further.

FWIW, I'm somewhat in favor of landing this. If numbers show that there is no performance difference anymore, we should probably remove similar optimizations from ends_with -- whether we delegate to equal or not, IMO it should be consistent between starts_with and ends_with. Also, I think adding a benchmark and extending desugars_to is valuable independent of changes to the algorithm itself.

@xiaoyang-sde
Member Author

xiaoyang-sde commented Apr 8, 2024

Here are the benchmark results for the unoptimized ranges::starts_with (copied from the main branch) running against the vectorized mismatch. Please note that bm_starts_with_contiguous_iter won't invoke the vectorized mismatch because it uses a custom predicate rather than ranges::equal_to. It seems that forwarding to equal still provides some performance improvement.

2024-04-08T00:51:04-07:00
Running ./ranges_starts_with.libcxx.out
Run on (12 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1280 KiB (x6)
  L3 Unified 18432 KiB (x1)
Load Average: 1.27, 2.29, 1.34
-----------------------------------------------------------------------------------------------------------
Benchmark                                                                 Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_memcmp_optimization/16             3.09 ns         3.09 ns    212067553
bm_starts_with_contiguous_iter_with_memcmp_optimization/256            20.9 ns         20.9 ns     30603943
bm_starts_with_contiguous_iter_with_memcmp_optimization/4096            310 ns          310 ns      2176699
bm_starts_with_contiguous_iter_with_memcmp_optimization/65536          5394 ns         5394 ns       136959
bm_starts_with_contiguous_iter_with_memcmp_optimization/1048576      134816 ns       134817 ns         4916
bm_starts_with_contiguous_iter_with_memcmp_optimization/16777216    4792548 ns      4792472 ns          150
bm_starts_with_contiguous_iter/16                                      5.38 ns         5.38 ns    123790610
bm_starts_with_contiguous_iter/256                                     81.4 ns         81.4 ns      8541221
bm_starts_with_contiguous_iter/4096                                    1107 ns         1107 ns       615306
bm_starts_with_contiguous_iter/65536                                  16869 ns        16868 ns        41362
bm_starts_with_contiguous_iter/1048576                               293850 ns       293845 ns         2276
bm_starts_with_contiguous_iter/16777216                             6079325 ns      6079284 ns          108
bm_starts_with_random_access_iter/16                                   7.17 ns         7.04 ns    101903410
bm_starts_with_random_access_iter/256                                   100 ns          100 ns      6806306
bm_starts_with_random_access_iter/4096                                 1449 ns         1449 ns       482025
bm_starts_with_random_access_iter/65536                               23967 ns        23967 ns        27789
bm_starts_with_random_access_iter/1048576                            387731 ns       387727 ns         1752
bm_starts_with_random_access_iter/16777216                          6773716 ns      6773727 ns          100
bm_starts_with_bidirectional_iter/16                                   6.95 ns         6.95 ns     97150439
bm_starts_with_bidirectional_iter/256                                   103 ns          101 ns      6666863
bm_starts_with_bidirectional_iter/4096                                 1459 ns         1459 ns       477527
bm_starts_with_bidirectional_iter/65536                               23458 ns        23458 ns        29772
bm_starts_with_bidirectional_iter/1048576                            405195 ns       405189 ns         1822
bm_starts_with_bidirectional_iter/16777216                          6768824 ns      6768460 ns           99
bm_starts_with_forward_iter/16                                         6.86 ns         6.76 ns     99222099
bm_starts_with_forward_iter/256                                        99.3 ns         99.3 ns      6935465
bm_starts_with_forward_iter/4096                                       1455 ns         1455 ns       482635
bm_starts_with_forward_iter/65536                                     23077 ns        23077 ns        29802
bm_starts_with_forward_iter/1048576                                  379438 ns       379432 ns         1795
bm_starts_with_forward_iter/16777216                                6722769 ns      6722515 ns          104

To run the benchmark with the optimized ranges::starts_with, please make sure to add these lines to libcxx/include/__functional/ranges_operations.h:

template <class _Tp, class _Up>
inline const bool __desugars_to_v<__equal_tag, reference_wrapper<ranges::equal_to>, _Tp, _Up> = true;

template <class _Tp, class _Up>
inline const bool __desugars_to_v<__equal_tag, reference_wrapper<const ranges::equal_to>, _Tp, _Up> = true;

Member

@ldionne ldionne left a comment


With #73255 I don't think there will be any significant performance improvement anymore, so I'm not sure whether we want to continue pursuing this. It could probably reduce the amount of generated code a bit, but I'm not sure it's worth the additional amount of complexity. @ldionne do you have any thoughts?

I am not sure I read the benchmarks right, but if I did it looks like we do have quite a bit of a performance benefit with this patch. It makes sense too, since we only need to use memcmp to compare the two ranges instead of finding where they actually differ, which is more complicated.

So IMO this makes sense.

Member

@ldionne ldionne left a comment


I think this LGTM, I don't see any problem with the patch and it does seem to provide a fair speedup. @philnik777 would you agree?

@philnik777
Contributor

IIRC these aren't at all the numbers I got when benchmarking std::mismatch against memcmp, so I'd really like to check this again before this lands. I also don't see how there'd be a significant difference, since the loop in both memcmp and std::mismatch is finding a mismatch. The part that is more work in std::mismatch is always only at the end of the loop, so I'd expect at most a constant time difference between the implementations (which is exactly what I observed IIRC).

@xiaoyang-sde
Member Author

xiaoyang-sde commented Apr 26, 2024

I wrote a simple benchmark that compares the performance of std::ranges::equal (memcmp) and std::ranges::mismatch (vectorized). I ran it on 2 machines and observed different results.

#include <algorithm>
#include <benchmark/benchmark.h>
#include <vector>

#include "test_iterators.h"

static void bm_starts_with_contiguous_iter_with_equal_impl(benchmark::State& state) {
  std::vector<int> a(state.range(), 1);
  std::vector<int> p(state.range(), 1);

  for (auto _ : state) {
    benchmark::DoNotOptimize(a);
    benchmark::DoNotOptimize(p);

    auto begin1 = contiguous_iterator(a.data());
    auto end1   = contiguous_iterator(a.data() + a.size());
    auto begin2 = contiguous_iterator(p.data());
    auto end2   = contiguous_iterator(p.data() + p.size());

    benchmark::DoNotOptimize(std::ranges::equal(begin1, end1, begin2, end2));
  }
}
BENCHMARK(bm_starts_with_contiguous_iter_with_equal_impl)->RangeMultiplier(16)->Range(16, 16 << 20);

static void bm_starts_with_contiguous_iter_with_mismatch_impl(benchmark::State& state) {
  std::vector<int> a(state.range(), 1);
  std::vector<int> p(state.range(), 1);

  for (auto _ : state) {
    benchmark::DoNotOptimize(a);
    benchmark::DoNotOptimize(p);

    auto begin1 = contiguous_iterator(a.data());
    auto end1   = contiguous_iterator(a.data() + a.size());
    auto begin2 = contiguous_iterator(p.data());
    auto end2   = contiguous_iterator(p.data() + p.size());

    benchmark::DoNotOptimize(std::ranges::mismatch(begin1, end1, begin2, end2).in2 == end2);
  }
}
BENCHMARK(bm_starts_with_contiguous_iter_with_mismatch_impl)->RangeMultiplier(16)->Range(16, 16 << 20);

BENCHMARK_MAIN();

The performance is similar on MacBook Air (M1, arm64):

2024-04-26T17:14:19-04:00
Running ./build/libcxx/benchmarks/ranges_starts_with.libcxx.out
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 2.00, 3.59, 3.41
-----------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_equal_impl/16                2.89 ns         2.86 ns    244051251
bm_starts_with_contiguous_iter_with_equal_impl/256               27.1 ns         26.1 ns     26963107
bm_starts_with_contiguous_iter_with_equal_impl/4096               451 ns          443 ns      1578169
bm_starts_with_contiguous_iter_with_equal_impl/65536             6693 ns         6667 ns       104706
bm_starts_with_contiguous_iter_with_equal_impl/1048576         110306 ns       109800 ns         6162
bm_starts_with_contiguous_iter_with_equal_impl/16777216       3129511 ns      2780193 ns          311
bm_starts_with_contiguous_iter_with_mismatch_impl/16             3.08 ns         3.05 ns    225504566
bm_starts_with_contiguous_iter_with_mismatch_impl/256            26.9 ns         26.7 ns     25911339
bm_starts_with_contiguous_iter_with_mismatch_impl/4096            422 ns          420 ns      1678504
bm_starts_with_contiguous_iter_with_mismatch_impl/65536          6834 ns         6722 ns       105055
bm_starts_with_contiguous_iter_with_mismatch_impl/1048576      124471 ns       123355 ns         5691
bm_starts_with_contiguous_iter_with_mismatch_impl/16777216    2337331 ns      2326288 ns          288

However, the performance is different on Arch Linux with a 4th Gen Xeon processor (x86_64):

2024-04-26T20:45:12+00:00
Running ./build/libcxx/benchmarks/ranges_starts_with.libcxx.out
Run on (4 X 2294.61 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 4096 KiB (x4)
Load Average: 0.00, 0.32, 0.66
-----------------------------------------------------------------------------------------------------
Benchmark                                                           Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------
bm_starts_with_contiguous_iter_with_equal_impl/16                7.68 ns         7.68 ns     93071510
bm_starts_with_contiguous_iter_with_equal_impl/256               31.4 ns         31.4 ns     25166061
bm_starts_with_contiguous_iter_with_equal_impl/4096               396 ns          396 ns      1528737
bm_starts_with_contiguous_iter_with_equal_impl/65536            10798 ns        10798 ns        59100
bm_starts_with_contiguous_iter_with_equal_impl/1048576         496691 ns       496671 ns         1499
bm_starts_with_contiguous_iter_with_equal_impl/16777216      13436051 ns     13435049 ns           50
bm_starts_with_contiguous_iter_with_mismatch_impl/16             10.7 ns         10.7 ns     68709479
bm_starts_with_contiguous_iter_with_mismatch_impl/256            59.0 ns         59.0 ns     10459829
bm_starts_with_contiguous_iter_with_mismatch_impl/4096           1069 ns         1069 ns       729445
bm_starts_with_contiguous_iter_with_mismatch_impl/65536         16881 ns        16880 ns        34519
bm_starts_with_contiguous_iter_with_mismatch_impl/1048576      583530 ns       583395 ns         1250
bm_starts_with_contiguous_iter_with_mismatch_impl/16777216   15792555 ns     15791353 ns           43

@philnik777 Thanks for the comment! Could you please review these benchmark results? If they are valid, then rather than focusing on optimizing std::ranges::starts_with, I guess we should further investigate the performance of std::ranges::mismatch on x86_64.

@var-const
Member

@philnik777 Friendly ping re. the benchmarks. :)

@philnik777
Contributor

Sorry for being so slow on this. I had a lot to do recently. TL;DR it's complicated.

The long version: AVX2 seems to make quite a significant difference for mismatch. @xiaoyang-sde I guess you compiled without any -m flags? The glibc functions are heavily optimized for a lot of platforms, and choose the implementation based on the CPU they're running on. I wasn't able to benchmark yet, but it seems that other implementations aren't as well optimized, making the performance improvement dependent on your compilation flags and on which libc you're using (e.g. musl doesn't seem to have an optimized memcmp at all, which would make our implementation 40x faster). We would be able to do something similar to what glibc does with [[gnu::target_clones]] or [[gnu::target_version]], but I'm not sure it's worth the cost.

I think we have to have a more general discussion on how far we want to go for improved performance. For now I don't want to block this change, but I'd like us to have a more rigid understanding of how far we want to go before doing similar optimizations (i.e. selecting our vector code vs. the libc function).
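
For reference, the per-CPU dispatch mentioned above can be expressed with the target_clones attribute; this is an illustrative sketch of the mechanism on ELF/GNU toolchains, not proposed libc++ code:

#include <cstddef>

// The compiler emits one clone per listed target and dispatches between them at
// load time (via an ifunc resolver on supporting platforms).
[[gnu::target_clones("avx2", "default")]]
bool bytes_equal(const unsigned char* a, const unsigned char* b, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    if (a[i] != b[i])
      return false;
  return true;
}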

@xiaoyang-sde
Member Author

xiaoyang-sde commented May 10, 2024

Sorry for being so slow on this. I had a lot to do recently. TL;DR it's complicated.

The long version: AVX2 seems to make quite a significant difference for mismatch. @xiaoyang-sde I guess you compiled without any -m flags? The glibc functions are heavily optimized for a lot of platforms, and chooses the implementation based on what CPU it's running on. I wasn't able to benchmark yet, but it seems that other implementations aren't optimized as well, making the performance improvement dependent on your compilation flags and what libc you're using (e.g. musl doesn't seem to have any optimized memcmp, which would make our implementation 40x faster). We would be able to do something similar to what glibc does with [[gnu::target_clones]] or [[gnu::target_version]], but I'm not sure it's worth the cost.

I think we have to have a more general discussion on how far we want to go for improved performance. For now I don't want to block this change, but I'd like us to have a more rigid understanding of how far we want to go before doing similar optimizations (i.e. selecting our vector code vs. the libc function).

Thanks for the clarification! Whether this patch is merged or not isn't a concern for me, but I genuinely gained valuable knowledge from our discussion. I appreciate it; please take your time.

Member

@ldionne ldionne left a comment


I finally caught up to this patch. I tend to like the patch. I think it applies an optimization that should almost always be a win, except in cases where ranges::equal is slower than ranges::mismatch, which boils down to whether memcmp is faster than our implementation of mismatch. If I misunderstood something, please let me know.

Assuming I'm on the same page as the other reviewers, I would say this: comparing whether two ranges are equal is definitely "a simpler operation" (conceptually at least) than finding the point at which they don't match. So I think it makes sense to assume that ranges::equal should always be at least as fast as ranges::mismatch (in the worst case, I guess we could even implement ranges::equal by calling ranges::mismatch and checking that the result is the last iterator).

I understand that some C libraries might provide extremely poor quality of implementation for some functions, however I think that should be a bug filed on them and I don't think we should start deciding how to optimize our code based on that.

I have a few comments and I'd like to know if @philnik777 is fine with what I said above, but I'd say this patch otherwise LGTM.

Comment on lines 69 to 70
std::move(__first1),
ranges::next(__first1, __n2),
Member


You'd have to compute ranges::next(__first1, __n2) before the function call if you want to move __first1. Otherwise you could end up using a moved-from iterator in the ranges::next call.
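
A sketch of one way to address this, reusing the names from the diff above (__mid1 is a hypothetical local, and this is not the author's final change):

      // Compute the end of the prefix-sized subrange before __first1 is moved from,
      // so the result no longer depends on argument evaluation order.
      auto __mid1 = ranges::next(__first1, __n2);
      return ranges::equal(
          std::move(__first1),
          std::move(__mid1),
          std::move(__first2),
          std::move(__last2),
          std::ref(__pred),
          std::ref(__proj1),
          std::ref(__proj2));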

@ldionne
Member

ldionne commented Oct 1, 2024

@xiaoyang-sde There's a bit of review feedback to apply, but otherwise I think this is good to go. Do you think you're able to pick this up and make the last few requested changes?

@xiaoyang-sde
Member Author

@xiaoyang-sde There's a bit of review feedback to apply, but otherwise I think this is good to go. Do you think you're able to pick this up and make the last few requested changes?

I completely forgot about this! I'll respond to the comments shortly.

@ldionne
Member

ldionne commented Mar 27, 2025

When benchmarking locally, I get the following results for std::vector:

Comparing build/default/libcxx/test/benchmarks/algorithms/nonmodifying/Output/starts_with.bench.cpp.dir/benchmark-result.json to build/candidate/libcxx/test/benchmarks/algorithms/nonmodifying/Output/starts_with.bench.cpp.dir/benchmark-result.json
Benchmark                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------
rng::starts_with(vector<int>)/8                      -0.0314         -0.0314             2             2             2             2
rng::starts_with(vector<int>)/1000                   +0.1724         +0.1719            99           116            99           115
rng::starts_with(vector<int>)/1024                   +0.1791         +0.1791           101           119           101           119
rng::starts_with(vector<int>)/8192                   +0.1894         +0.1893           821           976           821           976
rng::starts_with(vector<int>)/1048576                +0.0984         +0.0984        113321        124466        113308        124457
OVERALL_GEOMEAN                                      +5.1211         +5.1148             0             0             0             0

Using ranges::equal seems to be consistently worse than using ranges::mismatch.

Edit: Ah, it looks like #84570 (comment) and #84570 (comment) explain this result.

@ldionne ldionne changed the title from "[libc++][ranges] optimize the performance of ranges::starts_with" to "[libc++][ranges] Optimize the performance of ranges::starts_with" Mar 27, 2025
@ldionne
Member

ldionne commented Mar 27, 2025

Note that I don't think this will actually fix #83880 since ends_with uses starts_with with reverse iterators, and we don't have an optimization for equal on reverse iterators.
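
Conceptually, that relationship looks like the following hedged sketch (not libc++'s exact code), which is why the equal/memcmp fast path would also need to handle reverse iterators to help ends_with:

#include <algorithm>
#include <ranges>

// ends_with over bidirectional common ranges can be phrased as starts_with over
// reversed views; the reversed iterators are what the fast path would have to cover.
constexpr bool ends_with_via_reverse(const auto& r1, const auto& r2) {
  return std::ranges::starts_with(std::views::reverse(r1), std::views::reverse(r2));
}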

@ldionne
Member

ldionne commented Mar 27, 2025

@philnik777 What do you think the next step should be? Vectorize std::equal when we're on a platform where we know we can do better than memcmp?

Note that I think this patch makes sense in principle, but I also don't think we should check it in as-is since it regresses on some platforms. I'd even be fine with checking it in with some sort of #ifndef _LIBCPP_MISMATCH_FASTER_THAN_MEMCMP that we could fix in the near future. WDYT?

@philnik777
Contributor

@ldionne I'm really not sure what to do. It's nice to piggy-back off of the memcmp optimizations the compiler can do, which means that we probably cannot compete with some implementations (the compiler can assume more about memcmp than it would about our implementation). OTOH we're doing better with our mismatch implementation than the system libc in some situations, without the optimization barriers that would be required to match memcmp on all platforms. From a pure performance standpoint, introducing our own __memcmp to be used everywhere instead of the libc one would probably be best. We could replace that with our own implementation if we want (though I fear people will piggy-back off of that for embedded platforms). I'm not sure how maintainable that would be, though. Would we also introduce our own memmove if we can make a faster one?

@ldionne
Member

ldionne commented Mar 27, 2025

@ldionne I'm really not sure what to do. It's nice to piggy-back off of the memcmp optimizations the compiler can do, which means that we probably cannot compete with some implementations (the compiler can assume more about memcmp than it would about our implementation). OTOH we're doing better with our mismatch implementation than the system libc in some situations without the optimization barriers that would be required to match memcmp on all platforms. From a pure perofrmance stand-point, introducing our own __memcmp to be used everywhere instead of the libc one would probably be best. We could replace that with our own implementation if we want (though I fear people will piggy-back off of that for embedded platforms). I'm not sure how the maintainability of that would be though. Would we also introduce our own memmove if we can make a faster one?

IMO the answer is that if we can beat the platform's memcmp and memmove, we should implement our own. And we should also report a bug against any platform that we beat, but we should do it for the sake of libc++'s own performance. In particular, I find it unfortunate that this patch, which should obviously be an improvement, is actually a pessimization due to suboptimal memcmp on some platforms. I'd like to be able to unblock patches like this one.

@philnik777
Contributor

In that case we should probably add our own wrappers of all the C functions and check whether they're slower than what we could easily achieve.

Labels: libc++, performance, ranges