Commit 0e02ca2

visitorckw authored and akpm00 committed
lib/sort: optimize heapsort with double-pop variation
Instead of popping only the maximum element from the heap during each iteration, we now pop the two largest elements at once. Although this introduces an additional comparison to determine the second largest element, it enables a reduction in the height of the tree by one during the heapify operations starting from root's left/right child. This reduction in tree height by one leads to a decrease of one comparison and one swap.

This optimization results in saving approximately 0.5 * n swaps without increasing the number of comparisons. Additionally, the heap size during heapify is now one less than the original size, offering a chance for further reduction in comparisons and swaps.

The following experimental data is based on the array generated using get_random_u32().

| N     | swaps (old) | swaps (new) | comparisons (old) | comparisons (new) |
|-------|-------------|-------------|-------------------|-------------------|
| 1000  | 9054        | 8569        | 10328             | 10320             |
| 2000  | 20137       | 19182       | 22634             | 22587             |
| 3000  | 32062       | 30623       | 35833             | 35752             |
| 4000  | 44274       | 42282       | 49332             | 49306             |
| 5000  | 57195       | 54676       | 63300             | 63294             |
| 6000  | 70205       | 67202       | 77599             | 77557             |
| 7000  | 83276       | 79831       | 92113             | 92032             |
| 8000  | 96630       | 92678       | 106635            | 106617            |
| 9000  | 110349      | 105883      | 121505            | 121404            |
| 10000 | 124165      | 119202      | 136628            | 136617            |

Link: https://lkml.kernel.org/r/20240113031352.2395118-3-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: George Spelvin <lkml@sdf.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent db946a4 commit 0e02ca2
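
The double-pop step described in the commit message can be sketched on a plain int array. This is a minimal illustration, not the kernel code: it assumes a conventional top-down sift-down instead of the bottom-up variant used in lib/sort.c, and the names swap_int, sift_down and heapsort_double_pop are hypothetical.

```c
#include <stddef.h>

/* Swap two ints in place (hypothetical helper). */
static void swap_int(int *x, int *y)
{
	int t = *x;

	*x = *y;
	*y = t;
}

/* Top-down sift-down of v[i] within the max-heap v[0..n). */
static void sift_down(int *v, size_t i, size_t n)
{
	for (;;) {
		size_t c = 2 * i + 1;		/* left child */

		if (c >= n)
			break;
		if (c + 1 < n && v[c + 1] > v[c])
			c++;			/* pick the larger child */
		if (v[i] >= v[c])
			break;
		swap_int(&v[i], &v[c]);
		i = c;
	}
}

/* Heapsort that extracts the two largest elements per iteration. */
void heapsort_double_pop(int *v, size_t num)
{
	size_t n = num;
	size_t i;

	if (num < 2)
		return;

	for (i = num / 2; i-- > 0;)		/* build the max-heap */
		sift_down(v, i, n);

	while (n > 3) {
		size_t a;

		swap_int(&v[0], &v[--n]);	/* largest goes to the end */
		a = (v[1] <= v[2]) ? 2 : 1;	/* second largest is a root child */
		swap_int(&v[a], &v[--n]);	/* it goes just before the largest */
		sift_down(v, a, n);		/* repair the subtree, one level shorter */
		sift_down(v, 0, n);		/* then repair from the root */
	}
	while (n > 1) {				/* at most two single pops remain */
		swap_int(&v[0], &v[--n]);
		sift_down(v, 0, n);
	}
}
```

The sift-down that starts at a root child walks a subtree one level shorter than the full heap, which is where the commit message locates the saved comparison and swap.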

1 file changed: +14 -4 lines changed
lib/sort.c

@@ -215,6 +215,7 @@ void sort_r(void *base, size_t num, size_t size,
 	/* pre-scale counters for performance */
 	size_t n = num * size, a = (num/2) * size;
 	const unsigned int lsbit = size & -size;  /* Used to find parent */
+	size_t shift = 0;
 
 	if (!a)		/* num < 2 || size == 0 */
 		return;
@@ -242,12 +243,21 @@ void sort_r(void *base, size_t num, size_t size,
 	for (;;) {
 		size_t b, c, d;
 
-		if (a)			/* Building heap: sift down --a */
-			a -= size;
-		else if (n -= size)	/* Sorting: Extract root to --n */
+		if (a)			/* Building heap: sift down a */
+			a -= size << shift;
+		else if (n > 3 * size) { /* Sorting: Extract two largest elements */
+			n -= size;
 			do_swap(base, base + n, size, swap_func, priv);
-		else			/* Sort complete */
+			shift = do_cmp(base + size, base + 2 * size, cmp_func, priv) <= 0;
+			a = size << shift;
+			n -= size;
+			do_swap(base + a, base + n, size, swap_func, priv);
+		} else if (n > size) {	/* Sorting: Extract root */
+			n -= size;
+			do_swap(base, base + n, size, swap_func, priv);
+		} else {		/* Sort complete */
 			break;
+		}
 
 		/*
 		 * Sift element at "a" down into heap.  This is the
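
The control flow of the patched loop may be easier to follow when ported to element indices with counters attached. The sketch below rests on several assumptions: it works on an int array rather than byte offsets, it replaces the kernel's lsbit-based parent() with (b - 1) / 2, and struct sort_stats, cmp_counted, swap_counted, heapsort_counted and the double_pop flag are hypothetical names that do not exist in lib/sort.c. Passing double_pop = 0 follows the pre-patch single-pop path.

```c
#include <stddef.h>

/* Hypothetical counters for comparing the two variants. */
struct sort_stats {
	unsigned long cmps;
	unsigned long swaps;
};

/* Three-way compare of v[i] and v[j], counting the call. */
static int cmp_counted(const int *v, size_t i, size_t j, struct sort_stats *st)
{
	st->cmps++;
	return (v[i] > v[j]) - (v[i] < v[j]);
}

/* Swap v[i] and v[j], counting the call. */
static void swap_counted(int *v, size_t i, size_t j, struct sort_stats *st)
{
	int t = v[i];

	v[i] = v[j];
	v[j] = t;
	st->swaps++;
}

void heapsort_counted(int *v, size_t num, int double_pop, struct sort_stats *st)
{
	size_t n = num, a = num / 2, shift = 0;

	if (!a)				/* num < 2 */
		return;

	for (;;) {
		size_t b, c, d;

		if (a) {		/* Building heap, or root after a double pop */
			a -= (size_t)1 << shift;
		} else if (double_pop && n > 3) {	/* Extract two largest */
			n--;
			swap_counted(v, 0, n, st);
			shift = cmp_counted(v, 1, 2, st) <= 0;
			a = (size_t)1 << shift;	/* index 1 or 2: the larger child */
			n--;
			swap_counted(v, a, n, st);
		} else if (n > 1) {	/* Extract root */
			n--;
			swap_counted(v, 0, n, st);
		} else {		/* Sort complete */
			break;
		}

		/* Bottom-up sift-down: follow the larger child to a leaf */
		for (b = a; c = 2 * b + 1, (d = c + 1) < n;)
			b = cmp_counted(v, c, d, st) >= 0 ? c : d;
		if (d == n)		/* Last leaf with no sibling */
			b = c;

		/* Backtrack to the position where v[a] belongs */
		while (b != a && cmp_counted(v, a, b, st) >= 0)
			b = (b - 1) / 2;
		c = b;
		while (b != a) {	/* Shift it into place */
			b = (b - 1) / 2;
			swap_counted(v, b, c, st);
		}
	}
}
```

After a double pop, a holds index 1 or 2, so the next pass through `a -= 1 << shift` brings it back to 0 and the root is sifted without a separate branch. Running the function twice on copies of the same input, once with each flag value, gives counts comparable in spirit to the table in the commit message, although the exact figures depend on the input.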
