Commit 786edcc
pp_reverse - chunk-at-a-time string reversal
The performance characteristics of string reversal in blead are highly
variable, depending on the capabilities of the C compiler. Some
compilers are able to vectorize some cases for better performance.
This commit introduces explicit reversal and swapping of whole
registers at a time, which all builds seem to be able to
benefit from.
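
As a rough illustration of the chunk-at-a-time idea (a minimal sketch, not the code from this patch), the following C copies a buffer into a separate destination in reverse order, eight bytes per step. The names `sketch_bswap64` and `sketch_reverse_copy` are hypothetical, and the sketch assumes the two buffers do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 64-bit value. */
static uint64_t sketch_bswap64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF);   /* move the lowest byte of x to the top of r */
        x >>= 8;
    }
    return r;
}

/* Hypothetical helper: write src[0..len) into dst in reverse order.
 * Assumes src and dst do not overlap. */
static void sketch_reverse_copy(char *dst, const char *src, size_t len)
{
    const char *s = src + len;       /* one past the last unread source byte */
    char *d = dst;

    /* Whole chunks: load 8 bytes from the tail of src, reverse them in a
     * register, and store them at the head of dst. memcpy keeps the
     * loads and stores safe for unaligned addresses. */
    while ((size_t)(s - src) >= sizeof(uint64_t)) {
        uint64_t chunk;
        s -= sizeof chunk;
        memcpy(&chunk, s, sizeof chunk);
        chunk = sketch_bswap64(chunk);
        memcpy(d, &chunk, sizeof chunk);
        d += sizeof chunk;
    }

    /* Remaining bytes (fewer than 8) are copied one at a time. */
    while (s > src)
        *d++ = *--s;
}
```

Because the swap reverses the byte order of whatever was loaded from memory, the result does not depend on the endianness of the host.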
The `_swab_xx_` macros for doing this already exist in perl.h;
using them for this purpose was inspired by
https://dev.to/wunk/fast-array-reversal-with-simd-j3p
The bit shifting done by these macros should be portable and reasonably
performant even if not optimised further, but it is likely that compilers
will optimise them to bswap, rev, or movbe instructions.
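
For readers unfamiliar with shift-based byte swapping, here is a minimal sketch of the general technique. It is not a copy of the perl.h macros; the function names `sketch_swab32` and `sketch_swab64` are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the four bytes of x using shifts and masks. */
static inline uint32_t sketch_swab32(uint32_t x)
{
    return  (x >> 24)                               /* byte 3 -> byte 0 */
         | ((x >>  8) & UINT32_C(0x0000FF00))       /* byte 2 -> byte 1 */
         | ((x <<  8) & UINT32_C(0x00FF0000))       /* byte 1 -> byte 2 */
         |  (x << 24);                              /* byte 0 -> byte 3 */
}

/* Hypothetical helper: reverse the eight bytes of x by swapping each
 * 32-bit half and exchanging the halves. */
static inline uint64_t sketch_swab64(uint64_t x)
{
    return ((uint64_t)sketch_swab32((uint32_t)x) << 32)
         |  (uint64_t)sketch_swab32((uint32_t)(x >> 32));
}
```

Code of this shape uses only standard integer operations, so it should compile anywhere; whether a given compiler recognises the pattern and emits a single byte-swap instruction depends on the compiler and optimisation level.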
Some performance comparisons:
1. Large string reversal, with different source & destination buffers
my $x = "X"x(1024*1000*10); my $y; for (0..1_000) { $y = reverse $x }
gcc blead:
2,388.30 msec task-clock # 0.993 CPUs utilized
10,574,195,388 cycles # 4.427 GHz
61,520,672,268 instructions # 5.82 insn per cycle
10,255,049,869 branches # 4.294 G/sec
clang blead:
688.37 msec task-clock # 0.946 CPUs utilized
3,161,754,439 cycles # 4.593 GHz
8,986,420,860 instructions # 2.84 insn per cycle
324,734,391 branches # 471.745 M/sec
gcc patched:
408.39 msec task-clock # 0.936 CPUs utilized
1,617,273,653 cycles # 3.960 GHz
6,422,991,675 instructions # 3.97 insn per cycle
644,856,283 branches # 1.579 G/sec
clang patched:
397.61 msec task-clock # 0.924 CPUs utilized
1,655,838,316 cycles # 4.165 GHz
5,782,487,237 instructions # 3.49 insn per cycle
324,586,437 branches # 816.350 M/sec
2. Large string reversal, but reversing the buffer in-place
my $x = "X"x(1024*1000*10); my $y; for (0..1_000) { $y = reverse "foo",$x }
gcc blead:
6,038.06 msec task-clock # 0.996 CPUs utilized
27,109,273,840 cycles # 4.490 GHz
41,987,097,139 instructions # 1.55 insn per cycle
5,211,350,347 branches # 863.083 M/sec
clang blead:
5,815.86 msec task-clock # 0.995 CPUs utilized
26,962,768,616 cycles # 4.636 GHz
47,111,208,664 instructions # 1.75 insn per cycle
5,211,117,921 branches # 896.018 M/sec
gcc patched:
1,003.49 msec task-clock # 0.999 CPUs utilized
4,298,242,624 cycles # 4.283 GHz
7,387,822,303 instructions # 1.72 insn per cycle
725,892,855 branches # 723.367 M/sec
clang patched:
970.78 msec task-clock # 0.973 CPUs utilized
4,436,489,695 cycles # 4.570 GHz
8,028,374,567 instructions # 1.81 insn per cycle
725,867,979 branches # 747.713 M/sec
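
The in-place case measured above can be sketched in the same style. This is an illustrative guess at the general approach, not the code from this patch: a chunk is loaded from each end of the buffer, both are byte-swapped, and they are stored crosswise. The names `sketch_bswap64` and `sketch_reverse_inplace` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 64-bit value. */
static uint64_t sketch_bswap64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF);
        x >>= 8;
    }
    return r;
}

/* Hypothetical helper: reverse buf[0..len) in place. */
static void sketch_reverse_inplace(char *buf, size_t len)
{
    char *lo = buf;          /* first unprocessed byte */
    char *hi = buf + len;    /* one past the last unprocessed byte */

    /* While at least two full chunks remain, swap a head chunk with a
     * tail chunk, reversing the bytes of each on the way. */
    while ((size_t)(hi - lo) >= 2 * sizeof(uint64_t)) {
        uint64_t head, tail;
        hi -= sizeof tail;
        memcpy(&head, lo, sizeof head);
        memcpy(&tail, hi, sizeof tail);
        head = sketch_bswap64(head);
        tail = sketch_bswap64(tail);
        memcpy(lo, &tail, sizeof tail);
        memcpy(hi, &head, sizeof head);
        lo += sizeof head;
    }

    /* Fewer than 16 bytes remain in the middle: swap them bytewise. */
    while (hi - lo > 1) {
        char t = *lo;
        *lo++ = *--hi;
        *hi = t;
    }
}
```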
3. Short string reversal, different source & destination (checking performance on
smaller string reversals - note: this one's very variable due to noise)
my $x = "1234567"; my $y; for (0..10_000_000) { $y = reverse $x }
gcc blead:
401.20 msec task-clock # 0.916 CPUs utilized
1,672,263,966 cycles # 4.168 GHz
5,564,078,603 instructions # 3.33 insn per cycle
1,250,983,219 branches # 3.118 G/sec
clang blead:
380.58 msec task-clock # 0.998 CPUs utilized
1,615,634,265 cycles # 4.245 GHz
5,583,854,366 instructions # 3.46 insn per cycle
1,300,935,443 branches # 3.418 G/sec
gcc patched:
381.62 msec task-clock # 0.999 CPUs utilized
1,566,807,988 cycles # 4.106 GHz
5,474,069,670 instructions # 3.49 insn per cycle
1,240,983,221 branches # 3.252 G/sec
clang patched:
346.21 msec task-clock # 0.999 CPUs utilized
1,600,780,787 cycles # 4.624 GHz
5,493,773,623 instructions # 3.43 insn per cycle
1,270,915,076 branches # 3.671 G/sec
1 parent 6a4f62c commit 786edcc
1 file changed, +133 -18 lines changed