Make `ptr_rotate` faster

I was going to directly PR the performance improvement, but since this is my first time writing a bunch of unsafe code, and because `x.py` will not work for me (#61611), I decided to open an issue and point to [this repository](https://github.com/AaronKutch/ptr_rotate_bench_demonstration) filled with all the benchmarks I wrote.

I tried over 20 different variations on algorithms before settling on this one.
The only thing I am concerned with is that I am directly using `T` values in algorithm 1 instead of indirect copies and swaps, which the microbenchmarks did not like. I am pretty sure that it is panic safe, since there is no `T: clone` anywhere.

I benchmarked on both Intel and AMD CPUs, and found that only on the `x1031s149_new` benchmark did my algorithm perform significantly worse (probably due to the fact that 1031 and 149 are both primes), and the `u8_new` benchmark on my AMD CPU. I do not know what is causing the `u8_new` to be much worse on the new algorithm on my AMD CPU, and if the problem persists on newer AMD hardware. Every other benchmark indicates that my algorithm is as fast or faster than the old one (note that because there are so many benchmarks, there may be outliers which you should benchmark again).

Of course, when I do the PR I am not going to include all of those benchmarks, but I am not sure what subset to include.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make `ptr_rotate` faster #61784

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Make ptr_rotate faster #61784

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Make `ptr_rotate` faster #61784