Skip to content

Make ptr_rotate faster #61784

Closed
Closed
@AaronKutch

Description

@AaronKutch

I was going to directly PR the performance improvement, but since this is my first time writing a bunch of unsafe code, and because x.py will not work for me (#61611), I decided to open an issue and point to this repository filled with all the benchmarks I wrote.

I tried over 20 different variations on algorithms before settling on this one.
The only thing I am concerned with is that I am directly using T values in algorithm 1 instead of indirect copies and swaps, which the microbenchmarks did not like. I am pretty sure that it is panic safe, since there is no T: clone anywhere.

I benchmarked on both Intel and AMD CPUs, and found that only on the x1031s149_new benchmark did my algorithm perform significantly worse (probably due to the fact that 1031 and 149 are both primes), and the u8_new benchmark on my AMD CPU. I do not know what is causing the u8_new to be much worse on the new algorithm on my AMD CPU, and if the problem persists on newer AMD hardware. Every other benchmark indicates that my algorithm is as fast or faster than the old one (note that because there are so many benchmarks, there may be outliers which you should benchmark again).

Of course, when I do the PR I am not going to include all of those benchmarks, but I am not sure what subset to include.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-enhancementCategory: An issue proposing an enhancement or a PR with one.I-slowIssue: Problems and improvements with respect to performance of generated code.T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions