Optimize scalar secret access #870

easyaspi314 · 2023-07-17T03:22:28Z

So notably XXH3 assumes you got unaligned access, as it has become kinda standard on modern processors. However, it doesn't have to be as bad as it is for strict alignment targets.

With RISC-V gaining traction and NOT having unaligned access, we have a modern 64-bit target that will be running scalar with strict alignment.

One thing that could be clearly optimized is the secret access.

When we remove SIMD and look at XXH3 on an 8 byte granularity, we can see that we have a FIFO queue. (Half scale)

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
[   stripe 1    ]
    [    stripe 2   ]
        [    stripe 3   ]
            [    stripe 4   ]
                [    stripe 5   ]

We could create a ring buffer of ready-to-use values:

XXH3_accumulate(..., xsecret)
{
    u64 secret_buffer[8];
    int secret_buffer_pos = 0;
    for (i = 0; i < 7; i++) { // one less
        secret_buffer[i] = XXH_readLE64(xsecret + 8 * i);
     }
     for each stripe {
         secret_buffer[(secret_buffer_pos + 7) % 8] = XXH_readLE64(xsecret + ...);
         XXH3_accumulate_512(...);
         secret_buffer_pos = (secret_buffer_pos + 1) % 8;
     }
}

This could be the secret (heh) to a possible massive performance increase on strict alignment scalar without an O(n) buffer, as with 8 byte granularity, the overlapping reads are entirely avoidable, and we have 1/8 the unaligned accesses at the cost of a ring buffer which could possibly be done in registers as well (but that would be detrimental to dumber compilers).

This would only make sense on strict alignment/big endian with slow swaps.

The text was updated successfully, but these errors were encountered:

sh1boot · 2023-09-30T18:58:05Z

I believe RISC-V implementations are allowed unaligned access, so be careful how such code is enabled, so as to not slow down the cores that support unaligned.

Cyan4973 added the enhancement label Jul 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize scalar secret access #870

Optimize scalar secret access #870

easyaspi314 commented Jul 17, 2023 •

edited

Loading

sh1boot commented Sep 30, 2023

Optimize scalar secret access #870

Optimize scalar secret access #870

Comments

easyaspi314 commented Jul 17, 2023 • edited Loading

sh1boot commented Sep 30, 2023

easyaspi314 commented Jul 17, 2023 •

edited

Loading