Description
Future features mentioned a nice set for instruction set future improvements. For example bswap
and bswap16
but missing bireverse
/bitrev
/rbit
operations which I think quite important due to it could be useful for wide range of compression algorithms, Cooley–Tukey's FFT, FHT, dyadic rational numbers and etc.
Clang has builtins for that:
__builtin_bitreverse8
__builtin_bitreverse16
__builtin_bitreverse32
__builtin_bitreverse64
which quite useful due to implement bit reversing is quite challenging and some architectures like ARM64 could be lowered to single instruction RBIT
.
So it would be interesting to hear your opinion on whether to add bitrev
to post-MVP?
Proposed unary instructions:
i32.bitrev8
i32.bitrev16
i32.bitrev
i64.bitrev
Implementation details for runtimes
For x64 / ARMv7 / RISCV / MIPS / POWER and etc:
- i32.bitrev8:
one read from 8-bit lookup table - i32.bitrev16:
two reads from 8-bit lookup table - i32.bitrev:
four reads from 8-bit lookup table - i64.bitrev:
use Knuth's method from Hacker's delight or 8 lookups if it faster on target architecture.
For more modern x64 architecture like Zen3 and Skylake bit reversing could be implemented via PDEP / PEXT instructions.
For ARM64:
- i32.bitrev8:
rbit w1, w0 lsr w0, w1, #24
- i32.bitrev16:
rbit w1, w0 lsr w0, w1, #16
- i32.bitrev:
rbit w0, w1
- i64.bitrev:
rbit x0, x1