Rejection Sampling in NEON
- Run `make` to build everything.
- Run `make bench` to benchmark cycle counts. Read `papi_hl_output` to see the results. The `PAPI_TOT_CYC` field gives the total clock cycles over `TESTS` iterations.
- Run `make verify` to check the vectorized implementation against the reference code.

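
The kind of check that `make verify` performs can be sketched as follows. This is a minimal illustration, not the code in this repository: the modulus `Q = 3329`, the 12-bit candidate layout, and the function names `rej_uniform_ref`, `rej_uniform_block`, and `verify` are all assumptions made for the example, and the block-wise variant is a portable stand-in for a NEON path rather than real intrinsics.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define Q 3329 /* assumed modulus; not stated in this README */

/* Scalar reference: read two 12-bit candidates from every 3 bytes
 * and keep only those below Q. Returns the number of accepted values. */
static size_t rej_uniform_ref(int16_t *r, size_t len,
                              const uint8_t *buf, size_t buflen)
{
    size_t ctr = 0, pos = 0;
    while (ctr < len && pos + 3 <= buflen) {
        uint16_t v0 = (uint16_t)((buf[pos] | ((uint16_t)buf[pos + 1] << 8)) & 0xFFF);
        uint16_t v1 = (uint16_t)(((buf[pos + 1] >> 4) | ((uint16_t)buf[pos + 2] << 4)) & 0xFFF);
        pos += 3;
        if (v0 < Q) r[ctr++] = (int16_t)v0;
        if (ctr < len && v1 < Q) r[ctr++] = (int16_t)v1;
    }
    return ctr;
}

/* Hypothetical stand-in for a vectorized variant: decode 8 candidates
 * from each 12-byte block into "lanes", then compact the accepted ones.
 * Leftover bytes are handled by the scalar routine. */
static size_t rej_uniform_block(int16_t *r, size_t len,
                                const uint8_t *buf, size_t buflen)
{
    size_t ctr = 0, pos = 0;
    while (ctr < len && pos + 12 <= buflen) {
        uint16_t lane[8];
        for (int i = 0; i < 4; i++) {
            const uint8_t *p = buf + pos + 3 * i;
            lane[2 * i]     = (uint16_t)((p[0] | ((uint16_t)p[1] << 8)) & 0xFFF);
            lane[2 * i + 1] = (uint16_t)(((p[1] >> 4) | ((uint16_t)p[2] << 4)) & 0xFFF);
        }
        pos += 12;
        for (int i = 0; i < 8 && ctr < len; i++)
            if (lane[i] < Q) r[ctr++] = (int16_t)lane[i];
    }
    ctr += rej_uniform_ref(r + ctr, len - ctr, buf + pos, buflen - pos);
    return ctr;
}

/* Run both variants on the same pseudo-random bytes and compare
 * the accepted count and the accepted values. Returns 1 on a match. */
static int verify(void)
{
    uint8_t buf[504];
    int16_t a[256], b[256];
    uint32_t s = 1;
    for (size_t i = 0; i < sizeof buf; i++) {
        s = s * 1664525u + 1013904223u; /* simple LCG, test input only */
        buf[i] = (uint8_t)(s >> 24);
    }
    size_t na = rej_uniform_ref(a, 256, buf, sizeof buf);
    size_t nb = rej_uniform_block(b, 256, buf, sizeof buf);
    return na == nb && memcmp(a, b, na * sizeof a[0]) == 0;
}
```

Both variants must accept the same values in the same order, so comparing the count and the output arrays is enough to catch a mismatch in the compaction step.
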
This code was developed and tested on a Raspberry Pi 4 (8 GB, Cortex-A72).

| | C | NEON-Full | Ratio | NEON-Half | Ratio | NEON-Mix | Ratio |
|---|---|---|---|---|---|---|---|
| Rejection Sampling | 1686 | 773 | 2.18 | 1250 | 1.34 | 765 | 2.20 |
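
Each Ratio column in the table is the C figure divided by the NEON figure to its left. A minimal sketch of that arithmetic, with `speedup` as a hypothetical helper name that is not part of this repository:

```c
#include <stdio.h>

/* Speedup of a NEON variant relative to the C reference:
 * reference count divided by NEON count. */
static double speedup(double c_count, double neon_count)
{
    return c_count / neon_count;
}

/* Print the three ratios using the figures from the table above. */
static void print_ratios(void)
{
    const double c = 1686.0;
    printf("NEON-Full: %.2f\n", speedup(c, 773.0));
    printf("NEON-Half: %.2f\n", speedup(c, 1250.0));
    printf("NEON-Mix:  %.2f\n", speedup(c, 765.0));
}
```
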