The mandel-rust benchmark produces the following results:
https://gist.github.com/petevine/b70b6e5a434f23b40ab5
TL;DR
32-bit code performance ranks as follows by target CPU:
P2(3) > Core2 > P4 (x86_64 too)
The P2(3) builds are the only ones that scale across 2 cores in all benchmarks.
This is either a sign of LLVM being buggy, or I was more right than I'd ever suspected about P4 codegen producing suboptimal code. (x86_64 is affected too, so it could be something else.)
Naturally, the common theme could be the use of SSE2, which is absent from the fastest code:
Configuration: re1: -2.00, re2: 1.00, img1: -1.50, img2: 1.50, max_iter: 2048, img_size: 1024, num_threads: 2
Time taken for this run (serial): 2469.21302 ms
Time taken for this run (scoped_thread_pool): 1248.45883 ms
Time taken for this run (simple_parallel): 1284.73761 ms
Time taken for this run (rayon_join): 1246.36625 ms
Time taken for this run (rayon_par_iter): 1337.93075 ms
Time taken for this run (rust_scoped_pool): 1240.33273 ms
Time taken for this run (job_steal): 1241.20777 ms
Time taken for this run (job_steal_join): 1246.34885 ms
Time taken for this run (kirk_crossbeam): 1244.10723 ms
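To put a number on "scales on 2 cores", here is a minimal sketch that computes the speedup of each parallel run over the serial one, using the timings from the log above. The `speedup` helper is a hypothetical name for this illustration and is not part of mandel-rust.

```rust
/// Speedup of a parallel run relative to the serial baseline.
/// (Hypothetical helper for illustration; not part of the benchmark.)
fn speedup(serial_ms: f64, parallel_ms: f64) -> f64 {
    serial_ms / parallel_ms
}

fn main() {
    // Timings copied from the num_threads: 2 run in the log above.
    let serial_ms = 2469.21302;
    let runs = [
        ("scoped_thread_pool", 1248.45883),
        ("simple_parallel", 1284.73761),
        ("rayon_join", 1246.36625),
        ("rayon_par_iter", 1337.93075),
        ("rust_scoped_pool", 1240.33273),
        ("job_steal", 1241.20777),
        ("job_steal_join", 1246.34885),
        ("kirk_crossbeam", 1244.10723),
    ];

    for &(name, ms) in runs.iter() {
        let s = speedup(serial_ms, ms);
        // Efficiency: fraction of the ideal 2x speedup on 2 threads.
        let efficiency = s / 2.0 * 100.0;
        println!("{:<20} {:.2}x speedup, {:.0}% of ideal", name, s, efficiency);
    }
}
```

For this run the speedups fall between roughly 1.85x (rayon_par_iter) and 1.99x (rust_scoped_pool), so the code is close to ideal scaling on 2 threads.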