a Zarith benchmark #10

gasche · 2021-06-30T20:25:39Z

Here is a toy benchmark for [@unboxed] inspired by the Zarith codebase. The idea is to compare four implementations of big-integers:

- the Zarith one (I mentioned it used assembly code; it changed recently to use an OCaml implementation with Obj functions instead)
- a simple approach with type boxed = Short of int | Long of t
- the same with Short int [@unboxed]
- a "wrong" approach using only integers (which overflows) to compare the performance
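
To make the difference between the second and third variants concrete, here is a minimal sketch in stock OCaml. The `custom` type is a hypothetical stand-in for Zarith's `t`, and the `[@unboxed]` constructor attribute only appears in a comment, since it needs the experimental branch discussed here. The `Obj` inspection is only there to show that `Short n` currently costs one heap block, which is what the unboxed variant is meant to avoid.

```ocaml
(* Hypothetical stand-in for Zarith's custom-block type [t]. *)
type custom = Custom of string

(* Second variant: both constructors are boxed. *)
type boxed = Short of int | Long of custom

(* The third variant would be written as follows on the experimental
   branch (a stock compiler does not accept it):
     type unboxed = Short of int [@unboxed] | Long of custom *)

(* In stock OCaml, [Short n] is a one-field heap block... *)
let short_is_block = Obj.is_block (Obj.repr (Short 3))
let short_size = Obj.size (Obj.repr (Short 3))

(* ...and so is [Long c], on top of the payload it points to. *)
let long_is_block = Obj.is_block (Obj.repr (Long (Custom "placeholder")))

(* A bare [int] is an immediate value; this is the representation
   that an unboxed [Short] is expected to share. *)
let int_is_immediate = Obj.is_int (Obj.repr 3)
```

All four bindings describe runtime representation only; they say nothing about timing, which is what the benchmark below measures.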

cc @nchataing

The results are very nice: the unboxed implementation generates exactly the same code, in the "short" path, as the Zarith implementation, so it is exactly as fast.

(function{test.ml:55,17-325} camlTest__add_zarith_287 (x/289: val y/290: val)
 (catch
   (if (and x/289 1)
     (if (and y/290 1)
       (let z/291 (+ (+ x/289 y/290) -1)
         (if (>= (and (or (xor z/291 x/289) 1) (or (xor z/291 y/290) 1)) 1)
           z/291
           (extcall "ml_z_add"{test.ml:61,11-20} x/289 y/290 int,int->val)))
       (exit 17))
     (exit 17))
 with(17) (extcall "ml_z_add"{test.ml:63,6-15} x/289 y/290 int,int->val)))

(function{test.ml:122,18-324} camlTest__add_unboxed_345
     (a/347: val b/348: val)
 (catch
   (if (and a/347 1)
     (if (and b/348 1)
       (let z/351 (+ (+ a/347 b/348) -1)
         (if (>= (and (or (xor z/351 a/347) 1) (or (xor z/351 b/348) 1)) 1)
           z/351
           (alloc{test.ml:129,13-51} 1024
             (extcall "ml_z_add"{test.ml:129,18-51} a/347 b/348 int,int->val))))
       (exit 8))
     (exit 8))
 with(8)
   (alloc{test.ml:130,14-50} 1024
     (extcall "ml_z_add"{test.ml:130,19-50}
       (if (and a/347 1) a/347 (load_mut val a/347))
       (if (and b/348 1) b/348 (load_mut val b/348)) int,int->val))))

(In the "slow" path where an overflow occurs, we have extra allocations in the "unboxed" version, which corresponds to the Long constructor. We could allow unboxing it if we supported shape constraints on abstract types, but this is not a priority.)
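
For readers less used to Cmm, the "short" path in both dumps above appears to correspond to the classic xor-based signed-overflow test at the source level; the extra `or ... 1` and `-1` terms in the Cmm come from OCaml's integer tag bit. The sketch below is an illustration on plain native ints, not the benchmark's actual code: the name `add_checked` and the `option` return type are assumptions, with `None` standing in for the fallback to `ml_z_add`.

```ocaml
(* Overflow-checked addition on native OCaml ints.
   Returns [None] where the real implementations would take the
   slow path through "ml_z_add". *)
let add_checked (x : int) (y : int) : int option =
  (* Native addition wraps around silently on overflow. *)
  let z = x + y in
  (* Signed overflow happened iff the sign of [z] differs from the
     sign of both operands, i.e. both xors have their sign bit set. *)
  if (z lxor x) land (z lxor y) >= 0 then Some z else None
```

For instance, `add_checked 1 2` is `Some 3`, while `add_checked max_int 1` is `None`.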

Benchmark timings (for bytecode and native-code compilation):

test.byte wrong-int:
2432902008176640000
0m3.186s

test.byte zarith:
2432902008176640000
0m4.593s

test.byte unboxed:
2432902008176640000
0m4.610s

test.byte boxed:
2432902008176640000
0m6.079s


test.native wrong-int:
2432902008176640000
0m6.624s

test.native zarith:
2432902008176640000
0m8.099s

test.native unboxed:
2432902008176640000
0m8.012s

test.native boxed:
2432902008176640000
0m8.777s

gasche · 2021-07-02T05:00:48Z

In the benchmark numbers given above, I was testing for n=20, which only ever uses "short" integers (you need "factorial 21" to overflow 63-bit integers). At @nchataing's request, here are benchmark numbers for n=23, which spends iterations 22 and 23 with "long" numbers:

test.byte wrong-int:
-1095080418959949824
0m4.198s

test.byte zarith:
25852016738884976640000
0m7.123s

test.byte unboxed:
25852016738884976640000
0m9.170s

test.byte boxed:
25852016738884976640000
0m11.137s

test.native wrong-int:
-1095080418959949824
0m10.983s

test.native zarith:
25852016738884976640000
0m21.784s

test.native unboxed:
25852016738884976640000
0m22.624s

test.native boxed:
25852016738884976640000
0m23.146s

These results show that the boxing of "long" integers in unboxed.ml does have an impact on performance, giving a solution that is intermediate between zarith.ml (no boxing whatsoever) and boxed.ml (both short and long numbers are boxed).

But let's keep in mind that the performance of these libraries on "short" numbers is way more important in practice for most use-cases.
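
The overflow threshold mentioned above (n=20 fits, n=21 does not) can be checked with a few lines of OCaml. This is a sketch assuming a 64-bit platform, where native `int` is 63 bits wide; the function names are made up for illustration and are not taken from the benchmark sources.

```ocaml
(* Factorial on native ints; wraps around silently on overflow. *)
let rec fact_wrapping n = if n <= 1 then 1 else n * fact_wrapping (n - 1)

(* [true] iff multiplying [acc] by [k] (both positive) would
   exceed [max_int]. *)
let mul_overflows acc k = acc > max_int / k

(* Smallest n such that n! no longer fits in a native int. *)
let first_overflow () =
  let rec go acc k =
    if mul_overflows acc k then k else go (acc * k) (k + 1)
  in
  go 1 2
```

On such a platform `first_overflow ()` evaluates to 21, and `fact_wrapping 23` is the negative value printed by the wrong-int runs above.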

…ocaml#84) This commit, which was part of PR#55, was lost when PR#55 was ported to 4.12. Partially revert "Replace tuple with record in Cextcall (#10)" This partially reverts commit 2cd07e649566a078246f4ad84369c467cbf52e11. Revert the changes to ocaml/testsuite/tools

This reverts commit 3a09ced4e0391f745755972d947faa6080af6370. The overflow-counting logic may change the performance profile, so it is simpler to not have it at all in normal benchmarking mode. The figure we computed is that with BENCH_SIZE=22, 16% of operations overflow.

gasche · 2022-06-13T20:12:42Z

I implemented preliminary support for declaring restricted shapes for abstract FFI types, and used this to unbox the Long constructor as well in a new unboxed_both version of the benchmark. As expected, the performance is now within noise-distance of the original zarith implementation.

test.byte wrong-int:
-1250660718674968576
real	0m3.765s

test.byte zarith:
1124000727777607680000
real	0m6.004s

test.byte unboxed:
1124000727777607680000
real	0m6.342s

test.byte unboxed_both:
1124000727777607680000
real	0m5.903s

test.byte boxed:
1124000727777607680000
real	0m7.924s


test.native wrong-int:
-1250660718674968576
real	0m7.763s

test.native zarith:
1124000727777607680000
real	0m17.238s

test.native unboxed:
1124000727777607680000
real	0m19.124s

test.native unboxed_both:
1124000727777607680000
real	0m17.954s

test.native boxed:
1124000727777607680000
real	0m21.877s

gasche force-pushed the head_shape_zarith branch from e2b7de5 to 359b337 Compare July 9, 2021 08:34

nchataing force-pushed the head_shape branch 3 times, most recently from 51f6664 to 5674bfe Compare July 15, 2021 08:23

gasche force-pushed the head_shape branch from c7ae44b to 58260e3 Compare September 14, 2021 20:46

gasche pushed a commit that referenced this pull request Mar 28, 2022

flambda-backend: Replace tuple with record in Cextcall (#10)

b670bcf

gasche pushed a commit that referenced this pull request Mar 28, 2022

Improve inclusion error messages for [@local_opt] (#10)

30ce67d

gasche added 10 commits June 13, 2022 13:09

parsing/builtin_attributes: support code for list literals

2b777fc

parsing/builtin_attributes support for [@shape [int; double; lazy; ob…

98e405b

…ject]]

typedecl_unboxed: support for [@shape ...]

dd33134

fixup! document typedecl_unboxed.ml{,i}

5e26a45

import Zarith C code

e37aed4

import the Zarith-inspired benchmark code

22e2529

adapt the code to follow the paper presentation

92a7825

overflow counting logic

63d7e10

Zarith benchmark: include an unboxed_both version that also unboxes C…

bda7606

…ustom

gasche force-pushed the head_shape_zarith branch from 2f1f3e8 to bda7606 Compare June 13, 2022 20:08
