gasche commented Jun 30, 2021

Here is a toy benchmark for [@unboxed] inspired by the Zarith codebase. The idea is to compare four implementations of big-integers:

  • the Zarith one (I mentioned it used assembly code; it changed recently to use an OCaml implementation with Obj functions instead)
  • a simple approach with type boxed = Short of int | Long of t
  • the same with Short of int [@unboxed]
  • a "wrong" approach using only integers (which overflows) to compare the performance
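For concreteness, the boxed variant can be sketched in stock OCaml as below. This is an illustration, not the benchmark's source: `big` is a hypothetical stand-in for Zarith's out-of-line representation, `is_immediate` is a helper introduced here, and the per-constructor `[@unboxed]` form appears only in a comment because it is assumed to require the experimental compiler branch discussed in this thread.

```ocaml
(* Hypothetical stand-in for the arbitrary-precision payload; the
   benchmark itself relies on Zarith's representation. *)
type big = Big of string

(* Plain sum type: both constructors allocate a heap block. *)
type boxed = Short of int | Long of big

(* The unboxed variant discussed here would be declared as
     type unboxed = Short of int [@unboxed] | Long of big
   (assumed to need the experimental branch), so that [Short n] is
   represented exactly like the immediate integer [n]. *)

(* Runtime representation check: immediate value vs. heap block. *)
let is_immediate (v : 'a) : bool = Obj.is_int (Obj.repr v)

let () =
  Printf.printf "int 3 immediate:   %b\n" (is_immediate 3);
  Printf.printf "Short 3 immediate: %b\n" (is_immediate (Short 3))
```

The "wrong" variant corresponds to using `int` and `( + )` directly, with no overflow handling at all.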

cc @nchataing

The results are very nice: the unboxed implementation generates exactly the same code, in the "short" path, as the Zarith implementation, so it is exactly as fast.

(function{test.ml:55,17-325} camlTest__add_zarith_287 (x/289: val y/290: val)
 (catch
   (if (and x/289 1)
     (if (and y/290 1)
       (let z/291 (+ (+ x/289 y/290) -1)
         (if (>= (and (or (xor z/291 x/289) 1) (or (xor z/291 y/290) 1)) 1)
           z/291
           (extcall "ml_z_add"{test.ml:61,11-20} x/289 y/290 int,int->val)))
       (exit 17))
     (exit 17))
 with(17) (extcall "ml_z_add"{test.ml:63,6-15} x/289 y/290 int,int->val)))

(function{test.ml:122,18-324} camlTest__add_unboxed_345
     (a/347: val b/348: val)
 (catch
   (if (and a/347 1)
     (if (and b/348 1)
       (let z/351 (+ (+ a/347 b/348) -1)
         (if (>= (and (or (xor z/351 a/347) 1) (or (xor z/351 b/348) 1)) 1)
           z/351
           (alloc{test.ml:129,13-51} 1024
             (extcall "ml_z_add"{test.ml:129,18-51} a/347 b/348 int,int->val))))
       (exit 8))
     (exit 8))
 with(8)
   (alloc{test.ml:130,14-50} 1024
     (extcall "ml_z_add"{test.ml:130,19-50}
       (if (and a/347 1) a/347 (load_mut val a/347))
       (if (and b/348 1) b/348 (load_mut val b/348)) int,int->val))))

(In the "slow" path, taken when an overflow occurs, the "unboxed" version performs extra allocations, which correspond to the Long constructor. We could unbox it as well if we supported shape constraints on abstract types, but this is not a priority.)
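The fast path in both dumps is a native addition followed by a sign-based overflow test (the Cmm shows the same test on tagged integers). A minimal OCaml sketch of that logic, assuming a hypothetical `slow_add` callback in place of the external `ml_z_add` call and a polymorphic payload for `Long`:

```ocaml
type 'big num = Short of int | Long of 'big

(* [x + y] overflowed iff the sign of the wrapped result differs from
   the sign of both operands. *)
let add_overflows x y =
  let z = x + y in
  (z lxor x) land (z lxor y) < 0

(* [slow_add] is a hypothetical stand-in for the out-of-line bignum
   addition; its result is wrapped in [Long], which is the extra
   allocation visible in the unboxed dump. *)
let add ~slow_add a b =
  match a, b with
  | Short x, Short y when not (add_overflows x y) -> Short (x + y)
  | _ -> Long (slow_add a b)

let () =
  let slow_add _ _ = "bignum result elided" in
  match add ~slow_add (Short max_int) (Short 1) with
  | Short n -> Printf.printf "short %d\n" n
  | Long s -> Printf.printf "long (%s)\n" s
```

When both operands are short and the sum fits, no external call is made, which is the case the n=20 benchmark exercises.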

Benchmark timings (for bytecode and native-code compilation):

test.byte wrong-int:
2432902008176640000
0m3.186s

test.byte zarith:
2432902008176640000
0m4.593s

test.byte unboxed:
2432902008176640000
0m4.610s

test.byte boxed:
2432902008176640000
0m6.079s


test.native wrong-int:
2432902008176640000
0m6.624s

test.native zarith:
2432902008176640000
0m8.099s

test.native unboxed:
2432902008176640000
0m8.012s

test.native boxed:
2432902008176640000
0m8.777s

gasche commented Jul 2, 2021

In the benchmark numbers given above, I was testing for n=20, which only ever uses "short" integers (you need "factorial 21" to overflow 63-bit integers). At @nchataing's request, here are benchmark numbers for n=23, which spends iterations 22 and 23 with "long" numbers:
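The overflow threshold can be checked directly. A small sketch, assuming a 64-bit OCaml where `int` is 63 bits wide; `fact_wrong` and `first_overflowing_factorial` are illustrative names, not functions from the benchmark:

```ocaml
(* Factorial on native ints: silently wraps around on overflow,
   like the "wrong-int" variant. *)
let fact_wrong n =
  let rec go acc i = if i > n then acc else go (acc * i) (i + 1) in
  go 1 1

(* Smallest n such that n! does not fit in a native int. The division
   tests for overflow before the multiplication is performed. *)
let first_overflowing_factorial () =
  let rec go acc n =
    if acc > max_int / n then n else go (acc * n) (n + 1)
  in
  go 1 1

let () =
  Printf.printf "int size: %d bits\n" Sys.int_size;
  Printf.printf "first overflow at n = %d\n" (first_overflowing_factorial ());
  Printf.printf "fact_wrong 20 = %d\n" (fact_wrong 20);
  Printf.printf "fact_wrong 23 = %d\n" (fact_wrong 23)
```

With a 63-bit `int`, this reports 21 as the first overflowing factorial, and `fact_wrong` reproduces the wrong-int outputs quoted in this thread for n=20 and n=23.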

test.byte wrong-int:
-1095080418959949824
0m4.198s

test.byte zarith:
25852016738884976640000
0m7.123s

test.byte unboxed:
25852016738884976640000
0m9.170s

test.byte boxed:
25852016738884976640000
0m11.137s

test.native wrong-int:
-1095080418959949824
0m10.983s

test.native zarith:
25852016738884976640000
0m21.784s

test.native unboxed:
25852016738884976640000
0m22.624s

test.native boxed:
25852016738884976640000
0m23.146s

These results show that boxing "long" integers in unboxed.ml does affect performance: it lands between zarith.ml (no boxing whatsoever) and boxed.ml (both short and long numbers are boxed).
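The cost at stake is allocation. As a rough illustration on stock OCaml (not the benchmark's code), the sketch below counts the minor-heap words allocated by a loop over a boxed `Short` and by the same loop over plain integers. The helper names `words_allocated`, `sum_boxed` and `sum_int` are hypothetical, and the exact counts depend on the compiler and its optimization settings.

```ocaml
type boxed = Short of int | Long of string

let add_boxed a b =
  match a, b with
  | Short x, Short y -> Short (x + y)   (* allocates a fresh block *)
  | _ -> invalid_arg "add_boxed: Long is not handled in this sketch"

(* Minor-heap words allocated while running [f]. *)
let words_allocated f =
  let before = Gc.minor_words () in
  ignore (Sys.opaque_identity (f ()));
  Gc.minor_words () -. before

let sum_boxed n =
  let acc = ref (Short 0) in
  for i = 1 to n do acc := add_boxed !acc (Short i) done;
  !acc

let sum_int n =
  let acc = ref 0 in
  for i = 1 to n do acc := !acc + i done;
  !acc

let () =
  Printf.printf "boxed: %.0f words\n"
    (words_allocated (fun () -> sum_boxed 1000));
  Printf.printf "int:   %.0f words\n"
    (words_allocated (fun () -> sum_int 1000))
```

The boxed loop allocates at least one two-word block per iteration, whereas the integer loop needs no allocation for its arithmetic.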

But let's keep in mind that the performance of these libraries on "short" numbers is way more important in practice for most use-cases.

@gasche gasche force-pushed the head_shape_zarith branch from e2b7de5 to 359b337 Compare July 9, 2021 08:34
@nchataing nchataing force-pushed the head_shape branch 3 times, most recently from 51f6664 to 5674bfe Compare July 15, 2021 08:23
gasche pushed a commit that referenced this pull request Mar 28, 2022
…ocaml#84)

This commit, which was part of PR#55, was lost when PR#55
was ported to 4.12.

Partially revert "Replace tuple with record in Cextcall (#10)"

This partially reverts commit 2cd07e649566a078246f4ad84369c467cbf52e11.

Revert the changes to ocaml/testsuite/tools
@gasche gasche force-pushed the head_shape_zarith branch from 2f1f3e8 to bda7606 Compare June 13, 2022 20:08
gasche commented Jun 13, 2022

I implemented preliminary support for declaring restricted shapes for abstract FFI types, and used this to unbox the Long constructor as well in a new unboxed_both version of the benchmark. As expected, the performance is now within noise-distance of the original zarith implementation.

test.byte wrong-int:
-1250660718674968576
real	0m3.765s

test.byte zarith:
1124000727777607680000
real	0m6.004s

test.byte unboxed:
1124000727777607680000
real	0m6.342s

test.byte unboxed_both:
1124000727777607680000
real	0m5.903s

test.byte boxed:
1124000727777607680000
real	0m7.924s


test.native wrong-int:
-1250660718674968576
real	0m7.763s

test.native zarith:
1124000727777607680000
real	0m17.238s

test.native unboxed:
1124000727777607680000
real	0m19.124s

test.native unboxed_both:
1124000727777607680000
real	0m17.954s

test.native boxed:
1124000727777607680000
real	0m21.877s
