ext/bcmath: Use divide and conquer method in `bcmul` #14376

SakiTakamachi · 2024-05-30T15:26:12Z

This change makes multiplication of numbers with many digits faster, while not significantly affecting the speed of computations that do not use divide and conquer.

I was wondering if I could make the commits smaller somehow, but since all changes depend on each other, I couldn't separate the commits...

There are some places where SIMD could be used, but I won't include it in this PR because it would make the PR too complicated.

Benchmarks

1:

$num1 = '1.2345678';
$num2 = '2.1234567';

for ($i = 0; $i < 5000000; $i++) {
    bcmul($num1, $num2, 7);
}

2:

$num1 = '1.2345678901234567890';
$num2 = '2.12345678901234567890';

for ($i = 0; $i < 3000000; $i++) {
    bcmul($num1, $num2, 20);
}

3:

$num1 = str_repeat('1234567890', 300);
$num2 = str_repeat('9876543210', 300);

for ($i = 0; $i < 6000; $i++) {
    bcmul($num1, $num2, 0);
}

4:

$num1 = str_repeat('1234567890', 1024);
$num2 = str_repeat('9876543210', 1024);

for ($i = 0; $i < 500; $i++) {
    bcmul($num1, $num2, 0);
}

before

# hyperfine "php /bc/mul/1.php" --warmup 10
Benchmark 1: php /bc/mul/1.php
  Time (mean ± σ):     611.1 ms ±   3.3 ms    [User: 605.5 ms, System: 4.8 ms]
  Range (min … max):   607.7 ms … 619.1 ms    10 runs
 
# hyperfine "php /bc/mul/2.php" --warmup 10
Benchmark 1: php /bc/mul/2.php
  Time (mean ± σ):     507.5 ms ±   4.4 ms    [User: 503.1 ms, System: 3.5 ms]
  Range (min … max):   503.3 ms … 518.3 ms    10 runs
 
# hyperfine "php /bc/mul/3.php" --warmup 10
Benchmark 1: php /bc/mul/3.php
  Time (mean ± σ):     537.0 ms ±   7.3 ms    [User: 533.4 ms, System: 2.8 ms]
  Range (min … max):   528.3 ms … 549.0 ms    10 runs
 
# hyperfine "php /bc/mul/4.php" --warmup 10
Benchmark 1: php /bc/mul/4.php
  Time (mean ± σ):     476.2 ms ±   5.0 ms    [User: 471.6 ms, System: 3.8 ms]
  Range (min … max):   470.7 ms … 487.4 ms    10 runs

after

# hyperfine "php /bc/mul/1.php" --warmup 10
Benchmark 1: php /bc/mul/1.php
  Time (mean ± σ):     610.4 ms ±   8.1 ms    [User: 606.1 ms, System: 2.5 ms]
  Range (min … max):   601.6 ms … 628.6 ms    10 runs
 
# hyperfine "php /bc/mul/2.php" --warmup 10
Benchmark 1: php /bc/mul/2.php
  Time (mean ± σ):     502.7 ms ±   4.4 ms    [User: 498.6 ms, System: 3.2 ms]
  Range (min … max):   496.4 ms … 509.2 ms    10 runs
 
# hyperfine "php /bc/mul/3.php" --warmup 10
Benchmark 1: php /bc/mul/3.php
  Time (mean ± σ):     450.4 ms ±  14.3 ms    [User: 446.5 ms, System: 3.1 ms]
  Range (min … max):   433.8 ms … 474.1 ms    10 runs
 
# hyperfine "php /bc/mul/4.php" --warmup 10
Benchmark 1: php /bc/mul/4.php
  Time (mean ± σ):     292.2 ms ±   6.9 ms    [User: 287.3 ms, System: 4.1 ms]
  Range (min … max):   285.7 ms … 307.3 ms    10 runs

SakiTakamachi · 2024-05-31T05:54:47Z

There was a calculation error in memory management, so I will fix it.

SakiTakamachi · 2024-05-31T14:37:24Z

done

nielsdos

Interesting but also very complex.
I also think we need good dedicated test cases for this.

nielsdos · 2024-06-01T13:50:48Z

ext/bcmath/libbcmath/src/recmul.c

+ *
+ * As can see by actually doing it, the value added to [3] in this example is the
+ * accumulation of all (high + low - mid) when calculating 2 digits.
+ * In this example, the ret size is 8, so the calculation length is 4, and from 4^1.585,


~~How do you obtain the number 1.585?~~
I see, this is the time complexity O(N^log2(3))... That wasn't clear to me

I'll elaborate a bit more in my comments.
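As an illustration of where the exponent comes from (a sketch, not code from the PR): each halving of the length turns one product into three half-size products, so a length of 2^k needs 3^k base multiplications, and 3^k = (2^k)^log2(3). The function names below are hypothetical, and the recursion assumes the length is a power of two and stops at single-digit products.

```c
#include <assert.h>
#include <stddef.h>

/* Base multiplications when an n-digit product is split into three
 * half-size products at every level. n must be a power of two. */
static unsigned long count_base_multiplications(size_t n)
{
	assert(n > 0 && (n & (n - 1)) == 0);
	if (n == 1) {
		return 1;
	}
	return 3 * count_base_multiplications(n / 2);
}

/* Base multiplications for plain long multiplication: n * n. */
static unsigned long count_schoolbook_multiplications(size_t n)
{
	return (unsigned long) n * n;
}
```

For a calculation length of 4 this gives 9, which matches the "4^1.585" figure in the quoted comment.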

nielsdos · 2024-06-01T13:51:07Z

ext/bcmath/libbcmath/src/recmul.c

+ *
+ * At this time, if consider one portion (high + low - mid) using ab*cd as an example,
+ * this becomes: ac + bd - (a-b)(c-d) = ac + bd - ac + ad + bc - bd = ad + bc
+ * Can see that the maximum value of this is obtained when 99*99.


Suggested change

* Can see that the maximum value of this is obtained when 99*99.

* Can see that the maximum value of this is obtained by 99*99.
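The identity in the quoted comment can be checked on two-digit decimal numbers. This is a minimal sketch under that assumption (one decimal digit per part); the PR itself works on larger array elements, and all function names here are hypothetical.

```c
#include <assert.h>

/* Cross term ad + bc of the product (10a + b) * (10c + d),
 * computed the Karatsuba way as high + low - mid. */
static int cross_term(int x, int y)
{
	assert(x >= 0 && x <= 99 && y >= 0 && y <= 99);
	int a = x / 10, b = x % 10;
	int c = y / 10, d = y % 10;
	int high = a * c;
	int low = b * d;
	int mid = (a - b) * (c - d); /* may be negative */
	return high + low - mid;
}

/* Full two-digit product assembled from high, cross term, and low. */
static int mul_two_digits(int x, int y)
{
	int high = (x / 10) * (y / 10);
	int low = (x % 10) * (y % 10);
	return high * 100 + cross_term(x, y) * 10 + low;
}

/* Largest cross term over all pairs of two-digit inputs. */
static int max_cross_term(void)
{
	int best = 0;
	for (int x = 0; x <= 99; x++) {
		for (int y = 0; y <= 99; y++) {
			int t = cross_term(x, y);
			if (t > best) {
				best = t;
			}
		}
	}
	return best;
}
```

The exhaustive search agrees with the comment: the largest cross term, 162, is reached at 99*99, where mid is zero.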

nielsdos · 2024-06-01T13:51:42Z

ext/bcmath/libbcmath/src/recmul.c

+ * this becomes: ac + bd - (a-b)(c-d) = ac + bd - ac + ad + bc - bd = ad + bc
+ * Can see that the maximum value of this is obtained when 99*99.
+ *
+ * This law holds true regardless of the calculation length, so when considering the


afaik "holds" already means it's true

Suggested change

* This law holds true regardless of the calculation length, so when considering the

* This law holds regardless of the calculation length, so when considering the

nielsdos · 2024-06-01T14:08:53Z

ext/bcmath/libbcmath/src/recmul.c

+ * BC_MUL_MAX_ADD_COUNT (because the calculation length is always adjusted to the power of 2).
+ */
+#if SIZEOF_SIZE_T >= 8
+#  define BC_REC_MUL_DO_ADJUST_EXPO 1024


Right, so because BC_MUL_MAX_ADD_COUNT is around 1844 we go to the power of 2 below it, i.e. 1024. I think you should add somewhere that BC_MUL_MAX_ADD_COUNT is around 1844.
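One way a figure of about 1844 can arise, as a sketch: this assumes each 64-bit array element holds 8 decimal digits, so that a product of two elements stays below 10^8 * 10^8. That assumption is not stated in the thread, and `ELEMENT_BOUNDARY`, `max_add_count`, and `floor_pow2` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one array element holds 8 decimal digits. */
#define ELEMENT_BOUNDARY UINT64_C(100000000)

/* How many maximal element products fit in a uint64_t without wrapping. */
static uint64_t max_add_count(void)
{
	return UINT64_MAX / (ELEMENT_BOUNDARY * ELEMENT_BOUNDARY);
}

/* Largest power of two that is less than or equal to n. */
static uint64_t floor_pow2(uint64_t n)
{
	assert(n > 0);
	uint64_t p = 1;
	while (p <= n / 2) {
		p *= 2;
	}
	return p;
}
```

Under that assumption the count is 1844, and the power of two below it is 1024, the value used for `BC_REC_MUL_DO_ADJUST_EXPO` in the quoted diff.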

nielsdos · 2024-06-01T14:09:24Z

ext/bcmath/libbcmath/src/recmul.c

+ */
+#if SIZEOF_SIZE_T >= 8
+#  define BC_REC_MUL_DO_ADJUST_EXPO 1024
+#  define BC_USE_REC_MUL_DIGITS 160 * 8


How did you arrive at 160?

I simply compared the measurements with the standard version and specified the number at which rec was faster. But I forgot to include the results of that comparison...

nielsdos · 2024-06-01T15:29:05Z

ext/bcmath/libbcmath/src/recmul.c

+	 * reaches 2. In other words, it is the sum of a geometric progression of 2 with a geometric
+	 * ratio of 2 from 2 to N-1, where the calculation length is N.
+	 * This sum can be calculated using the following formula, where the first term is a and the
+	 * geometric ratio is r: a(r^(N-1) - 1)(r - 1)


Suggested change

* geometric ratio is r: a(r^(N-1) - 1)(r - 1)

* geometric ratio is r: a(r^(N-1) - 1)/(r - 1)

nielsdos · 2024-06-01T15:29:29Z

ext/bcmath/libbcmath/src/recmul.c

+	 * ratio of 2 from 2 to N-1, where the calculation length is N.
+	 * This sum can be calculated using the following formula, where the first term is a and the
+	 * geometric ratio is r: a(r^(N-1) - 1)(r - 1)
+	 * Here, a and r are both 2, so this formula becomes: 2(2^(N-1) - 1)(2 - 1) = 2^N - 2


Suggested change

* Here, a and r are both 2, so this formula becomes: 2(2^(N-1) - 1)(2 - 1) = 2^N - 2

* Here, a and r are both 2, so this formula becomes: 2(2^(N-1) - 1)/(2 - 1) = 2^N - 2

Why are a and r both 2? I would've expected a = M and r = 1/2

The intent was that listing the terms in reverse order gives the sequence described in the comment, but the form you've written may be easier to understand.
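The corrected formula with a = 2 and r = 2 can be checked against a direct summation. A minimal sketch with hypothetical function names; the order in which the terms are listed does not change the sum.

```c
#include <assert.h>
#include <stdint.h>

/* Sum the first `terms` entries of a geometric progression by iteration. */
static uint64_t geometric_sum_loop(uint64_t first, uint64_t ratio, unsigned terms)
{
	uint64_t sum = 0;
	uint64_t term = first;
	for (unsigned i = 0; i < terms; i++) {
		sum += term;
		term *= ratio;
	}
	return sum;
}

/* Closed form with a = 2, r = 2 and N - 1 terms:
 * 2(2^(N-1) - 1)/(2 - 1) = 2^N - 2. Valid for 1 <= n <= 63. */
static uint64_t closed_form_two(unsigned n)
{
	assert(n >= 1 && n <= 63);
	return (UINT64_C(1) << n) - 2;
}
```

For N = 5 the terms are 2, 4, 8, 16, and both functions give 30.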

nielsdos · 2024-06-01T15:33:43Z

ext/bcmath/libbcmath/src/recmul.c

+	 * Share the results and the buffers used in intermediate calculations. The result is prod_arr_size.
+	 * The buffer increases by half like n_buf, but when calc_size is 2, the required buffer size is 4.
+	 * In other words, they are almost the same geometric progression, but the first term is 4.
+	 * a(r^(N-1) - 1)(r - 1): 4(2^(N-1) - 1)(2 - 1) = 2^N - 2 = 2(2^N - 2)


Suggested change

* a(r^(N-1) - 1)(r - 1): 4(2^(N-1) - 1)(2 - 1) = 2^N - 2 = 2(2^N - 2)

* a(r^(N-1) - 1)/(r - 1): 4(2^(N-1) - 1)/(2 - 1) = 2^N - 2 = 2(2^N - 2)
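Expanding the formula with a = 4 and r = 2 step by step, as a check on the chain quoted above: the intermediate value works out to 2^(N+1) - 4 rather than 2^N - 2, while the final form 2(2^N - 2) is unchanged.

```latex
\begin{align*}
S &= \frac{a\,(r^{N-1} - 1)}{r - 1}, \qquad a = 4,\; r = 2 \\
  &= 4\,(2^{N-1} - 1) \\
  &= 2^{N+1} - 4 \\
  &= 2\,(2^{N} - 2)
\end{align*}
```

For N = 3 the terms are 4 and 8, which sum to 12 = 2(2^3 - 2).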

nielsdos · 2024-06-01T15:34:10Z

ext/bcmath/libbcmath/src/recmul.c

+	n1_buf_size -= 2;
+	n2_buf_size -= 2;
+
+	BC_UINT_T *buf = safe_emalloc(prod_arr_size * 2 + calc_size * 2 - 4 + n1_buf_size + n2_buf_size, sizeof(BC_UINT_T), 0);


Can this sum overflow?

You are right. For very large numbers it will overflow...
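For context, `safe_emalloc` guards the multiplication of its arguments, but the sum passed as the first argument is evaluated by plain C arithmetic before the call. One possible way to guard such a sum is sketched below; the helper names are hypothetical and this is not the fix used in the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add b to *total; return false (leaving *total untouched) on wraparound. */
static bool size_add_checked(size_t *total, size_t b)
{
	if (b > SIZE_MAX - *total) {
		return false;
	}
	*total += b;
	return true;
}

/* Sum several element counts, reporting overflow instead of wrapping. */
static bool sum_counts_checked(const size_t *counts, size_t n, size_t *out)
{
	assert(counts != NULL && out != NULL);
	size_t total = 0;
	for (size_t i = 0; i < n; i++) {
		if (!size_add_checked(&total, counts[i])) {
			return false;
		}
	}
	*out = total;
	return true;
}
```

A caller would sum the individual buffer sizes this way and bail out with an error when the function returns false, before any allocation happens.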

nielsdos · 2024-06-01T15:35:12Z

ext/bcmath/libbcmath/src/recmul.c

+	 * adjust the topmost entry
+	 */
+	if (UNEXPECTED(calc_size >= BC_REC_MUL_DO_ADJUST_EXPO)) {
+		prod_uint[prod_arr_real_size - 1] += prod_uint[prod_arr_real_size] * BC_MUL_UINT_OVERFLOW;


I didn't get this.

I'll add a comment

SakiTakamachi · 2024-06-02T15:35:41Z

I discovered that calculations can break under certain conditions.
This is currently only detectable in CI when testing 32-bit builds.

I'm currently thinking of some test cases, and I'd like to be able to detect this in 64-bit CI as well.

(I was doing the calculation with unsigned integers, assuming nothing would be affected, but in some cases an intermediate value becomes negative, and the resulting underflow causes problems.)
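A small sketch of how an unsigned intermediate can go wrong in this kind of calculation. It illustrates the general effect only and does not reproduce the exact failure in the PR; the function names and the divisor of 10^8 used for the carry are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* mid = (a - b) * (c - d) with unsigned operands: the differences wrap
 * modulo 2^64 when a < b or c < d. */
static uint64_t mid_unsigned(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
	return (a - b) * (c - d);
}

/* The same expression with signed operands keeps the negative value. */
static int64_t mid_signed(int64_t a, int64_t b, int64_t c, int64_t d)
{
	return (a - b) * (c - d);
}

/* Carry extraction by division, as done when adjusting digits. */
static uint64_t carry_unsigned(uint64_t value)
{
	return value / UINT64_C(100000000);
}

static int64_t carry_signed(int64_t value)
{
	return value / INT64_C(100000000);
}
```

With a = 1, b = 2, c = 4, d = 3 the signed result is -1 with a carry of 0, while the unsigned result wraps to the largest 64-bit value and produces a large bogus carry.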

nielsdos · 2024-06-02T16:35:14Z

You may try differential fuzzing to catch some mistakes and find potential bugs, with one fuzz variant passing inputs to the old code and the other fuzz variant passing inputs to the new code.
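The idea can be sketched on a much smaller problem: multiply 32-bit values with a reference implementation and with a Karatsuba-style split, and count disagreements over random inputs. This is a simplified stand-in that uses a fixed-seed pseudo-random generator instead of a coverage-guided fuzzer; none of the names below come from bcmath.

```c
#include <assert.h>
#include <stdint.h>

/* Reference ("old") implementation: native 64-bit multiplication. */
static uint64_t mul_reference(uint32_t x, uint32_t y)
{
	return (uint64_t) x * (uint64_t) y;
}

/* Candidate ("new") implementation: one Karatsuba step on 16-bit halves. */
static uint64_t mul_karatsuba(uint32_t x, uint32_t y)
{
	int64_t x1 = x >> 16, x0 = x & 0xFFFF;
	int64_t y1 = y >> 16, y0 = y & 0xFFFF;
	int64_t high = x1 * y1;
	int64_t low = x0 * y0;
	int64_t mid = (x1 - x0) * (y1 - y0); /* may be negative */
	int64_t cross = high + low - mid;    /* equals x1*y0 + x0*y1, never negative */
	return ((uint64_t) high << 32) + ((uint64_t) cross << 16) + (uint64_t) low;
}

/* Small deterministic generator (xorshift32); the state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Feed the same random inputs to both variants and count mismatches. */
static unsigned differential_check(uint32_t seed, unsigned iterations)
{
	assert(seed != 0);
	uint32_t state = seed;
	unsigned mismatches = 0;
	for (unsigned i = 0; i < iterations; i++) {
		uint32_t x = xorshift32(&state);
		uint32_t y = xorshift32(&state);
		if (mul_reference(x, y) != mul_karatsuba(x, y)) {
			mismatches++;
		}
	}
	return mismatches;
}
```

A real harness would feed the same digit strings to the old and new multiplication code and compare the results in the same way.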

Girgias

I am seriously struggling to review the code. All the BC_MUL_INT_DIGITS to BC_MUL_INT_DIGITS changes pollute the review, and if those are a prerequisite for this PR, then I would rather have them moved to another PR, verified and validated, so that this PR can focus only on implementing the Karatsuba algorithm.

I am also struggling to see why there is so much code for something that a pseudocode implementation does in very few lines. Maybe that's because the naming of the functions is not helping me, and because a bunch of other optimizations are done at the same time, which makes verifying the correctness of the algorithm difficult.

It might be better to go with a "dumb" implementation of Karatsuba (and call the function that, e.g. bc_karatsuba rather than bc_rec_mul) and have the helper subroutines clearly defined and named appropriately, so that each part can be checked for correctness and then optimized in follow-up commits or PRs.

Aside: Can we move away from using "BCD" (BC Digit) when referring to BC numbers and just use actual words, too?

Girgias · 2024-06-03T00:36:35Z

ext/bcmath/libbcmath/src/recmul.c

+
+/*
+ * In divide-and-conquer calculations, additions are concentrated on array
+ * entries around half of the ret size length.


Nit: not sure abbreviating here makes any sense.

Suggested change

* entries around half of the ret size length.

* entries around half of the return size length.

Girgias · 2024-06-03T00:37:07Z

ext/bcmath/libbcmath/src/recmul.c

+/*
+ * In divide-and-conquer calculations, additions are concentrated on array
+ * entries around half of the ret size length.
+ * e.g. ret size is 8, [7][6][5][4][3][2][1][0]


Suggested change

* e.g. ret size is 8, [7][6][5][4][3][2][1][0]

* e.g. return size is 8, [7][6][5][4][3][2][1][0]

Girgias · 2024-06-03T00:43:59Z

ext/bcmath/libbcmath/src/recmul.c

+/*
+ * If n1 and n2 are both greater than this order of magnitude, use the
+ * divide-and-conquer method (as a result of measurement, a clear speed
+ * difference appears from this order of magnitude).
+ * If the number of digits is small, the overhead impact is large and slow.
+ */


There are multiple algorithms for the multiplication of large integers, such as Toom–Cook/Toom-3 and Schönhage–Strassen algorithms. So be clear about what we are using from the get-go.

Also, some wording improvements, at least to me.

Suggested change

/*
 * If n1 and n2 are both greater than this order of magnitude, use the
 * divide-and-conquer method (as a result of measurement, a clear speed
 * difference appears from this order of magnitude).
 * If the number of digits is small, the overhead impact is large and slow.
 */

/*
 * When n1 and n2 are both greater than this order of magnitude,
 * use the Karatsuba divide-and-conquer algorithm.
 * For smaller magnitudes the overhead of the algorithm makes it worse than the
 * naive long-multiplication algorithms.
 */

Girgias · 2024-06-03T00:44:13Z

ext/bcmath/libbcmath/src/recmul.c

+ * In divide-and-conquer calculations, additions are concentrated on array
+ * entries around half of the ret size length.
+ * e.g. ret size is 8, [7][6][5][4][3][2][1][0]
+ * In this case, addition is most concentrated on [3].


Suggested change

* In this case, addition is most concentrated on [3].

* In this case, addition is most concentrated on digit number [3].

Girgias · 2024-06-03T00:47:49Z

ext/bcmath/libbcmath/src/recmul.c

+ * (If normal multiplication of N digits and N digits involves multiplying one digit N^2
+ * times, the Karatsuba-algorithm requires N^log2(3) times of calculation. N^log2(3) is
+ * approximately N^1.585.)
+ * there is a minimum unit calculation set of 9, so add (high + low - mid) 9 times.
+ *
+ * At this time, if consider one portion (high + low - mid) using ab*cd as an example,
+ * this becomes: ac + bd - (a-b)(c-d) = ac + bd - ac + ad + bc - bd = ad + bc
+ * Can see that the maximum value of this is obtained by 99*99.
+ *
+ * This law holds regardless of the calculation length, so when considering the
+ * maximum value, all mids are canceled out and can be ignored. Therefore, mid and all
+ * calculations that further divide mid can be ignored from the calculation results that
+ * are being accumulated.
+ * In other words, if the calculation length is N and the minimum calculation unit length
+ * is 2, there are N/2 high and low pairs. Therefore, the number of times the value is
+ * added is N times.


Isn't it maybe just better to reference Wikipedia or some paper for an explanation of the algorithm?
One can use a permalink so that new revisions don't mess up what we are linking to.
e.g. https://en.wikipedia.org/w/index.php?title=Karatsuba_algorithm&oldid=1190009898

Girgias · 2024-06-03T00:49:37Z

ext/bcmath/libbcmath/src/private.h

 #  define BC_UINT_T uint64_t
+#  define BC_INT_T int64_t


Can't we just use typedefs for these actually rather than a macro?
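A sketch of what the typedef form could look like. The lowercase type names are hypothetical, and `SIZE_MAX` stands in for the `SIZEOF_SIZE_T` check seen in the quoted diffs so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the element types by platform word size, as typedefs, not macros. */
#if SIZE_MAX >= UINT64_MAX
typedef uint64_t bc_uint_t;
typedef int64_t bc_int_t;
#else
typedef uint32_t bc_uint_t;
typedef int32_t bc_int_t;
#endif

/* The signed and unsigned element types must have the same width. */
static_assert(sizeof(bc_uint_t) == sizeof(bc_int_t),
	"signed and unsigned element types must match in width");
```

Unlike macros, typedef names follow normal scoping rules and show up under their own names in compiler diagnostics and debuggers.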

Girgias · 2024-06-03T00:51:01Z

ext/bcmath/libbcmath/src/recmul.c

 	memcpy(str, &digits, sizeof(digits));
 }

+static inline void bc_mul_convert_int_to_bcd(BC_INT_T *prod_int, size_t prodlen,  size_t prod_arr_size, bc_num *prod)


nit:

Suggested change

static inline void bc_mul_convert_int_to_bcd(BC_INT_T *prod_int, size_t prodlen,  size_t prod_arr_size, bc_num *prod)

static inline void bc_mul_convert_int_to_bcd(BC_INT_T *prod_int, size_t prodlen, size_t prod_arr_size, bc_num *prod)

Girgias · 2024-06-03T01:26:22Z

ext/bcmath/libbcmath/src/recmul.c

+/*
+ * In divide-and-conquer calculations, determine whether the calculation length is
+ * such that digits should be adjusted to prevent overflow during calculation.
+ * Digit adjustment is performed when the calculation length is a power of
+ * BC_REC_MUL_DO_ADJUST_EXPO.
+ */
+static inline bool bc_rec_mul_near_overflow(size_t calc_arr_size)
+{
+	if (EXPECTED(calc_arr_size < BC_REC_MUL_DO_ADJUST_EXPO)) {
+		return false;
+	}
+
+	while (calc_arr_size > 0) {
+		calc_arr_size /= BC_REC_MUL_DO_ADJUST_EXPO;
+		if (UNEXPECTED(calc_arr_size == 1)) {
+			return true;
+		}
+	}
+	return false;
+}


I don't understand why calculations can overflow. According to Wikipedia, there is a way to always do the required steps without multiplication overflow.

See the last paragraph of https://en.wikipedia.org/w/index.php?title=Karatsuba_algorithm&oldid=1190009898#Implementation

Girgias · 2024-06-03T01:35:00Z

The brilliant.org wiki might also be good to link for an explanation of the algorithm: https://brilliant.org/wiki/karatsuba-algorithm/

SakiTakamachi · 2024-06-03T02:13:52Z

@Girgias
As you said, I think I should split up the PR, because there are extensive bug fixes that I noticed were needed after opening it.
I'll leave this PR as it is for now and make the changes step by step in a new PR.

SakiTakamachi · 2024-06-03T02:22:14Z

It's too complicated, so I'll try to make the commits easier to understand. Thanks to both of you for checking it out.

SakiTakamachi · 2024-06-13T16:43:15Z

I've reworked the PR so I'm closing this.

#14538

SakiTakamachi requested review from Girgias and nielsdos as code owners May 30, 2024 15:26

github-actions bot added the Extension: bcmath label May 30, 2024

SakiTakamachi force-pushed the refactor_bcmath_mul_rec branch from 90c9984 to 0edb77d Compare May 30, 2024 16:17

Use divide and conquer method in bcmul

600590c

SakiTakamachi force-pushed the refactor_bcmath_mul_rec branch from c78e1b3 to 600590c Compare May 31, 2024 14:37

nielsdos requested changes Jun 1, 2024

SakiTakamachi added 4 commits June 2, 2024 22:29

use rec_mul when 32-bit

ef9706d

fix logic

adb68d0

fixed comments

3c003be

added comment

7633103

SakiTakamachi added 4 commits June 3, 2024 01:47

fixed var name

8c58a0c

use ZEND_ASSERT in bc_rec_mul_recursive_fast

5b75112

added rec mul test

1970120

All uints were changed to ints, and negative values were taken into…

d18429d

… account when adjusting digits.

Girgias requested changes Jun 3, 2024

Fixed forgotten changes

f95ae4c

SakiTakamachi mentioned this pull request Jun 3, 2024

ext/bcmath: bcmul - Changed unsigned integer type to signed integer type #14447

Closed

SakiTakamachi mentioned this pull request Jun 13, 2024

ext/bcmath: Using Karatsuba algorithm in bcmul #14538

Closed

SakiTakamachi closed this Jun 13, 2024

SakiTakamachi deleted the refactor_bcmath_mul_rec branch June 13, 2024 16:43
