Merge bitcoin-core/secp256k1#979: Native jacobi symbol algorithm

ce3cfc7 doc: Describe Jacobi calculation in safegcd_implementation.md (Elliott Jin) 6be0103 Add secp256k1_fe_is_square_var function (Pieter Wuille) 1de2a01 Native jacobi symbol algorithm (Pieter Wuille) 04c6c1b Make secp256k1_modinv64_det_check_pow2 support abs val (Pieter Wuille) 5fffb2c Make secp256k1_i128_check_pow2 support -(2^n) (Pieter Wuille) Pull request description: This introduces variants of the vartime divsteps-based GCD algorithm used for modular inverses to compute Jacobi symbols. Changes compared to the normal vartime divsteps: * Only positive matrices are used, guaranteeing that f and g remain positive. * An additional jac variable is updated to track sign changes during matrix computation. * There is (so far) no proof that this algorithm terminates within reasonable amount of time for every input, but experimentally it appears to almost always need less than 900 iterations. To account for that, only a bounded number of iterations is performed (1500), after which failure is returned. The field logic then falls back to using square roots to determining the result. * The algorithm converges to f=g=gcd(f0,g0) rather than g=0. To keep this test simple, the end condition is f=1, which won't be reached if started with g=0. That case is dealt with specially. This code is currently unused, except for tests. I don't aim for it to be merged until there is a need for it, but this demonstrates its feasibility. In terms of performance: ``` field_inverse: min 1.76us / avg 1.76us / max 1.78us field_inverse_var: min 0.991us / avg 0.993us / max 0.996us field_jacobi_var: min 1.31us / avg 1.31us / max 1.31us field_sqrt: min 4.36us / avg 4.37us / max 4.40us ``` while with the older (f24e122) libgmp based Jacobi code on the same system: ``` num_jacobi: min 1.53us / avg 1.54us / max 1.55us ``` ACKs for top commit: jonasnick: ACK ce3cfc7 real-or-random: reACK ce3cfc7 diff and writeup is good and I tested every commit Tree-SHA512: 8a6204a7a108d8802d942a54faca39917f90ea5923130683bbd870f9025f4ec8ef256ffa1d939a793f0b32d4cdfcdcd1d3f8ae5ed74a0193be7ad98362ce027e
BlockstreamResearch · Mar 1, 2023 · 09b1d46 · 09b1d46
2 parents cbd2555 + ce3cfc7
commit 09b1d46
Show file tree

Hide file tree

Showing 13 changed files with 553 additions and 40 deletions.
diff --git a/doc/safegcd_implementation.md b/doc/safegcd_implementation.md
@@ -1,7 +1,7 @@
 # The safegcd implementation in libsecp256k1 explained
 
-This document explains the modular inverse implementation in the `src/modinv*.h` files. It is based
-on the paper
+This document explains the modular inverse and Jacobi symbol implementations in the `src/modinv*.h` files.
+It is based on the paper
 ["Fast constant-time gcd computation and modular inversion"](https://gcd.cr.yp.to/papers.html#safegcd)
 by Daniel J. Bernstein and Bo-Yin Yang. The references below are for the Date: 2019.04.13 version.
 
@@ -769,3 +769,51 @@ def modinv_var(M, Mi, x):
         d, e = update_de(d, e, t, M, Mi)
     return normalize(f, d, Mi)
 ```
+
+## 8. From GCDs to Jacobi symbol
+
+We can also use a similar approach to calculate Jacobi symbol *(x | M)* by keeping track of an
+extra variable *j*, for which at every step *(x | M) = j (g | f)*. As we update *f* and *g*, we
+make corresponding updates to *j* using
+[properties of the Jacobi symbol](https://en.wikipedia.org/wiki/Jacobi_symbol#Properties):
+* *((g/2) | f)* is either *(g | f)* or *-(g | f)*, depending on the value of *f mod 8* (negating if it's *3* or *5*).
+* *(f | g)* is either *(g | f)* or *-(g | f)*, depending on *f mod 4* and *g mod 4* (negating if both are *3*).
+
+These updates depend only on the values of *f* and *g* modulo *4* or *8*, and can thus be applied
+very quickly, as long as we keep track of a few additional bits of *f* and *g*. Overall, this
+calculation is slightly simpler than the one for the modular inverse because we no longer need to
+keep track of *d* and *e*.
+
+However, one difficulty of this approach is that the Jacobi symbol *(a | n)* is only defined for
+positive odd integers *n*, whereas in the original safegcd algorithm, *f, g* can take negative
+values. We resolve this by using the following modified steps:
+
+```python
+        # Before
+        if delta > 0 and g & 1:
+            delta, f, g = 1 - delta, g, (g - f) // 2
+
+        # After
+        if delta > 0 and g & 1:
+            delta, f, g = 1 - delta, g, (g + f) // 2
+```
+
+The algorithm is still correct, since the changed divstep, called a "posdivstep" (see section 8.4
+and E.5 in the paper) preserves *gcd(f, g)*. However, there's no proof that the modified algorithm
+will converge. The justification for posdivsteps is completely empirical: in practice, it appears
+that the vast majority of nonzero inputs converge to *f=g=gcd(f<sub>0</sub>, g<sub>0</sub>)* in a
+number of steps proportional to their logarithm.
+
+Note that:
+- We require inputs to satisfy *gcd(x, M) = 1*, as otherwise *f=1* is not reached.
+- We require inputs *x &neq; 0*, because applying posdivstep with *g=0* has no effect.
+- We need to update the termination condition from *g=0* to *f=1*.
+
+We account for the possibility of nonconvergence by only performing a bounded number of
+posdivsteps, and then falling back to square-root based Jacobi calculation if a solution has not
+yet been found.
+
+The optimizations in sections 3-7 above are described in the context of the original divsteps, but
+in the C implementation we also adapt most of them (not including "avoiding modulus operations",
+since it's not necessary to track *d, e*, and "constant-time operation", since we never calculate
+Jacobi symbols for secret data) to the posdivsteps version.
diff --git a/src/bench_internal.c b/src/bench_internal.c
@@ -219,6 +219,19 @@ static void bench_field_sqrt(void* arg, int iters) {
     CHECK(j <= iters);
 }
 
+static void bench_field_is_square_var(void* arg, int iters) {
+    int i, j = 0;
+    bench_inv *data = (bench_inv*)arg;
+    secp256k1_fe t = data->fe[0];
+
+    for (i = 0; i < iters; i++) {
+        j += secp256k1_fe_is_square_var(&t);
+        secp256k1_fe_add(&t, &data->fe[1]);
+        secp256k1_fe_normalize_var(&t);
+    }
+    CHECK(j <= iters);
+}
+
 static void bench_group_double_var(void* arg, int iters) {
     int i;
     bench_inv *data = (bench_inv*)arg;
@@ -371,6 +384,7 @@ int main(int argc, char **argv) {
     if (d || have_flag(argc, argv, "field") || have_flag(argc, argv, "mul")) run_benchmark("field_mul", bench_field_mul, bench_setup, NULL, &data, 10, iters*10);
     if (d || have_flag(argc, argv, "field") || have_flag(argc, argv, "inverse")) run_benchmark("field_inverse", bench_field_inverse, bench_setup, NULL, &data, 10, iters);
     if (d || have_flag(argc, argv, "field") || have_flag(argc, argv, "inverse")) run_benchmark("field_inverse_var", bench_field_inverse_var, bench_setup, NULL, &data, 10, iters);
+    if (d || have_flag(argc, argv, "field") || have_flag(argc, argv, "issquare")) run_benchmark("field_is_square_var", bench_field_is_square_var, bench_setup, NULL, &data, 10, iters);
     if (d || have_flag(argc, argv, "field") || have_flag(argc, argv, "sqrt")) run_benchmark("field_sqrt", bench_field_sqrt, bench_setup, NULL, &data, 10, iters);
 
     if (d || have_flag(argc, argv, "group") || have_flag(argc, argv, "double")) run_benchmark("group_double_var", bench_group_double_var, bench_setup, NULL, &data, 10, iters*10);

diff --git a/src/field.h b/src/field.h
@@ -135,4 +135,7 @@ static void secp256k1_fe_half(secp256k1_fe *r);
  *  magnitude set to 'm' and is normalized if (and only if) 'm' is zero. */
 static void secp256k1_fe_get_bounds(secp256k1_fe *r, int m);
 
+/** Determine whether a is a square (modulo p). */
+static int secp256k1_fe_is_square_var(const secp256k1_fe *a);
+
 #endif /* SECP256K1_FIELD_H */
diff --git a/src/field_10x26_impl.h b/src/field_10x26_impl.h
@@ -1365,4 +1365,31 @@ static void secp256k1_fe_inv_var(secp256k1_fe *r, const secp256k1_fe *x) {
     VERIFY_CHECK(secp256k1_fe_normalizes_to_zero(r) == secp256k1_fe_normalizes_to_zero(&tmp));
 }
 
+static int secp256k1_fe_is_square_var(const secp256k1_fe *x) {
+    secp256k1_fe tmp;
+    secp256k1_modinv32_signed30 s;
+    int jac, ret;
+
+    tmp = *x;
+    secp256k1_fe_normalize_var(&tmp);
+    /* secp256k1_jacobi32_maybe_var cannot deal with input 0. */
+    if (secp256k1_fe_is_zero(&tmp)) return 1;
+    secp256k1_fe_to_signed30(&s, &tmp);
+    jac = secp256k1_jacobi32_maybe_var(&s, &secp256k1_const_modinfo_fe);
+    if (jac == 0) {
+        /* secp256k1_jacobi32_maybe_var failed to compute the Jacobi symbol. Fall back
+         * to computing a square root. This should be extremely rare with random
+         * input (except in VERIFY mode, where a lower iteration count is used). */
+        secp256k1_fe dummy;
+        ret = secp256k1_fe_sqrt(&dummy, &tmp);
+    } else {
+#ifdef VERIFY
+        secp256k1_fe dummy;
+        VERIFY_CHECK(jac == 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1);
+#endif
+        ret = jac >= 0;
+    }
+    return ret;
+}
+
 #endif /* SECP256K1_FIELD_REPR_IMPL_H */
diff --git a/src/field_5x52_impl.h b/src/field_5x52_impl.h
@@ -664,4 +664,31 @@ static void secp256k1_fe_inv_var(secp256k1_fe *r, const secp256k1_fe *x) {
 #endif
 }
 
+static int secp256k1_fe_is_square_var(const secp256k1_fe *x) {
+    secp256k1_fe tmp;
+    secp256k1_modinv64_signed62 s;
+    int jac, ret;
+
+    tmp = *x;
+    secp256k1_fe_normalize_var(&tmp);
+    /* secp256k1_jacobi64_maybe_var cannot deal with input 0. */
+    if (secp256k1_fe_is_zero(&tmp)) return 1;
+    secp256k1_fe_to_signed62(&s, &tmp);
+    jac = secp256k1_jacobi64_maybe_var(&s, &secp256k1_const_modinfo_fe);
+    if (jac == 0) {
+        /* secp256k1_jacobi64_maybe_var failed to compute the Jacobi symbol. Fall back
+         * to computing a square root. This should be extremely rare with random
+         * input (except in VERIFY mode, where a lower iteration count is used). */
+        secp256k1_fe dummy;
+        ret = secp256k1_fe_sqrt(&dummy, &tmp);
+    } else {
+#ifdef VERIFY
+        secp256k1_fe dummy;
+        VERIFY_CHECK(jac == 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1);
+#endif
+        ret = jac >= 0;
+    }
+    return ret;
+}
+
 #endif /* SECP256K1_FIELD_REPR_IMPL_H */
diff --git a/src/int128.h b/src/int128.h
@@ -80,10 +80,10 @@ static SECP256K1_INLINE void secp256k1_i128_from_i64(secp256k1_int128 *r, int64_
 /* Compare two 128-bit values for equality. */
 static SECP256K1_INLINE int secp256k1_i128_eq_var(const secp256k1_int128 *a, const secp256k1_int128 *b);
 
-/* Tests if r is equal to 2^n.
+/* Tests if r is equal to sign*2^n (sign must be 1 or -1).
  * n must be strictly less than 127.
  */
-static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n);
+static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n, int sign);
 
 #endif
 

diff --git a/src/int128_native_impl.h b/src/int128_native_impl.h
@@ -84,9 +84,10 @@ static SECP256K1_INLINE int secp256k1_i128_eq_var(const secp256k1_int128 *a, con
    return *a == *b;
 }
 
-static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n) {
+static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n, int sign) {
    VERIFY_CHECK(n < 127);
-   return (*r == (int128_t)1 << n);
+   VERIFY_CHECK(sign == 1 || sign == -1);
+   return (*r == (int128_t)((uint128_t)sign << n));
 }
 
 #endif
diff --git a/src/int128_struct_impl.h b/src/int128_struct_impl.h
@@ -189,10 +189,11 @@ static SECP256K1_INLINE int secp256k1_i128_eq_var(const secp256k1_int128 *a, con
    return a->hi == b->hi && a->lo == b->lo;
 }
 
-static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n) {
-   VERIFY_CHECK(n < 127);
-   return n >= 64 ? r->hi == (uint64_t)1 << (n - 64) && r->lo == 0
-                  : r->hi == 0 && r->lo == (uint64_t)1 << n;
+static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n, int sign) {
+    VERIFY_CHECK(n < 127);
+    VERIFY_CHECK(sign == 1 || sign == -1);
+    return n >= 64 ? r->hi == (uint64_t)sign << (n - 64) && r->lo == 0
+                   : r->hi == (uint64_t)((sign - 1) >> 1) && r->lo == (uint64_t)sign << n;
 }
 
 #endif
diff --git a/src/modinv32.h b/src/modinv32.h
@@ -35,4 +35,9 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
 /* Same as secp256k1_modinv32_var, but constant time in x (not in the modulus). */
 static void secp256k1_modinv32(secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
 
+/* Compute the Jacobi symbol for (x | modinfo->modulus). x must be coprime with modulus (and thus
+ * cannot be 0, as modulus >= 3). All limbs of x must be non-negative. Returns 0 if the result
+ * cannot be computed. */
+static int secp256k1_jacobi32_maybe_var(const secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
+
 #endif /* SECP256K1_MODINV32_H */