Endomorphism G2 #79

Merged: mratsim merged 6 commits into master from endomorphism-g2 on Sep 3, 2020

@mratsim mratsim commented Sep 3, 2020

This PR:

  • Properly clears the cofactor in the BN254_Snarks test generator and the Frobenius test generator.
  • Implements the GLS endomorphism for a ~2.2x speedup compared to windowed scalar multiplication with window 4 (i.e. 16 precomputed elements).
  • Adds a window 5 to the comparison benchmark.

Implementation is fully constant-time.
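
For readers unfamiliar with the baseline, here is a minimal sketch of fixed-window scalar multiplication with window 4 (16 precomputed elements) and a table lookup that scans every entry. It uses a toy additive group (integers modulo a small prime) in place of G2, and Python integers are not constant-time, so it only illustrates the access pattern. The names `add`, `ct_lookup` and `windowed_mul` are hypothetical and not part of this PR.

```python
MODULUS = 1_000_003  # toy group: integers under addition modulo a prime (assumption, stands in for G2)

def add(p: int, q: int) -> int:
    """Group operation of the toy group."""
    return (p + q) % MODULUS

def ct_lookup(table: list[int], index: int) -> int:
    """Select table[index] by scanning every entry, so the access
    pattern does not depend on the (secret) index."""
    result = 0
    for i, entry in enumerate(table):
        mask = -int(i == index)  # -1 (all ones) if selected, else 0
        result |= entry & mask
    return result

def windowed_mul(point: int, scalar: int, window: int = 4, bits: int = 256) -> int:
    """Fixed-window scalar multiplication, most significant window first."""
    # Precompute 0*P, 1*P, ..., (2^window - 1)*P: 16 elements for window 4.
    table = [0]
    for _ in range((1 << window) - 1):
        table.append(add(table[-1], point))

    acc = 0  # neutral element of the toy group
    num_windows = -(-bits // window)  # ceiling division
    for w in reversed(range(num_windows)):
        for _ in range(window):
            acc = add(acc, acc)  # one doubling per scalar bit
        digit = (scalar >> (w * window)) & ((1 << window) - 1)
        acc = add(acc, ct_lookup(table, digit))  # always add, even when digit is 0
    return acc
```

An endomorphism-accelerated multiplication splits the scalar into shorter mini-scalars that share one doubling chain, which is where the saving over this baseline comes from.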

Speed on BLS12-381: 4761.021 G2mul/s, 210µs/ops, 630k cycles

For reference (see status-im/nim-blscurve#47 and status-im/nim-blst#1):

  • MCL JIT is at 383k cycles
  • MCL LLVM is at 522k cycles
  • Milagro/Miracl are at 3368k cycles
  • BLST is at 796k cycles
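
To put these figures side by side, the short sketch below normalizes the cycle counts quoted in this PR against the 630k cycles reported here, and derives the clock frequency implied by 630k cycles in 210µs. Only numbers from the description are used; the names `CYCLES` and `relative_cost` are made up for illustration.

```python
# Cycle counts per G2 scalar multiplication, as quoted in this PR description.
CYCLES = {
    "this PR": 630_000,
    "MCL JIT": 383_000,
    "MCL LLVM": 522_000,
    "Milagro/Miracl": 3_368_000,
    "BLST": 796_000,
}

def relative_cost(baseline: str = "this PR") -> dict[str, float]:
    """Cycles of each implementation divided by the cycles of the baseline."""
    base = CYCLES[baseline]
    return {name: cycles / base for name, cycles in CYCLES.items()}

# Clock frequency implied by 630k cycles taking 210 microseconds.
implied_ghz = 630_000 / 210e-6 / 1e9

if __name__ == "__main__":
    for name, ratio in relative_cost().items():
        print(f"{name:15s} {ratio:5.2f}x the cycles of this PR")
    print(f"implied clock: {implied_ghz:.1f} GHz")
```

On these numbers MCL JIT needs roughly 0.61x the cycles of this PR and BLST roughly 1.26x.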

The difference with MCL JIT is explained by the large speed difference in the addition (5.420 vs 4.638) and doubling (3095 vs 2.704) formulas. Ours are slower for several reasons:

  1. MCL doubling is not constant-time: it shortcuts on infinity or doubling/opposite. The cost of the complete formulas is about 40%.
  2. Even with the double-width optimization from Double-width tower extension part 1 #72, the Fp2 operations were slowed down by the C compiler, probably through extra copies in the prologue/epilogue or register spilling. We might need __attribute__((naked)) pure assembly functions to deal with that (and so need an alternative assembler/code generator). The cost seems to be 10%.
  3. Projective coordinates might not be ideal compared to Jacobian coordinates.
  4. The complete formulas on G2 have an extra cost as they use either b*SNR or b/SNR (b being the constant term of the curve equation and SNR the sextic non-residue of the twist). While on G1 b is trivial, on G2 the twist makes the formulas expensive:
    when F is Fp2 and F.C.getSexticTwist() == D_Twist:
      t3 *= SexticNonResidue
    t4.sum(P.y, P.z)         # 9.  t4 <- Y1 + Z1
    r.x.sum(Q.y, Q.z)        # 10. X3 <- Y2 + Z2
    t4 *= r.x                # 11. t4 <- t4 X3
    r.x.sum(t1, t2)          # 12. X3 <- t1 + t2      X3 = Y1 Y2 + Z1 Z2
    t4 -= r.x                # 13. t4 <- t4 - X3      t4 = (Y1 + Z1)(Y2 + Z2) - (Y1 Y2 + Z1 Z2) = Y1 Z2 + Y2 Z1
    when F is Fp2 and F.C.getSexticTwist() == D_Twist:
      t4 *= SexticNonResidue
    r.x.sum(P.x, P.z)        # 14. X3 <- X1 + Z1
    r.y.sum(Q.x, Q.z)        # 15. Y3 <- X2 + Z2
    r.x *= r.y               # 16. X3 <- X3 Y3        X3 = (X1 + Z1)(X2 + Z2)
    r.y.sum(t0, t2)          # 17. Y3 <- t0 + t2      Y3 = X1 X2 + Z1 Z2
    r.y.diffAlias(r.x, r.y)  # 18. Y3 <- X3 - Y3      Y3 = (X1 + Z1)(X2 + Z2) - (X1 X2 + Z1 Z2) = X1 Z2 + X2 Z1
    when F is Fp2 and F.C.getSexticTwist() == D_Twist:
      t0 *= SexticNonResidue
      t1 *= SexticNonResidue
    r.x.double(t0)           # 19. X3 <- t0 + t0      X3 = 2 X1 X2
    t0 += r.x                # 20. t0 <- X3 + t0      t0 = 3 X1 X2
    t2 *= b3                 # 21. t2 <- b3 t2        t2 = 3b Z1 Z2
    when F is Fp2 and F.C.getSexticTwist() == M_Twist:
      t2 *= SexticNonResidue
    r.z.sum(t1, t2)          # 22. Z3 <- t1 + t2      Z3 = Y1 Y2 + 3b Z1 Z2
    t1 -= t2                 # 23. t1 <- t1 - t2      t1 = Y1 Y2 - 3b Z1 Z2
    r.y *= b3                # 24. Y3 <- b3 Y3        Y3 = 3b(X1 Z2 + X2 Z1)
    when F is Fp2 and F.C.getSexticTwist() == M_Twist:
      r.y *= SexticNonResidue
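
Two ideas in the listing above can be checked with a small sketch: steps 9 to 13 obtain the cross term Y1 Z2 + Y2 Z1 with one multiplication instead of two, and each multiplication by SexticNonResidue is an extra Fp2 operation that G1 does not pay. The sketch assumes Fp2 = Fp[i]/(i² + 1) and a non-residue of 1 + i, as for BLS12-381, but over a toy prime where 1 + i is not claimed to be a sextic non-residue. The class `Fp2` and the helpers `mul_by_snr` and `cross_term` are hypothetical and not taken from the library.

```python
from dataclasses import dataclass

P = 1_000_003  # toy prime with P % 4 == 3, so i^2 = -1 has no root in Fp (assumption)

@dataclass(frozen=True)
class Fp2:
    """Element c0 + c1*i of Fp[i]/(i^2 + 1)."""
    c0: int
    c1: int

    def __add__(self, o: "Fp2") -> "Fp2":
        return Fp2((self.c0 + o.c0) % P, (self.c1 + o.c1) % P)

    def __sub__(self, o: "Fp2") -> "Fp2":
        return Fp2((self.c0 - o.c0) % P, (self.c1 - o.c1) % P)

    def __mul__(self, o: "Fp2") -> "Fp2":
        return Fp2((self.c0 * o.c0 - self.c1 * o.c1) % P,
                   (self.c0 * o.c1 + self.c1 * o.c0) % P)

def mul_by_snr(a: Fp2) -> Fp2:
    """Multiply by 1 + i with one subtraction and one addition in Fp:
    (c0 + c1*i)(1 + i) = (c0 - c1) + (c0 + c1)*i."""
    return Fp2((a.c0 - a.c1) % P, (a.c0 + a.c1) % P)

def cross_term(y1: Fp2, z1: Fp2, y2: Fp2, z2: Fp2, t1: Fp2, t2: Fp2) -> Fp2:
    """Steps 9 to 13: given t1 = Y1*Y2 and t2 = Z1*Z2, return Y1*Z2 + Y2*Z1
    using a single extra multiplication."""
    return (y1 + z1) * (y2 + z2) - (t1 + t2)
```

Even when the multiplication by the non-residue reduces to additions as here, it is applied several times per point addition, which is consistent with the extra cost described in item 4.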

And nitpicks (these probably don't explain even 0.5% of the difference):

  1. MCL recoding uses division, which is not constant-time: https://github.com/herumi/mcl/blob/d79c5acb489ac54a7bd2544f8210c732c0caaa12/include/mcl/bn.hpp#L761
  2. MCL table building is not constant-time: https://github.com/herumi/mcl/blob/ef4a9de2571469861dc18ff73613a79b655de1d2/include/mcl/ec.hpp#L94-L118

TODO:

  • If cheap inversion is possible, we can use Montgomery's simultaneous inversion together with the mixedAddition and mixedDoubling formulas.
  • Implement (constant-time) Jacobian coordinates.
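
As a reminder of what the first TODO item refers to, here is a minimal sketch of Montgomery's simultaneous inversion over a prime field: n elements are inverted with a single modular inversion plus about 3n multiplications. It assumes all inputs are nonzero modulo a prime p, it is not constant-time (Python integers), and the function name `batch_inverse` is hypothetical.

```python
def batch_inverse(values: list[int], p: int) -> list[int]:
    """Invert every nonzero value modulo the prime p with a single modular inversion."""
    n = len(values)
    # prefix[i] = values[0] * ... * values[i-1] mod p, with prefix[0] = 1
    prefix = [1] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] * v % p

    inv_running = pow(prefix[n], -1, p)  # the only inversion (Python 3.8+)
    inverses = [0] * n
    for i in reversed(range(n)):
        inverses[i] = inv_running * prefix[i] % p  # equals 1 / values[i]
        inv_running = inv_running * values[i] % p  # now 1 / (values[0] * ... * values[i-1])
    return inverses
```

Applied to the Z coordinates of a precomputed table, this would allow converting all entries to affine form at the price of one inversion, which is what makes mixed formulas attractive.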

@mratsim mratsim merged commit 85d3653 into master Sep 3, 2020
@mratsim mratsim deleted the endomorphism-g2 branch September 4, 2020 08:57