FpDbl revisited#144

Merged
mratsim merged 18 commits into master from fpdbl-revisited on Feb 1, 2021
@mratsim commented Jan 31, 2021

For fast Fp2, Fp4 and Fp12 implementations, we should take advantage of lazy reductions, see #15 (comment).

However this was put on hold due to an unexplained 50 cycles difference between the theory and practice as mentioned here:

# Single-width [3 Mul, 2 Add, 3 Sub]
# 3*81 + 2*14 + 3*12 = 307 theoretical cycles
# 330 measured
# Double-Width
# 316 theoretical cycles
# 365 measured
# Reductions can be 2x10 faster using MCL algorithm
# but there are still unexplained 50 cycles diff between theo and measured
# and unexplained 30 cycles between Clang and GCC
# - Function calls?
# - push/pop stack?

Since we do 2 reductions, and my CPU is now running at 3.9 GHz compared to 4.1 GHz, we have now found the source of the difference between the theoretical cycle count and practice.

The cause is nim-lang/Nim#16887, which made reduction 20 cycles slower than necessary, and reduction is used twice in Fp2 multiplication.
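To make the "reduction is used twice" point concrete, here is a minimal sketch of a lazily reduced Fp2 multiplication. It is written in Python with arbitrary-precision integers purely for illustration and is not the library's code. The assumptions are: Fp2 is built as Fp[i]/(i² + 1), the prime `P = 2**61 - 1` and Montgomery radix `R = 2**64` are toy values chosen so that `2*P < R`, and all function names are hypothetical. The three products are kept double-width, combined without reduction, and only the two results are reduced.

```python
# Toy parameters (assumptions for illustration, not the library's field).
P = 2**61 - 1                      # prime with P % 4 == 3, so i^2 = -1 is usable
K = 64
R = 1 << K                         # Montgomery radix, chosen so that 2*P < R
P_NEG_INV = (-pow(P, -1, R)) % R   # -P^-1 mod R

def redc(t):
    """Montgomery reduction: t * R^-1 mod P, valid for 0 <= t < P*R."""
    m = ((t & (R - 1)) * P_NEG_INV) & (R - 1)
    u = (t + m * P) >> K           # exact division: t + m*P is a multiple of R
    return u - P if u >= P else u  # u < 2*P, one conditional subtraction suffices

def to_mont(a):
    return (a * R) % P

def from_mont(a):
    return redc(a)

def fp2_mul_lazy(a, b):
    """(a0 + a1*i) * (b0 + b1*i) on Montgomery-form inputs, 3 muls, 2 reductions."""
    a0, a1 = a
    b0, b1 = b
    t0 = a0 * b0                   # double-width products, left unreduced
    t1 = a1 * b1
    t2 = (a0 + a1) * (b0 + b1)     # Karatsuba cross term
    c0 = t0 - t1 + P * P           # add a multiple of P to stay non-negative
    c1 = t2 - t0 - t1              # equals a0*b1 + a1*b0, already non-negative
    return redc(c0), redc(c1)      # the only two reductions

# Usage: (3 + 5i) * (7 + 11i) = -34 + 68i
a = (to_mont(3), to_mont(5))
b = (to_mont(7), to_mont(11))
c = tuple(from_mont(x) for x in fp2_mul_lazy(a, b))
assert c == ((-34) % P, 68)
```

Both `c0` and `c1` stay below `2*P*P`, which is below `P*R` under the `2*P < R` assumption, so each `redc` call receives a valid input. With a single-width approach each of the three products would be reduced separately, so any per-reduction overhead is paid three times there and twice here.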

This PR:

  • Fixes the Montgomery reduction performance issue
  • Implements a slower Comba Montgomery reduction (scalar code only)
  • Implements specialized squaring, scalar and Assembly. No MULX/ADCX/ADOX code as it requires a different algorithm.
    Assembly squaring is as fast as ADX multiplication, so we can expect ADX squaring to have an extra conservative 15% performance boost (up to 40%, as squaring almost halves the number of operations).
  • Accelerates Fp2 Mul by 10% and all G1 operations by about 7% by removing copies that lead to bad codegen in FpAdd and FpSub:
    when UseASM_X86_64 and a.mres.limbs.len <= 6: # TODO: handle spilling
      r = a
      addmod_asm(r.mres.limbs, b.mres.limbs, FF.fieldMod().limbs)
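The remark in the squaring bullet that squaring "almost halves the number of operations" can be illustrated with a small count of limb products. The sketch below is in Python and is only a model: it accumulates partial products into a big integer instead of propagating carries limb by limb, and every name in it is hypothetical. It compares schoolbook multiplication with a squaring that computes each off-diagonal product once and doubles the sum.

```python
MASK = (1 << 64) - 1

def to_limbs(x, n):
    """Split x into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK for i in range(n)]

def mul_limbs(a, b):
    """Schoolbook product; returns (value, number of limb multiplications)."""
    total, count = 0, 0
    for i in range(len(a)):
        for j in range(len(b)):
            total += (a[i] * b[j]) << (64 * (i + j))
            count += 1
    return total, count

def sqr_limbs(a):
    """Squaring; each a[i]*a[j] with i < j is computed once, then doubled."""
    n = len(a)
    off_diag, diag, count = 0, 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            off_diag += (a[i] * a[j]) << (64 * (i + j))
            count += 1
    for i in range(n):
        diag += (a[i] * a[i]) << (128 * i)
        count += 1
    return 2 * off_diag + diag, count

# Usage with 6 limbs (the limb count in the `len <= 6` guard above)
x = (1 << 380) - 12345
limbs = to_limbs(x, 6)
product, mul_count = mul_limbs(limbs, limbs)
square, sqr_count = sqr_limbs(limbs)
assert product == square == x * x
print(mul_count, sqr_count)
```

For n limbs the counts are n² and n(n+1)/2, so 36 and 21 limb multiplications at 6 limbs. The additions and the doubling are not free, which is one reason a measured gain would be smaller than the multiplication count alone suggests.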

@mratsim merged commit 83dcd98 into master on Feb 1, 2021
@mratsim deleted the fpdbl-revisited branch on February 1, 2021 02:52