Open
As mentioned in #445, there is a large discrepancy between the performance when benchmarking field arithmetic and the elliptic curves built on top, especially on Secp256k1 vs libsecp256k1 and RustCrypto.
We start with a 1.7x advantage on field arithmetic, which shrinks to a 0.85x disadvantage on constant-time code.
There is an unexplained performance bug.
Some possibilities:
- There is a parameter passing bug similar to Internal API: in-place vs result #21 and Extremely bad codegen on Fp2 #146. However, looking into the assembly with Ghidra, we have 12 LEA and 13 MOV before function calls, which doesn't seem costly enough for such a difference. There is the regular `if adx` test, but it should be cached and almost costless on Haswell and later CPUs.
- Unsaturated arithmetic allows for greater ILP (instruction-level parallelism). This seems unlikely, as field arithmetic with unsaturated limbs is 2x slower than my implementation.
- Cache effects. For example, we don't hardcode the prime modulus, and after a long computation it might be evicted from cache.
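
To make the ILP hypothesis concrete, here is a minimal sketch of the two representations for a 256-bit value. It assumes a 5 x 52-bit unsaturated layout (the kind of layout used by unsaturated 64-bit field implementations) against a saturated 4 x 64-bit layout. All function names are hypothetical, the code is not constant-time audited, and it is not taken from this repository or from libsecp256k1. It only shows why a saturated addition has a serial carry chain while an unsaturated addition has independent limb operations, with carry propagation deferred to a later normalization step.

```rust
/// Mask selecting the low 52 bits of a limb.
const MASK52: u64 = (1u64 << 52) - 1;

/// Saturated addition: 4 x 64-bit limbs, little-endian.
/// Each limb depends on the carry of the previous one (serial chain).
fn add_saturated(a: &[u64; 4], b: &[u64; 4]) -> ([u64; 4], bool) {
    let mut r = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        r[i] = s2;
        carry = c1 | c2;
    }
    (r, carry)
}

/// Unsaturated addition: 5 x 52-bit limbs stored in u64.
/// The 12 spare bits per limb absorb carries, so the five additions
/// are independent and can be issued in parallel.
/// Assumes both inputs are normalized (each limb below 2^52).
fn add_unsaturated(a: &[u64; 5], b: &[u64; 5]) -> [u64; 5] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4]]
}

/// Deferred carry propagation: bring limbs 0..3 back below 2^52.
fn normalize(a: &[u64; 5]) -> [u64; 5] {
    let mut r = *a;
    for i in 0..4 {
        r[i + 1] += r[i] >> 52;
        r[i] &= MASK52;
    }
    r
}

/// Repack 4 x 64-bit limbs into 5 x 52-bit limbs.
fn to_unsaturated(a: &[u64; 4]) -> [u64; 5] {
    [
        a[0] & MASK52,
        ((a[0] >> 52) | (a[1] << 12)) & MASK52,
        ((a[1] >> 40) | (a[2] << 24)) & MASK52,
        ((a[2] >> 28) | (a[3] << 36)) & MASK52,
        a[3] >> 16,
    ]
}

/// Repack normalized 5 x 52-bit limbs into 4 x 64-bit limbs
/// (any bit at position 256 or above is dropped).
fn from_unsaturated(a: &[u64; 5]) -> [u64; 4] {
    [
        a[0] | (a[1] << 52),
        (a[1] >> 12) | (a[2] << 40),
        (a[2] >> 24) | (a[3] << 28),
        (a[3] >> 36) | (a[4] << 16),
    ]
}
```

Both paths compute the same sum modulo 2^256: `from_unsaturated(&normalize(&add_unsaturated(&to_unsaturated(&a), &to_unsaturated(&b))))` matches the limbs returned by `add_saturated(&a, &b)`. The sketch says nothing about which layout is faster in practice. That depends on multiplication and reduction, which dominate field arithmetic cost and are not modeled here.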