Low-level: discrepancy between field arithmetic performance and elliptic curve performance #446

@mratsim

As mentioned in #445, there is a large discrepancy between the benchmark results for field arithmetic and those for the elliptic curves built on top of it, especially on Secp256k1 when compared against libsecp256k1 and RustCrypto.
We start with a 1.7x advantage on field arithmetic, which shrinks to a 0.85x disadvantage on the constant-time elliptic curve code.

There is an unexplained performance bug.

Some possibilities:

  • There is a parameter-passing bug similar to "Internal API: in-place vs result" #21 and "Extremely bad codegen on Fp2" #146. However, looking at the assembly with Ghidra, there are 12 LEA and 13 MOV instructions before function calls, which doesn't seem costly enough to explain such a difference. There is also the usual runtime "if adx" test, but it should be cached and almost costless on Haswell and later CPUs.
  • Unsaturated arithmetic allows for greater ILP (instruction-level parallelism). This seems unlikely, as field arithmetic with an unsaturated representation is 2x slower than my implementation.
  • Cache effects. For example, we don't hardcode the prime modulus, and after a long computation it might be evicted from cache.
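
To make the ILP hypothesis concrete, here is a minimal Rust sketch contrasting the two representations on a plain 256-bit addition (no modular reduction). It is not code from this library or from libsecp256k1: the function names, the 4x64-bit saturated layout, and the 5x52-bit unsaturated layout are assumptions chosen for illustration only.

```rust
/// Saturated layout (assumed: 4 limbs of 64 bits, every bit used).
/// Each limb addition needs the carry of the previous one, so the
/// four steps form a serial dependency chain.
fn add_saturated(a: &[u64; 4], b: &[u64; 4]) -> ([u64; 4], bool) {
    let mut r = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (s, c1) = a[i].overflowing_add(b[i]);
        let (s, c2) = s.overflowing_add(carry as u64);
        r[i] = s;
        carry = c1 | c2; // at most one of the two additions can overflow
    }
    (r, carry)
}

/// Number of payload bits per limb in the unsaturated layout (assumption).
const LIMB_BITS: u32 = 52;
const LIMB_MASK: u64 = (1u64 << LIMB_BITS) - 1;

/// Unsaturated layout (assumed: 5 limbs holding 52 bits each in a u64).
/// The 12 spare bits absorb carries, so the five additions are
/// independent of each other and can execute in parallel.
fn add_unsaturated(a: &[u64; 5], b: &[u64; 5]) -> [u64; 5] {
    let mut r = [0u64; 5];
    for i in 0..5 {
        r[i] = a[i] + b[i]; // no carry handling here
    }
    r
}

/// Deferred carry propagation: pushes the excess bits of each limb
/// into the next one. This pass is serial again, but it only has to
/// run occasionally, before the spare bits would overflow.
fn normalize(a: &[u64; 5]) -> [u64; 5] {
    let mut r = *a;
    for i in 0..4 {
        r[i + 1] += r[i] >> LIMB_BITS;
        r[i] &= LIMB_MASK;
    }
    r
}
```

The trade-off is that the unsaturated form needs one more limb, hence more partial products in a multiplication, which is consistent with the observation above that unsaturated field arithmetic benchmarks slower in isolation.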
