Low-level: discrepancy between field arithmetic performance and elliptic curve performance #446

As mentioned in #445, there is a large discrepancy between the field arithmetic benchmarks and the benchmarks of the elliptic curves built on top of them, especially for secp256k1 when compared against libsecp256k1 and RustCrypto.
We start with a 1.7x speed advantage on field arithmetic, which shrinks to a 0.85x ratio (i.e. a slowdown) on constant-time curve code.
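As a back-of-the-envelope check, and assuming the curve benchmarks are dominated by field operations (so the field speedup should carry over roughly unchanged), the two ratios imply that about a factor of 2 is lost somewhere between the field layer and the curve layer:

$$
\frac{\text{expected curve speedup}}{\text{measured curve speedup}} \approx \frac{1.7}{0.85} = 2
$$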

This points to an unexplained performance bug.

Some possibilities:
- A parameter-passing bug similar to #21 and #146. However, inspecting the assembly with Ghidra shows only 1-2 `LEA` and 1-3 `MOV` instructions before function calls, which does not seem costly enough to explain such a difference. There is also the usual `if adx` runtime check, but its result should be cached, making the check almost free on Haswell and later CPUs.
- Unsaturated arithmetic allows greater ILP (instruction-level parallelism). This seems unlikely, as field arithmetic with an unsaturated representation is 2x slower than my implementation.
- Cache effects. For example, we don't hardcode the prime modulus, so after a long computation it might be evicted from the cache.
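To make the ILP hypothesis concrete, here is a minimal sketch contrasting a saturated 4x64-bit limb layout with an unsaturated 5x52-bit layout (the layout libsecp256k1 uses for its 64-bit field code). It only illustrates data dependencies, not speed: all function names are hypothetical helpers written for this illustration, and Python integers stand in for machine words with explicit masking.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1

def to_limbs(x, bits, count):
    """Split a non-negative integer into `count` little-endian limbs of `bits` bits."""
    mask = (1 << bits) - 1
    return [(x >> (bits * i)) & mask for i in range(count)]

def from_limbs(limbs, bits):
    """Recombine little-endian limbs; limbs may be wider than `bits` (unnormalized)."""
    return sum(limb << (bits * i) for i, limb in enumerate(limbs))

def add_saturated_4x64(a, b):
    """4x64-bit add: every limb needs the carry out of the previous limb (serial chain)."""
    r, carry = [], 0
    for ai, bi in zip(a, b):
        s = ai + bi + carry
        r.append(s & MASK64)
        carry = s >> 64
    return r, carry

def add_unsaturated_5x52(a, b):
    """5x52-bit add: limbs are independent; the 12 spare bits per limb absorb overflow."""
    return [ai + bi for ai, bi in zip(a, b)]

def normalize_5x52(limbs):
    """Propagate the deferred carries so every limb fits in 52 bits again."""
    r, carry = [], 0
    for limb in limbs:
        v = limb + carry
        r.append(v & MASK52)
        carry = v >> 52
    return r, carry
```

In the saturated version each loop iteration depends on the previous `carry`, so the additions form one serial chain. In the unsaturated version the five additions have no dependency on each other and could issue in parallel; the carry handling is deferred to an occasional `normalize_5x52` call.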
