Re-enable secp256k1-zkp and make it default #1534

prusnak · 2021-03-18T13:34:50Z

secp256k1(-zkp) performs better for signature verification and contains schnorr algo

prusnak · 2021-03-18T16:34:21Z

I did some experiments in the secp256k1-zkp branch

Currently, trezor.crypto.curves.secp256k1_zkp is not a drop-in replacement for trezor.crypto.curves.secp256k1, because we need to use the context manager.

So the following

secp256k1.foo()

turns into this:

with secp256k1_zkp.Context() as secp256k1:
    secp256k1.foo()

However, we might want to make this a drop-in replacement (so it looks and behaves like a module with static methods) and hide the context creation behind the scenes. There are (at least) two ways how to do this:

allocate the context before uPy interpreter runs and keep it resident (outside of uPy heap)
allocate the context within uPy on first use and keep it resident (inside of uPy heap)
other ... ?

onvej-sl · 2021-04-12T10:21:47Z

TLDR

Signing, private key to public key derivation and ecdh computation are resistant to simple power analysis (and timing attacks). If we use context randomization (secp256k1_context_randomize), signing and private key to public key derivation should be also resistant to differential power analysis. Nevertheless, the randomization makes the operations twice as long. It seems to me that ecdh computation is prone to differential power analysis. All these operation should be resistant to both simple power analysis and differential power analysis in trezor-crypto
See the tables below for the choice of constants ECMULT_WINDOW_SIZE and ECMULT_GEN_PREC_BITS.

Suggestion

It doesn't make sense to recreate the context before every signing and verification, since
- creating the context takes longer than signing or verification and
- the context is modified only by secp256k1_context_randomize (or secp256k1_ecmult_gen_blind) which serves as protection against differential power analysis and the only part that is modified is initial and blind in secp256k1_ecmult_gen_context.
Instead of that I suggest having a global context that is created once. If we want to have a differential power analysis protection, the context should be randomized with secp256k1_context_randomize before every call of secp256k1_ecdsa_sig_sign and secp256k1_ec_pubkey_create. If we want these operations to be thread-safe we may create a "shallow copy" of the context with memcpy(ctx_clone, ctx, sizeof(secp256k1_context)) before the randomization. It should be OK since initial and blind are stored in the context as values, whereas all the big tables are stored as pointers.
We can also make use of secp256k1_ec_pubkey_create, which is faster than ecdsa_get_public_key{33,65} (and scalar_multiply). If we want to use it even in bip32.c, the context cannot be stored on the micropython heap.

Report

Constants

`ECMULT_WINDOW_SIZE`

The constant is an integer from 2 to 24, inclusive. The bigger the constant is

the faster is secp256k1_ecmult (and secp256k1_ecdsa_sig_verify)
the bigger is the size of secp256k1_ecmult_context (and secp256k1_context)
the slower is secp256k1_ecmult_context_build (and secp256k1_context_preallocated_create)

`ECMULT_WINDOW_SIZE`	`secp256k1_ecdsa_sig_verify`	`secp256k1_ecmult_context_build`	`secp256k1_ecmult_context`
2	16.1 ms	5.9 ms	128 B
4	14.3 ms	6.4 ms	512 B
6	13.5 ms	8.3 ms	2 kB
8	12.9 ms	16.2 ms	8 kB
10	12.6 ms	47 ms	32 kB

`ECMULT_GEN_PREC_BITS`

The constant is 2, 4 or 8. The bigger the constant is

the faster is secp256k1_ecmult_gen (and secp256k1_ecdsa_sig_sign, secp256k1_ec_pubkey_create)
the bigger is the size of secp256k1_ecmult_gen_context (and secp256k1_context)
the slower is secp256k1_ecmult_gen_context_build (and secp256k1_context_preallocated_create), provided the context is not precomputed in compile-time using the flag USE_ECMULT_STATIC_PRECOMPUTATION.

`ECMULT_GEN_PREC_BITS`	`secp256k1_ecdsa_sig_sign`	`secp256k1_ec_pubkey_create`	`secp256k1_ecmult_gen_context`
2	23.4 ms	14.4 ms	32 kB
4	11.7 ms	7.2 ms	64 kB
8	5.9 ms	3.6 ms	512 kB

Only the times for ECMULT_GEN_PREC_BITS = 4 in the table were measured, other times are estimated.

`WINDOW_A`

The constant has a fixed value of 5 which is optimal in terms of speed of functions secp256k1_ecmult (and secp256k1_ecdsa_sig_verify) and secp256k1_ecmult_const (and secp256k1_ecdh). The bigger the constant is the more memory on stack these functions needs.

`WINDOW_A`		`secp256k1_ecdsa_sig_verify`		`secp256k1_ecdh`
2	1440 B	14.7 ms	168 B	23.7 ms
3	1772 B	13.6 ms	336 B	15.1 ms
4	2436 B	13.1 ms	672 B	12.7 ms
5	3764 B	12.9 ms	1344 B	11.8 ms
6	6420 B	13.2 ms	2688 B	12.2 ms

The times were measured for ECMULT_GEN_PREC_BITS = 4 and ECMULT_WINDOW_SIZE = 8, bytes are a lower estimate.

Contexts

`secp256k1_context`

The context consists of secp256k1_ecmult_context and secp256k1_ecmult_gen.

`secp256k1_ecmult_gen_context`

The context is used by secp256k1_ecmult_gen (and secp256k1_ecdsa_sig_sign, secp256k1_ec_pubkey_create).
The context consists of:

A read-only two-dimensional table.
- It consists of 2^ECMULT_GEN_PREC_BITS rows and 256 / ECMULT_GEN_PREC_BITS columns.
- Its total size is sizeof(secp256k1_ge_storage) * 2^ECMULT_GEN_PREC_BITS * 256 / (ECMULT_GEN_PREC_BITS) = 16384 * 2^ECMULT_GEN_PREC_BITS / ECMULT_GEN_PREC_BITS.
- If the flag USE_ECMULT_STATIC_PRECOMPUTATION, the table is precomputed in compile-tile. This works only for ECMULT_GEN_PREC_BITS = 8.
- It consists of points with unknown discrete logarithms. This should prevent some attack I don't quite understand.
- It is a kind of duplication of the table in secp256k1_table.h.
A point initial and a scalar blind such that initial = blind * G.
- The values are used in secp256k1_ecmult_gen (and secp256k1_ecdsa_sig_sign, secp256k1_ec_pubkey_create) as a protection against differential power analysis.
- The values should be refreshed by secp256k1_context_randomize (or secp256k1_ecmult_gen_blind) before every call of secp256k1_ecmult_gen (that is before every call of secp256k1_ecdsa_sig_sign and secp256k1_ec_pubkey_create), which makes the operation twice as long.
- This kind of blinding is not implemented in trezor-crypto (which uses randomized coordinates instead of that).

`secp256k1_ecmult_context`

The context is used in secp256k1_ecdsa_sig_verify.
It consists of two read-only tables:

Each table consists of 2^(ECMULT_WINDOW_SIZE - 2) elements.
The total size of the tables is sizeof(secp256k1_ge_storage) * 2 * 2^(ECMULT_WINDOW_SIZE - 2) = 32 * 2^ECMULT_WINDOW_SIZE.
The tables cannot be precomputed in compile-time. On the other hand, there is no reason to rebuild every time secp256k1_ecdsa_sig_verify is called.

Functions

`secp256k1_ecmult_gen`

Computes lambda b : b * G, where G is the generator and b is a scalar.

It is used by secp256k1_ecdsa_sig_sign and secp256k1_ec_pubkey_create.
It is constant time.
It computes b * G as (b - initial) * G + blind, where initial and blind are generated so that blind = initial * G. This is a protection against differential power analysis.
It uses ECMULT_WINDOW_SIZE-ary representation of b, that is b is represented by digits b_i such that
- b = sum(b_i * 2^(i * ECMULT_WINDOW_SIZE))
- 0 <= b_i < 2^ECMULT_WINDOW_SIZE
It uses a precomputed table that is contained in secp256k1_ecmult_gen_context. The table consists of points with unknown discrete logarithms. This should prevent a kind of attack I don't quite understand.
A similar method is used by scalar_multiply in trezor-crypto, which differs in these points:
- It uses the constant ECMULT_WINDOW_SIZE = 4.
- b is represented by digits b_i such that
  - -2^(WINDOW_A - 1) < b_i < 2^(WINDOW_A - 1)
  - b_i % 2 == 1
- It uses a table that is half in size but it consists of points with known discrete logarithm.

`secp256k1_ecmult`

Computes lambda a, b, P : a * P + b * G, where G is the generator, a and b are scalars and P is a point.

It is used by secp256k1_ecdsa_sig_verify.
It uses the fast endomorphism (see bellow).
It uses Shamir's trick.
It uses sliding-window method.
It uses ECMULT_WINDOW_SIZE-ary non-adjacent form of a, that is a is represented by digits a_i such that
- a = sum(a_i * 2^i)
- -2^(WINDOW_A - 1) < a_i < 2^(WINDOW_A - 1)
- a_i % 2 = 1 or a_i = 0
It uses WINDOW_A-ary non-adjacent form of b
It uses a precomputed table that is contained in secp256k1_ecmult_context.
It uses around 2^(WINDOW_A - 2) * (2 * sizeof(secp256k1_ge) + sizeof(secp256k1_gej) + 40) + 260 * sizeof(int) + 68 = 83 * 2^WINDOW_A + 1108 bytes on the stack.
It is non constant time, which should be all right since the method is used only in verification which usually does not work with secret information.

`secp256k1_ecmult_const`

Computes lambda a, P : a * P, where a is a scalar and P is a point.

It is used by secp256k1_ecdh.
It uses the fast endomorphism.
It is constant time.
It uses window method.
It uses a kind of WINDOW_A - 1-ary form of a, to be precise, a is represented by digits a_i such that
- a = sum(a_i * 2^((WINDOW_A - 1)* i))
- -2^(WINDOW_A - 1) < a_i < 2^(WINDOW_A - 1)
- a_i % 2 == 1
It uses around 2^(WINDOW_A - 2) * sizeof(secp256k1_ge) * 2 = 42 * 2^WINDOW_A bytes on the stack.
The same method with the same constant (WINDOW_A - 1 = 4) is used by point_multiply in trezor-crypto.
It seems to me that it doesn't use any kind of randomization as a protection against differential power analysis.

Fast endomorphism

All the curves secp{192,224,256}k1 were generated so that there is an effectively computable endomorphism that can be utilized to speed up point multiplication. The method appeared in https://www.iacr.org/archive/crypto2001/21390189.pdf for the first time.

The endomorphism phi for secp256k1 is defined as

phi : (x, y) -> (beta * x mod p, y)

where

beta = 0x7ae96a2b657c07106e64479eac3434e99cf0497512f58995c1396c28719501ee

and p is the prime the coordinates are reduced with. If (x, y) lies on the curve, the point phi((x, y)) lies on the curve as well and this surprising formula holds true:

phi(P) = lambda * P

where

lambda = 0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72

Suppose we want to compute k * P, the algorithm that makes use of the endomorphism is the following:

Find k1 and k2 such that k = (k1 + k2 * lambda) mod n, where n is the order of the curve, and k1 and k2 are about half in length compared to k.
Compute k1 * P + k2 * phi(P) using Shamir's trick. If double-and-add method is used, the speed-up is about 40 %.

Other facts

trezor-crypto uses a blinding in ecdsa_sign_digest when computing private_nonce^-1 * (public_nonce * private_key + digest) as a protection against side-channel attacks (see https://github.com/trezor/trezor-firmware/blob/master/crypto/ecdsa.c#L706-L715), whereas secp256k1-zkp does not.
trezor-crypto uses a randomization when performing points arithmetic in jacobian coordinates as a protection against differential power analysis (see https://github.com/trezor/trezor-firmware/blob/master/crypto/ecdsa.c#L186-L197), whereas secp256k1-zkp does not.
trezor-crypto rejects signatures of zero digest as a protection against a fault injection attack since its easy to forge a signature of zero digest (see https://github.com/trezor/trezor-firmware/blob/master/crypto/ecdsa.c#L1044-L1051), whereas secp256k1-zkp does not.
It seems to me that secp256k1_ecmult_const (and secp256k1_ecdh) in secp256k1-zkp doesn't use any kind of randomization as a protection against differential power analysis.

prusnak · 2021-04-12T10:52:23Z

@onvej-sl WOW, great work Ondrej! Thank you for the very thorough analysis!

prusnak · 2021-06-12T16:34:26Z

Just rebased secp256k1-zkp on top of the current master

onvej-sl · 2021-11-19T14:01:49Z

Implemented in #1897 and #1678.

invd · 2021-11-21T14:04:53Z

I've started integrating the new mainlined zkp functionality into the T1 fuzzer and noticed some API issues. I'm not sure if this closed ticket is the right place to bring it up, so feel free to move it or respond to it elsewhere. (#1864 feels like a different scope, though.)

zkp_context.h doesn't expose bool zkp_context_is_initialized(void), so other code can't check the initialization status of the internal static secp256k1_context *context; variable
On regular builds without -DNDEBUG, the assert(context == NULL); in int zkp_context_init() prevents re-initializing non-null contexts
This presents practical issues since void zkp_context_destroy() only clears the context, but does not set context = NULL; again

The most straightforward approach is to expose the initialization check and actually null the context pointer on destroy, but I haven't looked into this very deeply and may be missing some logic or side effects.

onvej-sl · 2021-11-26T18:38:59Z

I'm not sure if this closed ticket is the right place to bring it up, so feel free to move it or respond to it elsewhere

Next time please open a new issue. It is easier to track it.

invd · 2021-11-27T11:40:25Z

Next time please open a new issue.

Will do, thanks!

prusnak added feature labels Mar 18, 2021

prusnak added this to the 21.05 milestone Mar 18, 2021

tsusanka added the P1 High label Mar 18, 2021

alex-jerechinsky modified the milestones: 21.05, 21.06 May 11, 2021

tsusanka modified the milestones: 21.06, 21.07 May 20, 2021

tsusanka assigned onvej-sl May 20, 2021

tsusanka mentioned this issue Jun 8, 2021

Support Taproot #1656

Closed

4 tasks

onvej-sl mentioned this issue Jun 22, 2021

ECDSA and Schnorr from secp256k1_zkp #1678

Merged

optout21 mentioned this issue Jul 22, 2021

Support Taproot trustwallet/wallet-core#1542

Closed

4 tasks

tsusanka removed this from the 21.07 milestone Oct 7, 2021

tsusanka removed enhancement labels Oct 7, 2021

onvej-sl closed this as completed Nov 19, 2021

onvej-sl mentioned this issue Nov 26, 2021

Onvej sl/zkp context fix #1949

Merged

andrewkozlik mentioned this issue Jul 12, 2022

Investigate how to save flash space #1947

Closed

onvej-sl mentioned this issue Jul 24, 2024

Reduce size of precomputed tables for ECC #4058

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-enable secp256k1-zkp and make it default #1534

Re-enable secp256k1-zkp and make it default #1534

prusnak commented Mar 18, 2021

prusnak commented Mar 18, 2021 •

edited

Loading

onvej-sl commented Apr 12, 2021

prusnak commented Apr 12, 2021

prusnak commented Jun 12, 2021

onvej-sl commented Nov 19, 2021

invd commented Nov 21, 2021 •

edited

Loading

onvej-sl commented Nov 26, 2021

invd commented Nov 27, 2021

Re-enable secp256k1-zkp and make it default #1534

Re-enable secp256k1-zkp and make it default #1534

Comments

prusnak commented Mar 18, 2021

prusnak commented Mar 18, 2021 • edited Loading

onvej-sl commented Apr 12, 2021

TLDR

Suggestion

Report

Constants

ECMULT_WINDOW_SIZE

ECMULT_GEN_PREC_BITS

WINDOW_A

Contexts

secp256k1_context

secp256k1_ecmult_gen_context

secp256k1_ecmult_context

Functions

secp256k1_ecmult_gen

secp256k1_ecmult

secp256k1_ecmult_const

Fast endomorphism

Other facts

prusnak commented Apr 12, 2021

prusnak commented Jun 12, 2021

onvej-sl commented Nov 19, 2021

invd commented Nov 21, 2021 • edited Loading

onvej-sl commented Nov 26, 2021

invd commented Nov 27, 2021

prusnak commented Mar 18, 2021 •

edited

Loading

`ECMULT_WINDOW_SIZE`

`ECMULT_GEN_PREC_BITS`

`WINDOW_A`

`secp256k1_context`

`secp256k1_ecmult_gen_context`

`secp256k1_ecmult_context`

`secp256k1_ecmult_gen`

`secp256k1_ecmult`

`secp256k1_ecmult_const`

invd commented Nov 21, 2021 •

edited

Loading