Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a static constant table for small ecmult WINDOW_G sizes. #614

Closed
wants to merge 1 commit into from
Closed

Use a static constant table for small ecmult WINDOW_G sizes. #614

wants to merge 1 commit into from

Conversation

gmaxwell
Copy link
Contributor

For discussion for now: this need to be reworked some to play nicely with #600.

When WINDOW_G is small there is no reason to not pre-compute it: on almost all embedded devices flash is much 'cheaper' than ram, pre-computation can be burdensomely slow, and the code to do the precomputation might actually be larger than the table (I haven't checked). We wouldn't want to put a 1.3 MB table in the source/library but a 1024 byte one hardly seems like a concern.

This is relevant to #603 @laanwj

@gmaxwell
Copy link
Contributor Author

I only tested this very minimally, so please don't immediately dump it on an embedded device and waste 4 hours troubleshooting. Test on a desktop first. :P

src/ecmult_impl.h Outdated Show resolved Hide resolved
src/ecmult_impl.h Outdated Show resolved Hide resolved
@laanwj
Copy link
Member

laanwj commented May 10, 2019

Concept ACK

Tested that configure with out of range ecmult window value: 2, 25 fails, and the below values pass.

Benchmarks (SiFive Unleashed RV64GC, no endomorphism, default 1Ghz clock speed, gcc 9.0.1, --with-bignum=gmp default):

--with-ecmult-window=3    ecdsa_verify: min 1649us / avg 1649us / max 1649us
--with-ecmult-window=4    ecdsa_verify: min 1570us / avg 1570us / max 1570us
--with-ecmult-window=5    ecdsa_verify: min 1517us / avg 1517us / max 1517us
--with-ecmult-window=6    ecdsa_verify: min 1479us / avg 1479us / max 1479us
--with-ecmult-window=7    ecdsa_verify: min 1449us / avg 1449us / max 1449us
--with-ecmult-window=8    ecdsa_verify: min 1427us / avg 1427us / max 1427us
--with-ecmult-window=9    ecdsa_verify: min 1410us / avg 1410us / max 1410us
--with-ecmult-window=10   ecdsa_verify: min 1395us / avg 1395us / max 1395us
--with-ecmult-window=11   ecdsa_verify: min 1382us / avg 1382us / max 1382us
--with-ecmult-window=12   ecdsa_verify: min 1373us / avg 1373us / max 1373us
--with-ecmult-window=13   ecdsa_verify: min 1364us / avg 1364us / max 1364us
--with-ecmult-window=14   ecdsa_verify: min 1357us / avg 1357us / max 1357us
--with-ecmult-window=15   ecdsa_verify: min 1351us / avg 1351us / max 1351us

So from 3 to 15 is at most a 18% speed-up, where going to 6 already accounts for about half of that. Even with the lowest setting I'd say it's fast enough for boot-loader use of verifying one or a few signatures.

Another run with --with-bignum=no, which is more likely in embedded context to not pull in libgmp as a depenency:

--with-ecmult-window=3   ecdsa_verify: min 1780us / avg 1780us / max 1780us
--with-ecmult-window=4   ecdsa_verify: min 1699us / avg 1699us / max 1699us
--with-ecmult-window=5   ecdsa_verify: min 1645us / avg 1645us / max 1645us
--with-ecmult-window=6   ecdsa_verify: min 1609us / avg 1609us / max 1609us
--with-ecmult-window=7   ecdsa_verify: min 1585us / avg 1585us / max 1585us
--with-ecmult-window=8   ecdsa_verify: min 1564us / avg 1564us / max 1564us
--with-ecmult-window=9   ecdsa_verify: min 1545us / avg 1545us / max 1545us
--with-ecmult-window=10  ecdsa_verify: min 1531us / avg 1531us / max 1531us
--with-ecmult-window=11  ecdsa_verify: min 1519us / avg 1519us / max 1519us
--with-ecmult-window=12  ecdsa_verify: min 1510us / avg 1510us / max 1510us
--with-ecmult-window=13  ecdsa_verify: min 1500us / avg 1500us / max 1500us
--with-ecmult-window=14  ecdsa_verify: min 1494us / avg 1494us / max 1494us
--with-ecmult-window=15  ecdsa_verify: min 1487us / avg 1487us / max 1487us

the code to do the precomputation might actually be larger than the table (I haven't checked)

This is just about true: the size of secp256k1_context_create (verify-only) goes from 496 to 124 with this patch, so this makes place for the 256 byte table for window size 4 (note: haven't checked if any other functions go unused). It won't be true for higher window sizes 😄

@gmaxwell
Copy link
Contributor Author

@apoelstra Thoughts on how this should interact with the single allocation change?

@real-or-random
Copy link
Contributor

Concept ACK

@apoelstra Thoughts on how this should interact with the single allocation change?

How is this related to scratch spaces?

@gmaxwell
Copy link
Contributor Author

@real-or-random oops should have really directed at #566 just had 'single allocation' on the brain. When we do single allocation / no-allocation API's we'll need to leave the memory for this table out from the allocation. (which is also why I want to do this PR after merging those, as I don't want to disrupt them.)

@real-or-random
Copy link
Contributor

To adopt this to #566, you just need to do the accounting correctly here:

static const size_t SECP256K1_ECMULT_CONTEXT_PREALLOCATED_SIZE =

@gmaxwell
Copy link
Contributor Author

gmaxwell commented May 25, 2019

Rebased, fixed endomorphism support, ifdefed out secp256k1_ecmult_odd_multiples_table_storage_var.

TODO: Update configure output, include a comment explaining how the constants are generated, and decide on how big to support. 6 is sounding pretty good based on llanwj's data. Plus some assorted ifdef cleanup.

@laanwj Any interest in getting the actual binary size without this PR, with it and size 2 and size 6? on RV?

@real-or-random
Copy link
Contributor

include a comment explaining how the constants are generated

It does not hurt but this style of just hardcoding the constants is somewhat inconsistent with the one by gen_context. I like the approach here more because it's much simpler; gen_context creates a lot of complexity in the build system. Maybe we should also add a pre-generated file with gen_context constants to the repo in the future.

@gmaxwell
Copy link
Contributor Author

TODO: Update configure output, include a comment explaining how the constants are generated, and decide on how big to support. Better handling for exhaustive tests?

I'm unsure about gen_context the big reason that we don't ship a static table is because it's enormous (as in comparable to the size of the codebase enormous). This issue doesn't really arise with window_g. We should probably change signing to use #546 and then we could have equivalent performance with a much smaller table, and then it might be less of a consideration.

@real-or-random
Copy link
Contributor

I'm unsure about gen_context the big reason that we don't ship a static table is because it's enormous (as in comparable to the size of the codebase enormous).

Yep okay, I didn't see that point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants