Public key aggregation is slow

On my i5-5257U (dual-core mobile Broadwell from 2015, 2.7 GHz base, 3.1 GHz turbo), I get the following stats for [signature aggregation](https://github.com/status-im/nim-beacon-chain/blob/386d2460d867085ad87c646f4fc44483a2b1034d/benchmarks/bench_bls_sig_agggregation.nim):

```
Warmup: 1.1908 s, result 224 (displayed to avoid compiler optimizing warmup away)

#### Block parameters
Number of validators:                                                                       482
Number of block parent hashes:                                                               12
Fork version:                                                                                 3
Slot:                                                                                      4246
Shard_id:                                                                                   555
Parent_hash[0]:                99D2587E07003CFE8023D46401577191EF89BFCC239A6EF1922AC49A687116A2
Shard_block_hash:              0CF579DC04024D8D4292A4BBCFCAD24F6A20C44AF665A7A4144CE84E8821E77A
justified_slot:                                                                            1846


#### Message, crypto keys and signatures
482 secret and public keys pairs generated in 2.014 s
Throughput: 239.279 kps/s (key pairs/second)


Message generated in 0.010 ms


482 public key and message signature pairs generated in 1.153 s
Throughput: 418.150 kps/s (keysig pairs/second)


#### Benchmark: signature aggregation

Benchmarking signature aggregation
Collected 100 samples in 153.974 seconds
Average time: 1539.735 ms
Stddev  time: 3.821 ms
Min     time: 1536.821 ms
Max     time: 1558.711 ms

Display computation result to make sure it's not optimized away
0418ff7d1d14353af2f95bb25724fa9787cd4e95c4b5040dbddf1ff3a601c29943974ad5cf806c89b04fda4564c513d2ae1420cecdeaaa0bd4888a5b066efafa2222425216e8e8a43982735c68ddf37ef0494cfc1830e8be270bd5d026804f19f8
```

However, uncommenting the public key aggregation benchmark leaves the bench stuck: not even 10 samples complete in 2 minutes.

![image](https://user-images.githubusercontent.com/22738317/47606922-18825680-da1a-11e8-8063-76942e52c990.png)
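To keep a slow benchmark like this from hanging the whole run, sample collection can be capped by a wall-clock budget as well as a sample count. The following is a minimal Python sketch of that idea, not the Nim harness used above; the names `bench` and `summarize` and the default limits are hypothetical.

```python
import statistics
import time

def bench(fn, max_samples=100, budget_s=120.0):
    """Collect up to max_samples timings of fn(), stopping early once
    budget_s seconds of wall-clock time have elapsed (hypothetical helper)."""
    samples_ms = []
    result = None
    deadline = time.perf_counter() + budget_s
    while len(samples_ms) < max_samples and time.perf_counter() < deadline:
        start = time.perf_counter()
        result = fn()  # keep the result so the work is observable
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return samples_ms, result

def summarize(samples_ms):
    """Return the same statistics the benchmark output reports, in ms."""
    return {
        "samples": len(samples_ms),
        "avg": statistics.mean(samples_ms),
        "stddev": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
        "min": min(samples_ms),
        "max": max(samples_ms),
    }
```

The budget is only checked between samples, so a single very slow call can still overrun it. Even so, a run yields partial statistics instead of appearing stuck.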


Digging into `ECP2_BLS381_mul`, `FP2_BLS381_mul` turns out to be a huge bottleneck:
![2018-10-27_18-58-45](https://user-images.githubusercontent.com/22738317/47606945-67c88700-da1a-11e8-8691-74335f74426d.png)
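For context on why a quadratic-extension multiplication is dominated by big-integer work: an element of Fp2 is a pair `a + b*u` with `u^2 = -1`, so one Fp2 product expands into several multiplications and reductions modulo the 381-bit base prime. Below is a minimal Python sketch using the Karatsuba-style three-multiplication form. It illustrates the arithmetic only and is not the library's implementation; `fp2_mul` is a hypothetical name.

```python
# BLS12-381 base field modulus (381 bits).
P = int(
    "1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf"
    "6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab",
    16,
)

def fp2_mul(x, y, p=P):
    """Multiply x = (a, b) and y = (c, d), each meaning a + b*u with u^2 = -1.

    (a + b*u)(c + d*u) = (ac - bd) + (ad + bc)*u
    Karatsuba trick: ad + bc = (a + b)(c + d) - ac - bd,
    so only three base-field multiplications are needed instead of four.
    """
    a, b = x
    c, d = y
    ac = a * c % p
    bd = b * d % p
    cross = (a + b) * (c + d) % p
    return ((ac - bd) % p, (cross - ac - bd) % p)
```

Each of the three products above is a full-width multiplication followed by a modular reduction, which is consistent with the big-integer routines showing up at the top of the profile.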

This in turn is due to `BIG_384_29_mul` and `BIG_384_29_monty` (Montgomery reduction?):

![image](https://user-images.githubusercontent.com/22738317/47606982-bbd36b80-da1a-11e8-9481-7ee7e14ea2d0.png)
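If `BIG_384_29_monty` is indeed a Montgomery reduction, the pattern would be: multiply two 384-bit operands into a double-width product, then reduce it modulo the field prime without a division. Here is a minimal Python sketch of textbook Montgomery reduction (REDC) under that assumption. It works on whole integers rather than on limbs, the choice `R = 2^384` is an assumption, and the helper names are hypothetical. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
# BLS12-381 base field modulus (381 bits).
P = int(
    "1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf"
    "6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab",
    16,
)
K = 384                          # assumed: R = 2^K with R > P
R = 1 << K
N_PRIME = (-pow(P, -1, R)) % R   # satisfies P * N_PRIME == -1 (mod R)

def redc(t):
    """Montgomery reduction: return t * R^-1 mod P for 0 <= t < P * R."""
    m = ((t & (R - 1)) * N_PRIME) & (R - 1)  # m = (t mod R) * N' mod R
    u = (t + m * P) >> K                     # exact division by R
    return u - P if u >= P else u            # one conditional subtraction

def to_mont(a):
    """Convert a to Montgomery form a * R mod P."""
    return (a << K) % P

def from_mont(a_m):
    """Convert back from Montgomery form."""
    return redc(a_m)

def mont_mul(a_m, b_m):
    """Multiply two values in Montgomery form; the result stays in that form."""
    return redc(a_m * b_m)
```

For example, `from_mont(mont_mul(to_mont(a), to_mont(b)))` equals `a * b % P`. The wide product `a_m * b_m` corresponds to the multiplication step and `redc` to the reduction step, so every field multiplication pays for both.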



Issue #13, opened by mratsim on Oct 27, 2018
