Groth16 Circom is slower than Rapidsnark when circuit constraint is large #460
Comments
One big difference is that the Tachyon prover parses the ZKey, whereas Rapidsnark avoids parsing it and just takes pointers. As the ZKey gets larger, the Tachyon prover incurs more overhead because of this.
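To make the parse-versus-pointer distinction concrete, here is a minimal Python sketch. It does not use the real ZKey layout: the buffer of native-endian 32-bit integers and the names `parse_copy` and `view_zero_copy` are assumptions made purely for illustration, not code from Tachyon or Rapidsnark.

```python
import struct


def parse_copy(buf: bytes) -> list:
    """Decode every 32-bit value into a new list: O(n) time and extra memory."""
    count = len(buf) // 4
    return list(struct.unpack(f"{count}I", buf[: count * 4]))


def view_zero_copy(buf: bytes) -> memoryview:
    """Reinterpret the same bytes as 32-bit values without copying them."""
    count = len(buf) // 4
    return memoryview(buf)[: count * 4].cast("I")


if __name__ == "__main__":
    # Hypothetical payload standing in for a section of a key file.
    payload = struct.pack("4I", 1, 2, 3, 4)
    copied = parse_copy(payload)      # allocates a new list of Python ints
    viewed = view_zero_copy(payload)  # only wraps the existing buffer
    print(copied, viewed.tolist())
```

The copying version does work proportional to the file size before proving can start, while the view costs roughly the same regardless of size, which matches the observation that the overhead grows with the ZKey.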
We also tried to avoid parsing the ZKey, as Rapidsnark does, in this branch, but ran into some issues and got stuck. Basically, the first thing we should do is modify the Groth16 proving key or verifying key to allow pointer members instead of
To be fair, the code to benchmark using num_runs should include only this portion.
Thanks! Rapidsnark has a server mode. It reads the ZKey and loads it into memory. To avoid parsing the ZKey on every run, you may consider adding a server mode. I think it would not be hard to add.
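The server-mode idea can be sketched as follows. This is only an illustration of the load-once pattern, not Rapidsnark's server or any Tachyon API: `ProverServer`, `load_key`, and `prove_fn` are hypothetical names, and the callables stand in for the real key loader and prover.

```python
from typing import Any, Callable


class ProverServer:
    """Holds a proving key in memory and reuses it for every request."""

    def __init__(
        self,
        load_key: Callable[[], Any],
        prove_fn: Callable[[Any, Any], Any],
    ) -> None:
        # The expensive step (reading/parsing the key) runs exactly once.
        self._key = load_key()
        self._prove_fn = prove_fn

    def prove(self, witness: Any) -> Any:
        # Each request pays only for proving, never for loading the key.
        return self._prove_fn(self._key, witness)


if __name__ == "__main__":
    # Stand-ins: a fake "key" and a fake "prover" that just combines inputs.
    server = ProverServer(lambda: "key", lambda key, wtns: (key, wtns))
    print(server.prove("witness-1"))
    print(server.prove("witness-2"))
```

With this shape, the key-loading cost is amortized over all proofs served by the process.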
Yes, it's possible, but it isn't prioritized at this moment. By the way, could you tell me what purpose you use Circom for?
Updated the proving process to repeat only the prove part, excluding the zkey parsing. This change is made because the zkey parsing takes significant time, but typically this is not the part users want to measure. Additionally, as rapidsnark does not include zkey parsing, this adjustment ensures fairness in benchmarking. Related: #460
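A benchmark loop that times only the prove step might look like the sketch below. It is not the actual `--num_runs` implementation; `benchmark_prove`, `load_key`, and `prove` are hypothetical names, and the callables are placeholders for the real zkey parser and prover.

```python
import time
from typing import Any, Callable, List


def benchmark_prove(
    load_key: Callable[[], Any],
    prove: Callable[[Any], Any],
    num_runs: int,
) -> List[float]:
    """Return per-run proving times in seconds, excluding key loading."""
    key = load_key()  # setup cost, deliberately outside the timed region
    timings = []
    for _ in range(num_runs):
        start = time.perf_counter()
        prove(key)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload: summing a list plays the role of proving.
    runs = benchmark_prove(lambda: list(range(100_000)), sum, num_runs=3)
    for i, seconds in enumerate(runs):
        print(f"Time taken for proving #{i}: {seconds:.6f} s")
```

Keeping setup outside the timed region means the reported numbers are comparable with a prover whose own timer covers only the prove step.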
Server proving and mobile proving, since Tachyon is better than rapidsnark. Mopro plans to integrate Tachyon Circom for mobile proving: zkmopro/mopro#143. They benchmarked and found that Tachyon is the fastest: https://docs.google.com/spreadsheets/d/1irKg_TOP-yXms8igwCN_3OjVrtFe5gTHkuF0RbrVuho/edit?gid=289866675#gid=289866675
@doutv Can you try benchmarking again using this branch?
========== Rapidsnark CPU ==========
~/rapidsnark/build_prover/src/prover complex-circuit-1200k-1200k.zkey build/complex-circuit-1200k-1200k.wtns proof.json public.json
prove : 5.00701seconds
entire : 5.01327seconds
mem 1107036
time 5.91
cpu 2951%
========== Tachyon CPU ==========
Start parsing zkey
Time taken for parsing zkey: 0.224168 s
Start parsing witness
Time taken for parsing witness: 0.016893 s
Start proving
Time taken for proving #0: 3.39144 s
Time taken for proving #1: 2.90737 s
Time taken for proving #2: 2.95133 s
Time taken for proving #3: 3.0924 s
@batzor Nice job! For a fair comparison, I disable
Btw, for a larger circuit with 3200k constraints, the times are very close:
./2-benchmark.sh complex-circuit complex-circuit-3200k-3200k
---------- complex-circuit-3200k-3200k ----------
Sample Size: 10
========== Rapidsnark CPU ==========
mem 4481 MB
time 5.823000 s
cpu 4428
========== Tachyon CPU ==========
mem 2928 MB
time 5.814000 s
cpu 4560
Proving #0 is slower. Is that because the zkey hasn't been loaded into memory yet?
prover_main complex-circuit-3200k-3200k.zkey build/complex-circuit-3200k-3200k.wtns proof.json public.json --num_runs 10
========== Tachyon CPU ==========
Start parsing zkey
Time taken for parsing zkey: 1.00079 s
Start parsing witness
Time taken for parsing witness: 0.067337 s
Start proving
Time taken for proving #0: 4.53864 s
Time taken for proving #1: 3.94007 s
Time taken for proving #2: 3.93159 s
Time taken for proving #3: 3.96555 s
Time taken for proving #4: 3.93451 s
Time taken for proving #5: 3.934 s
Time taken for proving #6: 3.92835 s
Time taken for proving #7: 3.93559 s
Time taken for proving #8: 3.94272 s
Time taken for proving #9: 3.93657 s
Avg time taken for proving: 3.99876 s
Max time taken for proving: 4.53864 s
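The summary figures in this log can be reproduced from the per-run timings, and the first run can be separated from the rest. The sketch below uses only the ten numbers printed above; whether run #0 is a warm-up effect is not established in this thread, so the "excluding run #0" average is an illustrative split rather than a conclusion.

```python
from statistics import mean

# Per-run proving times in seconds, copied from the Tachyon log above.
timings = [
    4.53864, 3.94007, 3.93159, 3.96555, 3.93451,
    3.934, 3.92835, 3.93559, 3.94272, 3.93657,
]

avg_all = mean(timings)       # matches "Avg time taken for proving"
max_all = max(timings)        # matches "Max time taken for proving"
avg_rest = mean(timings[1:])  # average without run #0

print(f"avg (all runs):        {avg_all:.5f} s")   # 3.99876 s
print(f"max (all runs):        {max_all:.5f} s")   # 4.53864 s
print(f"avg (excluding run 0): {avg_rest:.5f} s")  # 3.93877 s
```

Run #0 is about 0.6 s above the average of the remaining nine runs, which are all within a few hundredths of a second of each other.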
This should be optimized with the faster vector initialization feature, since when Tachyon creates
But I am not sure why only proving #0 is slow.
Benchmark result after #490
Wow! You guys are making rapid progress!
Issue type
Performance
OS platform and distribution
Ubuntu 22
Current behavior?
Machine: AMD Ryzen Threadripper PRO 5975WX 32-Cores
I benchmarked the Tachyon Circom vendor and Rapidsnark and compared their performance.
When the circuit has fewer than 400k constraints, Tachyon is faster and requires less memory.
Tachyon is much better, especially when the circuit is small.
However, when the circuit has more than 400k constraints, Rapidsnark is faster and requires less memory.
Expected Behavior?
I want to figure out the reason.
Standalone code or description to reproduce the issue
Repo: https://github.com/doutv/circom-benchmark.git
Additional context
No response