This repository was archived by the owner on Jul 5, 2024. It is now read-only.

Bench Placeholder #352

Closed
wants to merge 2 commits into from

Conversation

barryWhiteHat
Contributor

No description provided.

@barryWhiteHat barryWhiteHat added the benchmarks: ALL Triggers a long running prover benchmark label Feb 25, 2022
@CPerezz
Contributor

CPerezz commented Feb 25, 2022

Did this time out, @AronisAt79?

Is there any way to get more detailed logs than the ones reported here by GH Actions?

@AronisAt79
Contributor

AronisAt79 commented Feb 26, 2022

Hi @CPerezz, the prover node stores detailed log files under /home/ubuntu/CI_Prover_Benches/PR. If the execbench script fails, the prover node remains powered up so you can log in and have a look at the log.

In this case, this was the only thing written:
running 1 test
Start: Setup generation with degree = 19
test evm_circuit::evm_circ_benches::bench_evm_circuit_prover has been running for over 60 seconds
End: Setup generation with degree = 19 .........................................318.475s
Start: EVM Proof generation with 19 rows
error: test failed, to rerun pass '-p circuit-benchmarks --lib'
Caused by:
process didn't exit successfully: /home/ubuntu/CI_Prover_Benches/PR352/target/release/deps/circuit_benchmarks-896c81e1771e2252 bench_evm_circuit_prover --nocapture (signal: 9, SIGKILL: kill)

According to dmesg, there was an OOM issue:

[ 3734.452360] Out of memory: Killed process 8096 (circuit_benchma) total-vm:790492436kB, anon-rss:779230428kB, file-rss:156kB, shmem-rss:0kB, UID:1000 pgtables:1527672kB oom_score_adj:0
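To put the kernel's kB figures in the same units as the machine's RAM size, the OOM line can be parsed and converted. The following is a minimal sketch, not part of the CI scripts: `field_kb` and `kb_to_gib` are hypothetical helper names, and the conversion assumes the kernel's `kB` means 1024 bytes.

```rust
// OOM line copied from the dmesg output above (timestamp prefix dropped).
const OOM_LINE: &str = "Out of memory: Killed process 8096 (circuit_benchma) total-vm:790492436kB, anon-rss:779230428kB, file-rss:156kB, shmem-rss:0kB, UID:1000 pgtables:1527672kB oom_score_adj:0";

/// Hypothetical helper: returns the kB value that follows `key` in the line,
/// or `None` if the key or the trailing "kB" unit is not found.
fn field_kb(line: &str, key: &str) -> Option<u64> {
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    let end = rest.find("kB")?;
    rest[..end].parse().ok()
}

/// Converts kB (assumed to be 1024 bytes each) to GiB.
fn kb_to_gib(kb: u64) -> f64 {
    kb as f64 / (1024.0 * 1024.0)
}
```

With these helpers, `field_kb(OOM_LINE, "anon-rss:").map(kb_to_gib)` comes out at roughly 743 GiB of resident memory and `total-vm` at roughly 754 GiB, which is close to the 768G the machine is said to have later in the thread.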

@CPerezz
Contributor

CPerezz commented Feb 28, 2022

@AronisAt79 looks like the machine ran out of RAM.

Is this the machine we used to benchmark everything about a month ago?

@AronisAt79
Contributor

@CPerezz, yes, this is the one, currently configured with 768G of RAM.

@CPerezz
Contributor

CPerezz commented Feb 28, 2022

Then I guess this is a regression, as the EVM circuit wasn't exceeding the RAM limits previously.
Ping @han0110

@han0110
Contributor

han0110 commented Feb 28, 2022

I'm also not sure, but I guess it's due to too many lookups being used in CALLDATACOPY. If so, I will try to find a temporary fix for this, and also start working on reducing the lookups in the EVM circuit.

@barryWhiteHat
Contributor Author

@AronisAt79 is AFK today but should be back tomorrow, I think. When you get back, could you bump the RAM on the CI server so we don't OOM? :)

@barryWhiteHat
Contributor Author

We were always expecting multi-slot opcodes (CALL, CALLDATACOPY and others) to grow the prover a bunch. We measured for CALL, IIRC, and it did jump a bit. Let's explore a bit, optimize the circuit layouter, and see how that helps us.

@CPerezz
Contributor

CPerezz commented Feb 28, 2022

I'm also not sure, but I guess it's due to too many lookups being used in CALLDATACOPY. If so, I will try to find a temporary fix for this, and also start working on reducing the lookups in the EVM circuit.

Maybe we don't need to worry for now.
I mean, instead of doing a quick fix, simply note down all of the changes that caused the regression.
Once development is done, we can then go after them.

That way you can offload a bit, and we can also advance and focus on development.

WDYT? (Of course, if it takes 2TB of RAM we need to do something...) I'm only saying this in case the increase is not massive.

Also, @AronisAt79, maybe it would be nice to display the RAM and processor being used, so we have the info on what the RAM limit was, etc.

@AronisAt79
Contributor

@CPerezz

"maybe it would be nice to display RAM and Processor which is being used. So we have the info of how much RAM was the limit etc"

This will need some rewriting; currently it is only displayed if the script terminates with no error. I will for sure make the necessary adaptation. For now, we can monitor up to a few days of system stats with netdata at http://10.10.0.183:19999 (in real time as well, obviously).

BTW, I will now increase the system resources. Do you want me to re-trigger the bench?

@AronisAt79 AronisAt79 added benchmarks: ALL Triggers a long running prover benchmark and removed benchmarks: ALL Triggers a long running prover benchmark labels Feb 28, 2022
@github-actions github-actions bot added the T-opcode Type: opcode-related and focused PR/Issue label Feb 28, 2022
@han0110 han0110 added benchmarks: ALL Triggers a long running prover benchmark and removed benchmarks: ALL Triggers a long running prover benchmark labels Feb 28, 2022
@CPerezz CPerezz added benchmarks: ALL Triggers a long running prover benchmark and removed benchmarks: ALL Triggers a long running prover benchmark labels Mar 1, 2022
CPerezz added a commit that referenced this pull request Mar 3, 2022
As @han0110 said, before #341 &[&[]] was passed
(a single proof instance with no instance column values).
Now we were passing only &[] for instances,
which basically means that num_proofs=0.

This was causing the errors during the benchmarks triggered in #352.

Co-authored-by: @han0110
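The difference between the two arguments in the commit message can be seen from slice lengths alone. This is a small sketch under stated assumptions, not the prover's actual API: `num_proofs` is a hypothetical stand-in function, `u64` stands in for the field element type, and the nesting (proofs, then instance columns, then values) is assumed from the commit message's description.

```rust
/// Hypothetical stand-in for how a prover might count proofs:
/// the outer slice has one entry per proof, the middle slice one entry per
/// instance column, and the inner slice holds that column's values.
fn num_proofs(instances: &[&[&[u64]]]) -> usize {
    instances.len()
}

/// Number of instance columns supplied for each proof.
fn columns_per_proof(instances: &[&[&[u64]]]) -> Vec<usize> {
    instances.iter().map(|proof| proof.len()).collect()
}
```

Under these assumptions, `num_proofs(&[&[]])` is 1 (one proof with zero instance columns), while `num_proofs(&[])` is 0, which matches the `num_proofs=0` situation described in the commit message.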
@CPerezz CPerezz mentioned this pull request Mar 3, 2022
@github-actions github-actions bot removed the T-opcode Type: opcode-related and focused PR/Issue label Mar 3, 2022
@CPerezz CPerezz added benchmarks: ALL Triggers a long running prover benchmark and removed benchmarks: ALL Triggers a long running prover benchmark circuit: pref-regression labels Mar 3, 2022
@CPerezz
Contributor

CPerezz commented Mar 4, 2022

@barryWhiteHat the results are out:

running 1 test
Start:   Setup generation with degree = 19
test evm_circuit::evm_circ_benches::bench_evm_circuit_prover has been running for over 60 seconds
End:     Setup generation with degree = 19 .........................................598.472s
Start:   EVM Proof generation with 19 rows
End:     EVM Proof generation with 19 rows .........................................7869.555s
Start:   EVM Proof verification
End:     EVM Proof verification ....................................................492.438ms
test evm_circuit::evm_circ_benches::bench_evm_circuit_prover ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 8516.98s

Maximum CPU Usage at 100.0%
Maximum Mem Usage at 868.7Gb
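For comparing runs, the per-phase timings in the log above can be pulled out and summed. The sketch below is illustrative only: `parse_duration_secs` and `total_end_secs` are hypothetical helpers, the log lines are copied from the output above with the dot leaders shortened, and the parser assumes the duration is the last token on each `End:` line with a unit of `s` or `ms`.

```rust
// "End:"/"Start:" lines from the benchmark output above (dot leaders shortened).
const BENCH_LOG: [&str; 6] = [
    "Start:   Setup generation with degree = 19",
    "End:     Setup generation with degree = 19 ........598.472s",
    "Start:   EVM Proof generation with 19 rows",
    "End:     EVM Proof generation with 19 rows ........7869.555s",
    "Start:   EVM Proof verification",
    "End:     EVM Proof verification ........492.438ms",
];

/// Hypothetical helper: reads the trailing duration of a log line in seconds.
/// Returns `None` when the last token is not a number followed by `s` or `ms`.
fn parse_duration_secs(line: &str) -> Option<f64> {
    let token = line.split_whitespace().last()?.trim_start_matches('.');
    if let Some(ms) = token.strip_suffix("ms") {
        ms.parse::<f64>().ok().map(|v| v / 1000.0)
    } else if let Some(s) = token.strip_suffix('s') {
        s.parse::<f64>().ok()
    } else {
        None
    }
}

/// Sums the durations of all lines that start with "End:".
fn total_end_secs(lines: &[&str]) -> f64 {
    lines
        .iter()
        .filter(|l| l.starts_with("End:"))
        .filter_map(|l| parse_duration_secs(l))
        .sum()
}
```

For the log above, `total_end_secs(&BENCH_LOG)` gives about 8468.5 s, roughly 48 s less than the 8516.98s total reported by the test harness; proof generation accounts for almost all of it.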

@AronisAt79,
in the logs so far we only have the results for the evm_circuit, but nothing about the state or keccak ones. Why is that? Am I missing something?

@CPerezz
Contributor

CPerezz commented Mar 4, 2022

Also, can we close this? Or do we want to keep this placeholder for something else?
