@FrancoGiachetta FrancoGiachetta commented Sep 15, 2025

Optimize circuit Compilation.

This PR changes how we handle arithmetic operations in circuits. Currently, for every gate, we extend all the operands to avoid overflow (to either u385 or u768), generate the corresponding arith operations, and then truncate the result back to u384.
On x86, arithmetic operations need to go through a legalization pass. This is because x86 allows up to 2 registers per instruction, while addition, subtraction and multiplication need 3 registers. A circuit with enough gates can make compilation considerably longer because of this.
To avoid having all these operations inlined, this PR creates a function that performs the arithmetic and returns the result ready to be used. This significantly reduces compilation time, since we go from thousands of arith operations to fewer than 10, leaving very little work for the legalization pass.
However, this comes with the downside of adding an extra indirection. Every circuit operation now reduces to essentially one function call, with the exception of the inversion operation, which was left as it is since it already had an extra indirection. This causes a regression in execution time.
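
A minimal sketch of the extend, operate, truncate pattern described above, written as a single outlined helper. This is not the PR's actual code: `CircuitOp` and `circuit_arith` are hypothetical names, and `u64`/`u128` stand in for `u384` and the widened `u385`/`u768` types, which have no native Rust equivalent. It also assumes both operands are already reduced modulo a nonzero modulus.

```rust
/// Hypothetical operation selector (illustrative name, not from the PR).
#[derive(Clone, Copy, Debug)]
enum CircuitOp {
    Add,
    Sub,
    Mul,
}

/// Stand-in for the outlined helper: `u64` plays the role of `u384`,
/// `u128` plays the role of the widened type (`u385` / `u768`).
/// Assumes `lhs < modulus`, `rhs < modulus` and `modulus != 0`.
#[inline(never)] // keep the arithmetic out of line, as the PR describes
fn circuit_arith(op: CircuitOp, lhs: u64, rhs: u64, modulus: u64) -> u64 {
    debug_assert!(modulus != 0 && lhs < modulus && rhs < modulus);

    // 1. Extend every operand so the intermediate result cannot overflow.
    let (l, r, m) = (lhs as u128, rhs as u128, modulus as u128);

    // 2. Perform the operation in the wide type and reduce it.
    let wide = match op {
        CircuitOp::Add => (l + r) % m,
        // Adding the modulus first keeps the subtraction non-negative.
        CircuitOp::Sub => (l + m - r) % m,
        CircuitOp::Mul => (l * r) % m,
    };

    // 3. Truncate back to the narrow type; the value is below `modulus`.
    wide as u64
}
```

With this shape, each call site emits one call instead of its own copy of the extend, operate, and truncate sequence, which is the trade-off between compile time and the extra indirection described above.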

Benchmarks

For both the compilation and the execution benchmarks, classes were compiled using --opt-level 2. Some classes have no execution benchmarks (shown as -) because there were no transactions invoking them.

| class | compilation BASE (s) | compilation HEAD (s) | compilation improvement (BASE/HEAD) | execution BASE (ms) | execution HEAD (ms) | execution improvement (BASE/HEAD) |
|---|---|---|---|---|---|---|
| 0x05ff378cb2f16804539ecb92e84f273aafbab57d450530e9fe8e87771705a673 | 340.94 | 67.23 | 5.07 | 296.76 | 291.56 | 1.01 |
| 0x4ffeed293927cd56686a9038a10026a2d3b9602f789d1f163c1c4ac9a822a82 | 448.50 | 67.19 | 6.67 | 43.37 | 46.63 | 0.93 |
| 0x4edde37ca59d9dff8f4ac8945b1c4860b606abd61d74727904ad7494fccdfa9 | 328.67 | 67.31 | 4.88 | - | - | - |
| 0x5862777a13917417ef3bb87ebc28b071753c316b1a3fb31f9e937d73e0aa188 | 480.27 | 88.51 | 5.42 | - | - | - |

Introduces Breaking Changes?

No.

These PRs should be merged after this one right away, in that order.

Checklist

  • Linked to Github Issue.
  • Unit tests added.
  • Integration tests added.
  • This change requires new documentation.
    • Documentation has been added/updated.

github-actions bot commented Sep 15, 2025

✅ Code is now correctly formatted.

github-actions bot commented Sep 15, 2025

Benchmark results Main vs HEAD.

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base dict_insert.cairo (JIT) | 2.685 ± 0.021 | 2.653 | 2.726 | 1.03 ± 0.01 |
| base dict_insert.cairo (AOT) | 2.609 ± 0.022 | 2.576 | 2.635 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head dict_insert.cairo (JIT) | 2.535 ± 0.019 | 2.495 | 2.563 | 1.03 ± 0.01 |
| head dict_insert.cairo (AOT) | 2.468 ± 0.029 | 2.423 | 2.513 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base dict_snapshot.cairo (JIT) | 2.395 ± 0.012 | 2.380 | 2.422 | 1.04 ± 0.01 |
| base dict_snapshot.cairo (AOT) | 2.304 ± 0.016 | 2.281 | 2.335 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head dict_snapshot.cairo (JIT) | 2.251 ± 0.028 | 2.211 | 2.291 | 1.04 ± 0.02 |
| head dict_snapshot.cairo (AOT) | 2.163 ± 0.019 | 2.140 | 2.186 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base factorial_2M.cairo (JIT) | 2.699 ± 0.016 | 2.671 | 2.718 | 1.02 ± 0.01 |
| base factorial_2M.cairo (AOT) | 2.654 ± 0.018 | 2.623 | 2.671 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head factorial_2M.cairo (JIT) | 2.659 ± 0.030 | 2.612 | 2.709 | 1.03 ± 0.01 |
| head factorial_2M.cairo (AOT) | 2.584 ± 0.015 | 2.544 | 2.600 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base fib_2M.cairo (JIT) | 2.323 ± 0.039 | 2.290 | 2.429 | 1.02 ± 0.02 |
| base fib_2M.cairo (AOT) | 2.267 ± 0.021 | 2.236 | 2.306 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head fib_2M.cairo (JIT) | 2.161 ± 0.033 | 2.113 | 2.209 | 1.02 ± 0.03 |
| head fib_2M.cairo (AOT) | 2.126 ± 0.050 | 2.071 | 2.255 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base linear_search.cairo (JIT) | 2.482 ± 0.014 | 2.466 | 2.506 | 1.06 ± 0.01 |
| base linear_search.cairo (AOT) | 2.338 ± 0.015 | 2.315 | 2.367 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head linear_search.cairo (JIT) | 2.334 ± 0.018 | 2.302 | 2.365 | 1.04 ± 0.01 |
| head linear_search.cairo (AOT) | 2.242 ± 0.020 | 2.208 | 2.265 | 1.00 |

Base

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base logistic_map.cairo (JIT) | 2.627 ± 0.020 | 2.594 | 2.654 | 1.07 ± 0.01 |
| base logistic_map.cairo (AOT) | 2.448 ± 0.017 | 2.423 | 2.474 | 1.00 |

Head

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| head logistic_map.cairo (JIT) | 2.484 ± 0.016 | 2.466 | 2.515 | 1.07 ± 0.01 |
| head logistic_map.cairo (AOT) | 2.318 ± 0.024 | 2.282 | 2.360 | 1.00 |

codecov-commenter commented Sep 15, 2025

Codecov Report

❌ Patch coverage is 97.97297% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.46%. Comparing base (f27f7fd) to head (88448c9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/metadata/runtime_bindings.rs | 97.52% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1400      +/-   ##
==========================================
+ Coverage   81.38%   81.46%   +0.08%     
==========================================
  Files         105      105              
  Lines       25805    25885      +80     
==========================================
+ Hits        21001    21087      +86     
+ Misses       4804     4798       -6     

github-actions bot commented Sep 15, 2025

Benchmarking results

Benchmark for program dict_insert

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 10.865 ± 0.051 | 10.780 | 10.944 | 4.37 ± 0.04 |
| cairo-native (embedded AOT) | 2.486 ± 0.016 | 2.458 | 2.505 | 1.00 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 2.605 ± 0.012 | 2.588 | 2.622 | 1.05 ± 0.01 |

Benchmark for program dict_snapshot

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 531.8 ± 8.6 | 525.2 | 553.4 | 1.00 |
| cairo-native (embedded AOT) | 2206.4 ± 38.2 | 2154.6 | 2294.3 | 4.15 ± 0.10 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 2314.2 ± 26.2 | 2278.0 | 2355.0 | 4.35 ± 0.09 |

Benchmark for program factorial_2M

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 4.773 ± 0.016 | 4.750 | 4.797 | 1.78 ± 0.02 |
| cairo-native (embedded AOT) | 2.763 ± 0.074 | 2.712 | 2.959 | 1.03 ± 0.03 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 2.680 ± 0.031 | 2.632 | 2.732 | 1.00 |

Benchmark for program fib_2M

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 4.670 ± 0.011 | 4.648 | 4.686 | 2.22 ± 0.03 |
| cairo-native (embedded AOT) | 2.103 ± 0.027 | 2.067 | 2.142 | 1.00 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 2.151 ± 0.012 | 2.129 | 2.165 | 1.02 ± 0.01 |

Benchmark for program linear_search

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 563.1 ± 5.2 | 555.9 | 575.2 | 1.00 |
| cairo-native (embedded AOT) | 2181.3 ± 10.6 | 2161.8 | 2202.4 | 3.87 ± 0.04 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 2339.6 ± 24.1 | 2303.8 | 2382.4 | 4.15 ± 0.06 |

Benchmark for program logistic_map

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| Cairo-vm (Rust, Cairo 1) | 388.4 ± 10.6 | 378.8 | 416.6 | 1.00 |
| cairo-native (embedded AOT) | 2352.3 ± 52.7 | 2289.9 | 2468.8 | 6.06 ± 0.21 |
| cairo-native (embedded JIT using LLVM's ORC Engine) | 2474.0 ± 38.1 | 2435.3 | 2559.5 | 6.37 ± 0.20 |

@JulianGCalderon JulianGCalderon left a comment

I don't like resolving the operation type at runtime. It makes the code more difficult to read and maintain. Can we have 3 different functions, each handling a different operation? Unless the performance difference is significant, I think having 3 different functions is much better.

Also, could you add some execution and compilation benchmark data to the PR description?

@FrancoGiachetta

I'll add some benchmarks. About the change you suggested: separating the switch into 3 functions makes compilation ~15% slower, which I think is a little high. That's why I reverted the change. I guess the switch-based approach enables more optimizations. I agree that separate functions would be better for maintenance. If we are willing to sacrifice that 15% in favor of more readability, I can change it back.
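
For reference, the three-function alternative discussed here could look roughly like the sketch below. It is an illustration only, not code from the PR: `widen_reduce`, `circuit_add`, `circuit_sub` and `circuit_mul` are hypothetical names, `u64`/`u128` stand in for `u384` and the widened types, and operands are assumed to be already reduced modulo a nonzero modulus. It says nothing about the ~15% compilation cost measured above.

```rust
/// Shared extend -> operate -> reduce -> truncate step.
/// `u64` stands in for `u384`, `u128` for the widened type.
fn widen_reduce(
    lhs: u64,
    rhs: u64,
    modulus: u64,
    op: impl Fn(u128, u128, u128) -> u128,
) -> u64 {
    debug_assert!(modulus != 0 && lhs < modulus && rhs < modulus);
    let m = modulus as u128;
    // The closure receives the widened operands and the widened modulus.
    (op(lhs as u128, rhs as u128, m) % m) as u64
}

/// One entry point per operation, so no operation tag is inspected at runtime.
pub fn circuit_add(lhs: u64, rhs: u64, modulus: u64) -> u64 {
    widen_reduce(lhs, rhs, modulus, |l, r, _| l + r)
}

pub fn circuit_sub(lhs: u64, rhs: u64, modulus: u64) -> u64 {
    // Adding the modulus first keeps the subtraction non-negative.
    widen_reduce(lhs, rhs, modulus, |l, r, m| l + m - r)
}

pub fn circuit_mul(lhs: u64, rhs: u64, modulus: u64) -> u64 {
    widen_reduce(lhs, rhs, modulus, |l, r, _| l * r)
}
```

Each function has a single responsibility, which is the readability argument; the cost is three outlined bodies instead of one.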

@FrancoGiachetta FrancoGiachetta changed the title from "Optimize circuit operations" to "Optimize circuit Compilation" Oct 1, 2025
gabrielbosio previously approved these changes Oct 8, 2025
@JulianGCalderon JulianGCalderon left a comment

LGTM! Should we execute some blocks to ensure correctness?

FrancoGiachetta commented Oct 8, 2025

I think so. I did it before with a range of 20000 blocks and there was no issue. But since these last commits changed the code a little, I think we should run it again. There were no substantial changes, but just to be sure.
