Conversation

@Forostovec

  • Read a/b/div via Uint256::from_var_name(...).pack() in uint256_mul_div_mod
  • Use Uint256 in uint256_signed_nn for consistent access to a.high
  • Add Uint256::from_base_addr_with_offsets and use it in uint256_offseted_unsigned_div_rem (standard layout uses from_var_name, expanded layout uses offsets)
  • Form quotient_low/quotient_high via Uint512::split into two Uint256, and write remainder via Uint256

These changes remove duplicated packing logic, standardize error paths, and align the uint256 hint implementations with the existing UintNNN helper patterns.
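For readers unfamiliar with these helpers, the access pattern being unified can be sketched with toy types (`u128` standing in for `Felt252`, a `Vec` standing in for VM memory; the real cairo-vm signatures take `vm`, `ids_data`, and `ap_tracking`, and the actual `from_base_addr_with_offsets` added by this PR may differ):

```rust
// Toy model of the Uint256 helper pattern. u128 stands in for Felt252
// and Vec<u128> for VM memory; real cairo-vm types differ.
struct Memory(Vec<u128>);

impl Memory {
    fn get(&self, addr: usize) -> Option<u128> {
        self.0.get(addr).copied()
    }
}

struct Uint256 {
    low: u128,
    high: u128,
}

impl Uint256 {
    // Standard layout: low limb at base_addr, high limb at base_addr + 1.
    fn from_base_addr(addr: usize, mem: &Memory) -> Option<Uint256> {
        Self::from_base_addr_with_offsets(addr, mem, 0, 1)
    }

    // Expanded layouts read the limbs at arbitrary offsets from the base,
    // which is what uint256_offseted_unsigned_div_rem needs.
    fn from_base_addr_with_offsets(
        addr: usize,
        mem: &Memory,
        low_off: usize,
        high_off: usize,
    ) -> Option<Uint256> {
        Some(Uint256 {
            low: mem.get(addr + low_off)?,
            high: mem.get(addr + high_off)?,
        })
    }
}

fn main() {
    let mem = Memory(vec![7, 9]);
    let x = Uint256::from_base_addr(0, &mem).unwrap();
    assert_eq!((x.low, x.high), (7, 9));
    println!("low={} high={}", x.low, x.high);
}
```

The point of routing both layouts through one constructor is that the missing-value error path lives in a single place instead of being duplicated per hint.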

Contributor

@JulianGCalderon left a comment

Hey @Forostovec, thanks for the contribution! Could you also update the changelog?

Comment on lines 332 to 337
let a = Uint256::from_var_name("a", vm, ids_data, ap_tracking)?;
let a_high = a.high;
//Main logic
//memory[ap] = 1 if 0 <= (ids.a.high % PRIME) < 2 ** 127 else 0
let result: Felt252 =
-if *a_high >= Felt252::ZERO && a_high.as_ref() <= &Felt252::from(i128::MAX) {
+if *a_high.as_ref() >= Felt252::ZERO && a_high.as_ref() <= &Felt252::from(i128::MAX) {
Contributor

This would imply fetching a.low, but not using it, right?
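For context, the hint in this snippet writes 1 exactly when `ids.a.high` falls in the nonnegative range `[0, 2**127)`. A self-contained sketch of that bound check, with `u128` standing in for `Felt252` and field reduction omitted (`i128::MAX == 2**127 - 1`, which is why the original compares with `<=`):

```rust
// memory[ap] = 1 if 0 <= ids.a.high % PRIME < 2**127 else 0, modeled with
// u128 standing in for Felt252 (the % PRIME reduction is omitted here).
fn signed_nn(a_high: u128) -> u128 {
    // i128::MAX as u128 == 2u128.pow(127) - 1, so `<=` matches `< 2**127`.
    if a_high <= i128::MAX as u128 {
        1
    } else {
        0
    }
}

fn main() {
    assert_eq!(signed_nn(0), 1);
    assert_eq!(signed_nn(i128::MAX as u128), 1);
    assert_eq!(signed_nn((i128::MAX as u128) + 1), 0);
    println!("ok");
}
```

The reviewer's concern is orthogonal to the check itself: going through `Uint256::from_var_name` loads both limbs from memory, while this hint only ever reads the high one.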

Comment on lines 398 to 406
let div = if div_offset_low == 0 && div_offset_high == 1 {
// Standard Uint256 layout
Uint256::from_var_name("div", vm, ids_data, ap_tracking)?
} else {
let div_addr = get_relocatable_from_var_name("div", vm, ids_data, ap_tracking)?;
Uint256::from_base_addr_with_offsets(div_addr, "div", vm, div_offset_low, div_offset_high)?
};
let div_low = div.low.as_ref();
let div_high = div.high.as_ref();
Contributor

Here could we call from_base_addr_with_offsets directly and avoid the conditional? It would make the function easier to read.

Also, in this function we are packing the U256 into a bigint manually. Could you modify it to call pack? Given that this PR is for unifying U256 usage in general, we can include it.
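The manual packing the reviewer mentions is just combining two limbs with a shift; cairo-vm's `Uint256::pack` does this into a big integer using a 128-bit shift. A self-contained illustration of the same principle at half width (`u64` limbs packed into a `u128`; `pack`/`unpack` here are illustrative stand-ins, not the real API):

```rust
// Illustration of limb packing at half width: two u64 limbs into a u128.
// The real Uint256::pack combines two Felt252 limbs into a BigUint with a
// 128-bit shift; the principle is identical.
fn pack(low: u64, high: u64) -> u128 {
    ((high as u128) << 64) | low as u128
}

// Inverse operation, as done by helpers like Uint512::split at full width.
fn unpack(v: u128) -> (u64, u64) {
    (v as u64, (v >> 64) as u64)
}

fn main() {
    let v = pack(0xdead_beef, 0x1);
    assert_eq!(v, (1u128 << 64) | 0xdead_beef);
    assert_eq!(unpack(v), (0xdead_beef, 0x1));
    println!("packed = {:#x}", v);
}
```

Calling the shared helper instead of open-coding the shift is what keeps all the uint256 hints consistent, which is the stated goal of the PR.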

@gabrielbosio
Collaborator

Hi, @Forostovec, what's the status of this PR?

@Forostovec
Author

> Hi, @Forostovec, what's the status of this PR?

Ohh, I'm sorry, gonna make changes and update the changelog

@gabrielbosio
Collaborator

**Hyper Threading Benchmark results**




hyperfine -r 2 -n "hyper_threading_main threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_main' -n "hyper_threading_pr threads: 1" 'RAYON_NUM_THREADS=1 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 1
  Time (mean ± σ):     22.606 s ±  0.006 s    [User: 21.754 s, System: 0.849 s]
  Range (min … max):   22.602 s … 22.610 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 1
  Time (mean ± σ):     22.661 s ±  0.049 s    [User: 21.821 s, System: 0.837 s]
  Range (min … max):   22.627 s … 22.695 s    2 runs
 
Summary
  hyper_threading_main threads: 1 ran
    1.00 ± 0.00 times faster than hyper_threading_pr threads: 1




hyperfine -r 2 -n "hyper_threading_main threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_main' -n "hyper_threading_pr threads: 2" 'RAYON_NUM_THREADS=2 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 2
  Time (mean ± σ):     12.191 s ±  0.034 s    [User: 21.941 s, System: 0.842 s]
  Range (min … max):   12.167 s … 12.215 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 2
  Time (mean ± σ):     12.121 s ±  0.009 s    [User: 21.813 s, System: 0.870 s]
  Range (min … max):   12.115 s … 12.128 s    2 runs
 
Summary
  hyper_threading_pr threads: 2 ran
    1.01 ± 0.00 times faster than hyper_threading_main threads: 2




hyperfine -r 2 -n "hyper_threading_main threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_main' -n "hyper_threading_pr threads: 4" 'RAYON_NUM_THREADS=4 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 4
  Time (mean ± σ):      9.467 s ±  0.168 s    [User: 34.527 s, System: 1.078 s]
  Range (min … max):    9.348 s …  9.586 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 4
  Time (mean ± σ):      9.651 s ±  0.349 s    [User: 34.346 s, System: 1.034 s]
  Range (min … max):    9.403 s …  9.898 s    2 runs
 
Summary
  hyper_threading_main threads: 4 ran
    1.02 ± 0.04 times faster than hyper_threading_pr threads: 4




hyperfine -r 2 -n "hyper_threading_main threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_main' -n "hyper_threading_pr threads: 6" 'RAYON_NUM_THREADS=6 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 6
  Time (mean ± σ):      9.586 s ±  0.088 s    [User: 34.396 s, System: 1.042 s]
  Range (min … max):    9.524 s …  9.649 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 6
  Time (mean ± σ):      9.542 s ±  0.172 s    [User: 34.313 s, System: 1.024 s]
  Range (min … max):    9.420 s …  9.664 s    2 runs
 
Summary
  hyper_threading_pr threads: 6 ran
    1.00 ± 0.02 times faster than hyper_threading_main threads: 6




hyperfine -r 2 -n "hyper_threading_main threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_main' -n "hyper_threading_pr threads: 8" 'RAYON_NUM_THREADS=8 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 8
  Time (mean ± σ):      9.440 s ±  0.185 s    [User: 35.055 s, System: 1.108 s]
  Range (min … max):    9.310 s …  9.571 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 8
  Time (mean ± σ):      9.368 s ±  0.058 s    [User: 34.916 s, System: 1.068 s]
  Range (min … max):    9.327 s …  9.409 s    2 runs
 
Summary
  hyper_threading_pr threads: 8 ran
    1.01 ± 0.02 times faster than hyper_threading_main threads: 8




hyperfine -r 2 -n "hyper_threading_main threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_main' -n "hyper_threading_pr threads: 16" 'RAYON_NUM_THREADS=16 ./hyper_threading_pr'
Benchmark 1: hyper_threading_main threads: 16
  Time (mean ± σ):      9.521 s ±  0.163 s    [User: 35.080 s, System: 1.159 s]
  Range (min … max):    9.405 s …  9.636 s    2 runs
 
Benchmark 2: hyper_threading_pr threads: 16
  Time (mean ± σ):      9.651 s ±  0.099 s    [User: 34.908 s, System: 1.160 s]
  Range (min … max):    9.580 s …  9.721 s    2 runs
 
Summary
  hyper_threading_main threads: 16 ran
    1.01 ± 0.02 times faster than hyper_threading_pr threads: 16


@gabrielbosio
Collaborator

Benchmark Results for unmodified programs 🚀

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base big_factorial | 1.975 ± 0.029 | 1.930 | 2.018 | 1.00 ± 0.02 |
| head big_factorial | 1.966 ± 0.012 | 1.948 | 1.988 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base big_fibonacci | 1.885 ± 0.009 | 1.867 | 1.901 | 1.00 |
| head big_fibonacci | 1.905 ± 0.026 | 1.881 | 1.968 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base blake2s_integration_benchmark | 6.626 ± 0.107 | 6.430 | 6.763 | 1.00 |
| head blake2s_integration_benchmark | 6.689 ± 0.069 | 6.607 | 6.820 | 1.01 ± 0.02 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base compare_arrays_200000 | 2.022 ± 0.017 | 1.990 | 2.040 | 1.00 |
| head compare_arrays_200000 | 2.024 ± 0.014 | 2.009 | 2.044 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base dict_integration_benchmark | 1.350 ± 0.006 | 1.341 | 1.360 | 1.01 ± 0.01 |
| head dict_integration_benchmark | 1.337 ± 0.006 | 1.332 | 1.350 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base field_arithmetic_get_square_benchmark | 1.128 ± 0.010 | 1.117 | 1.144 | 1.00 |
| head field_arithmetic_get_square_benchmark | 1.130 ± 0.009 | 1.119 | 1.146 | 1.00 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base integration_builtins | 6.745 ± 0.072 | 6.677 | 6.891 | 1.00 |
| head integration_builtins | 6.824 ± 0.069 | 6.724 | 6.963 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base keccak_integration_benchmark | 6.771 ± 0.078 | 6.664 | 6.923 | 1.00 |
| head keccak_integration_benchmark | 6.819 ± 0.054 | 6.732 | 6.940 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base linear_search | 2.010 ± 0.020 | 1.990 | 2.049 | 1.00 |
| head linear_search | 2.021 ± 0.016 | 2.001 | 2.044 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base math_cmp_and_pow_integration_benchmark | 1.436 ± 0.007 | 1.426 | 1.447 | 1.00 ± 0.01 |
| head math_cmp_and_pow_integration_benchmark | 1.434 ± 0.008 | 1.425 | 1.455 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base math_integration_benchmark | 1.393 ± 0.009 | 1.376 | 1.405 | 1.00 ± 0.01 |
| head math_integration_benchmark | 1.392 ± 0.007 | 1.381 | 1.403 | 1.00 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base memory_integration_benchmark | 1.133 ± 0.005 | 1.124 | 1.139 | 1.00 |
| head memory_integration_benchmark | 1.142 ± 0.009 | 1.131 | 1.160 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base operations_with_data_structures_benchmarks | 1.468 ± 0.009 | 1.454 | 1.479 | 1.00 ± 0.01 |
| head operations_with_data_structures_benchmarks | 1.464 ± 0.008 | 1.452 | 1.477 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| base pedersen | 508.9 ± 2.6 | 504.9 | 512.1 | 1.00 ± 0.01 |
| head pedersen | 508.0 ± 1.6 | 506.3 | 510.8 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| base poseidon_integration_benchmark | 592.8 ± 2.4 | 588.3 | 595.8 | 1.00 |
| head poseidon_integration_benchmark | 596.6 ± 1.9 | 593.0 | 599.3 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base secp_integration_benchmark | 1.722 ± 0.012 | 1.706 | 1.743 | 1.01 ± 0.01 |
| head secp_integration_benchmark | 1.706 ± 0.008 | 1.699 | 1.726 | 1.00 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| base set_integration_benchmark | 646.6 ± 2.4 | 643.4 | 649.6 | 1.00 |
| head set_integration_benchmark | 650.6 ± 3.0 | 644.7 | 655.2 | 1.01 ± 0.01 |

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base uint256_integration_benchmark | 3.869 ± 0.046 | 3.797 | 3.955 | 1.00 ± 0.01 |
| head uint256_integration_benchmark | 3.854 ± 0.022 | 3.823 | 3.907 | 1.00 |


3 participants