@polytypic
Collaborator

@polytypic polytypic commented Apr 6, 2023

This PR introduces a number of internal optimizations.

  1. If only CMP operations are performed, no additional verify step is needed, because every location has already been read once and verified by the determine phase. The determine phase now tracks whether it encountered CAS operations, CMP operations, or both, and the verify step is skipped unless both kinds were present.

  2. fenceless_get and fenceless_set operations are now used internally where it is safe to do so: the fences are redundant there because other fenceful operations access the same words.

  3. A level of indirection is eliminated by using an unsafe cast. This also reduces the size of a location by two words.
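
The first optimization can be illustrated with a minimal, single-threaded sketch. It is not the library's actual code: the `op` type, the `seen` record, and the functions `determine` and `needs_verify` are hypothetical names introduced here, and plain `int ref` cells stand in for shared locations.

```ocaml
(* Hypothetical model: each operation is either a CAS (may write) or a
   CMP (read-only comparison) on a plain int ref. *)
type op =
  | Cas of { loc : int ref; expected : int; desired : int }
  | Cmp of { loc : int ref; expected : int }

(* Flags recorded while walking the operations. *)
type seen = { has_cas : bool; has_cmp : bool }

(* Single-threaded stand-in for the determine phase: it checks every
   location against its expected value and records which kinds of
   operations occurred. *)
let determine ops =
  List.fold_left
    (fun (ok, seen) op ->
      match op with
      | Cas { loc; expected; _ } ->
        (ok && !loc = expected, { seen with has_cas = true })
      | Cmp { loc; expected } ->
        (ok && !loc = expected, { seen with has_cmp = true }))
    (true, { has_cas = false; has_cmp = false })
    ops

(* A separate verify pass is requested only when both kinds were present. *)
let needs_verify seen = seen.has_cas && seen.has_cmp

let () =
  let x = ref 1 and y = ref 2 in
  let _, only_cmp =
    determine [ Cmp { loc = x; expected = 1 }; Cmp { loc = y; expected = 2 } ]
  in
  let _, mixed =
    determine
      [ Cas { loc = x; expected = 1; desired = 3 };
        Cmp { loc = y; expected = 2 } ]
  in
  Printf.printf "only CMP: %b, mixed: %b\n"
    (needs_verify only_cmp) (needs_verify mixed)
```

The sketch only models the bookkeeping. In the real implementation the determine phase runs concurrently with other domains, which this example does not attempt to reproduce.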

The last two optimizations use "unsafe" features of OCaml. I implemented them in such a way that they are easy to revert simply by adjusting comment blocks in the code.
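
As a rough sketch of what the third optimization could look like (and how a fenceless read falls out of it), consider the following. The `loc` type and the helpers `as_atomic`, `make`, `get`, `fenceless_get`, and `compare_and_set` are hypothetical and not taken from the PR. The cast assumes that `Atomic.t` is represented as a block whose first field holds the value, which holds for the OCaml runtimes I am aware of but is not guaranteed by the language.

```ocaml
(* Before (for comparison): a record holding a pointer to a separately
   allocated atomic, e.g. { state : 'a Atomic.t; id : int }.
   After: the state is stored inline as the first, mutable field. *)
type 'a loc = { mutable state : 'a; id : int }

(* UNSAFE, hypothetical helper: reinterpret the record as an Atomic.t so
   that atomic primitives operate on field 0 of the record. This relies
   on an assumed runtime representation of Atomic.t. *)
external as_atomic : 'a loc -> 'a Atomic.t = "%identity"

let make id v = { state = v; id }

(* Fenceful read through the atomic view of the record. *)
let get loc = Atomic.get (as_atomic loc)

(* Assumption: a plain field read stands in for a fenceless get. *)
let fenceless_get loc = loc.state

let compare_and_set loc before after =
  Atomic.compare_and_set (as_atomic loc) before after

let () =
  let l = make 0 10 in
  if compare_and_set l 10 11 then
    Printf.printf "loc %d now holds %d\n" l.id (fenceless_get l)
```

With the inline layout there is no separate `Atomic.t` block (a header plus one field), which is consistent with the two-word saving mentioned above. The price is that the type system can no longer check the cast.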

@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch 2 times, most recently from 8fa9fb1 to ba4015e Compare April 8, 2023 08:21
@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch 3 times, most recently from ec267f5 to 44d6610 Compare April 22, 2023 17:01
@polytypic polytypic changed the title from "Avoid unnecessary verify in case of read-only operations" to "Optimizations" Apr 22, 2023
@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch from 44d6610 to 3b579ed Compare April 22, 2023 17:40
@polytypic polytypic marked this pull request as ready for review April 22, 2023 17:52
@polytypic
Collaborator Author

polytypic commented Apr 22, 2023

In the benchmarks below, kcas-main is the add-blocking branch and kcas-this is this branch with the optimizations applied.

Benchmark 1: kcas-main/_build/default/test/benchmark.exe 1 10000
  Time (mean ± σ):       3.6 ms ±   0.0 ms    [User: 2.7 ms, System: 0.6 ms]
  Range (min … max):     3.5 ms …   3.9 ms    829 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 1 10000
  Time (mean ± σ):       2.9 ms ±   0.0 ms    [User: 2.0 ms, System: 0.6 ms]
  Range (min … max):     2.8 ms …   3.3 ms    1015 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/benchmark.exe 1 10000' ran
    1.23 ± 0.02 times faster than 'kcas-main/_build/default/test/benchmark.exe 1 10000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 2 10000
  Time (mean ± σ):       8.0 ms ±   0.1 ms    [User: 7.1 ms, System: 0.6 ms]
  Range (min … max):     7.9 ms …   8.4 ms    375 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 2 10000
  Time (mean ± σ):       8.0 ms ±   0.0 ms    [User: 7.1 ms, System: 0.6 ms]
  Range (min … max):     7.9 ms …   8.3 ms    374 runs
 
Summary
  'kcas-this/_build/default/test/benchmark.exe 2 10000' ran
    1.01 ± 0.01 times faster than 'kcas-main/_build/default/test/benchmark.exe 2 10000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 4 10000
  Time (mean ± σ):      13.4 ms ±   0.1 ms    [User: 12.5 ms, System: 0.6 ms]
  Range (min … max):    13.3 ms …  13.7 ms    223 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 4 10000
  Time (mean ± σ):      13.3 ms ±   0.1 ms    [User: 12.4 ms, System: 0.6 ms]
  Range (min … max):    13.2 ms …  13.6 ms    226 runs
 
Summary
  'kcas-this/_build/default/test/benchmark.exe 4 10000' ran
    1.00 ± 0.01 times faster than 'kcas-main/_build/default/test/benchmark.exe 4 10000'


Benchmark 1: kcas-main/_build/default/test/xt_benchmark.exe 1 10000
  Time (mean ± σ):       7.3 ms ±   0.0 ms    [User: 6.4 ms, System: 0.6 ms]
  Range (min … max):     7.2 ms …   7.6 ms    408 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/xt_benchmark.exe 1 10000
  Time (mean ± σ):       6.5 ms ±   0.1 ms    [User: 5.6 ms, System: 0.6 ms]
  Range (min … max):     6.4 ms …   7.0 ms    461 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/xt_benchmark.exe 1 10000' ran
    1.11 ± 0.01 times faster than 'kcas-main/_build/default/test/xt_benchmark.exe 1 10000'


Benchmark 1: kcas-main/_build/default/test/xt_benchmark.exe 2 10000
  Time (mean ± σ):      12.9 ms ±   0.1 ms    [User: 12.0 ms, System: 0.6 ms]
  Range (min … max):    12.8 ms …  13.4 ms    228 runs
 
Benchmark 2: kcas-this/_build/default/test/xt_benchmark.exe 2 10000
  Time (mean ± σ):      12.8 ms ±   0.1 ms    [User: 11.9 ms, System: 0.6 ms]
  Range (min … max):    12.6 ms …  13.2 ms    234 runs
 
Summary
  'kcas-this/_build/default/test/xt_benchmark.exe 2 10000' ran
    1.01 ± 0.01 times faster than 'kcas-main/_build/default/test/xt_benchmark.exe 2 10000'


Benchmark 1: kcas-main/_build/default/test/xt_benchmark.exe 4 10000
  Time (mean ± σ):      23.2 ms ±   0.2 ms    [User: 22.3 ms, System: 0.7 ms]
  Range (min … max):    23.0 ms …  24.1 ms    129 runs
 
Benchmark 2: kcas-this/_build/default/test/xt_benchmark.exe 4 10000
  Time (mean ± σ):      23.1 ms ±   0.2 ms    [User: 22.1 ms, System: 0.7 ms]
  Range (min … max):    22.8 ms …  23.7 ms    130 runs
 
Summary
  'kcas-this/_build/default/test/xt_benchmark.exe 4 10000' ran
    1.01 ± 0.01 times faster than 'kcas-main/_build/default/test/xt_benchmark.exe 4 10000'


Benchmark 1: kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 100000
  Time (mean ± σ):      20.8 ms ±   1.3 ms    [User: 34.2 ms, System: 1.6 ms]
  Range (min … max):    19.6 ms …  29.2 ms    149 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 100000
  Time (mean ± σ):      19.5 ms ±   1.4 ms    [User: 31.7 ms, System: 1.6 ms]
  Range (min … max):    18.1 ms …  25.0 ms    164 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 100000' ran
    1.07 ± 0.10 times faster than 'kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 100000'


Benchmark 1: kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 200000
  Time (mean ± σ):      39.4 ms ±   2.2 ms    [User: 67.3 ms, System: 2.2 ms]
  Range (min … max):    37.4 ms …  42.7 ms    71 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 200000
  Time (mean ± σ):      36.2 ms ±   2.6 ms    [User: 61.3 ms, System: 2.2 ms]
  Range (min … max):    34.0 ms …  40.7 ms    87 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 200000' ran
    1.09 ± 0.10 times faster than 'kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 200000'


Benchmark 1: kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 400000
  Time (mean ± σ):      76.8 ms ±   4.7 ms    [User: 134.1 ms, System: 3.3 ms]
  Range (min … max):    72.5 ms …  87.4 ms    36 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (82.1 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.
 
Benchmark 2: kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 400000
  Time (mean ± σ):      71.3 ms ±   5.4 ms    [User: 123.4 ms, System: 3.2 ms]
  Range (min … max):    66.0 ms …  78.7 ms    44 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/xt_parallel_cmp_bench.exe 400000' ran
    1.08 ± 0.10 times faster than 'kcas-main/_build/default/test/xt_parallel_cmp_bench.exe 400000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 1 200000
  Time (mean ± σ):      36.5 ms ±   0.1 ms    [User: 35.5 ms, System: 0.7 ms]
  Range (min … max):    36.2 ms …  36.9 ms    82 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 1 200000
  Time (mean ± σ):      23.2 ms ±   0.2 ms    [User: 22.2 ms, System: 0.7 ms]
  Range (min … max):    23.0 ms …  25.7 ms    129 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/benchmark.exe 1 200000' ran
    1.58 ± 0.02 times faster than 'kcas-main/_build/default/test/benchmark.exe 1 200000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 2 200000
  Time (mean ± σ):     127.1 ms ±   0.2 ms    [User: 125.8 ms, System: 0.9 ms]
  Range (min … max):   126.8 ms … 127.4 ms    23 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 2 200000
  Time (mean ± σ):     125.7 ms ±   0.2 ms    [User: 124.5 ms, System: 0.9 ms]
  Range (min … max):   125.3 ms … 126.1 ms    23 runs
 
Summary
  'kcas-this/_build/default/test/benchmark.exe 2 200000' ran
    1.01 ± 0.00 times faster than 'kcas-main/_build/default/test/benchmark.exe 2 200000'


Benchmark 1: kcas-main/_build/default/test/benchmark.exe 4 200000
  Time (mean ± σ):     235.3 ms ±   0.3 ms    [User: 233.7 ms, System: 1.2 ms]
  Range (min … max):   234.9 ms … 235.8 ms    12 runs
 
Benchmark 2: kcas-this/_build/default/test/benchmark.exe 4 200000
  Time (mean ± σ):     234.3 ms ±   0.3 ms    [User: 232.7 ms, System: 1.2 ms]
  Range (min … max):   234.0 ms … 234.8 ms    12 runs
 
Summary
  'kcas-this/_build/default/test/benchmark.exe 4 200000' ran
    1.00 ± 0.00 times faster than 'kcas-main/_build/default/test/benchmark.exe 4 200000'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90
  Time (mean ± σ):      44.1 ms ±   0.2 ms    [User: 42.9 ms, System: 0.8 ms]
  Range (min … max):    43.8 ms …  45.0 ms    67 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90
  Time (mean ± σ):      42.4 ms ±   0.1 ms    [User: 41.2 ms, System: 0.8 ms]
  Range (min … max):    42.2 ms …  42.8 ms    70 runs
 
Summary
  'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90' ran
    1.04 ± 0.01 times faster than 'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 90'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90
  Time (mean ± σ):      28.0 ms ±   0.1 ms    [User: 51.8 ms, System: 1.2 ms]
  Range (min … max):    27.7 ms …  28.5 ms    105 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90
  Time (mean ± σ):      24.8 ms ±   0.2 ms    [User: 45.3 ms, System: 1.2 ms]
  Range (min … max):    24.5 ms …  26.1 ms    119 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90' ran
    1.13 ± 0.01 times faster than 'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 90'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90
  Time (mean ± σ):      17.2 ms ±   0.3 ms    [User: 56.9 ms, System: 2.0 ms]
  Range (min … max):    16.7 ms …  20.6 ms    145 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (20.6 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You are already using the '--warmup' option which helps to fill these caches before the actual benchmark. You can either try to increase the warmup count further or re-run this benchmark on a quiet system in case it was a random outlier. Alternatively, consider using the '--prepare' option to clear the caches before each timing run.
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90
  Time (mean ± σ):      15.8 ms ±   0.3 ms    [User: 51.2 ms, System: 1.9 ms]
  Range (min … max):    15.3 ms …  18.7 ms    190 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90' ran
    1.09 ± 0.03 times faster than 'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 90'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10
  Time (mean ± σ):     261.0 ms ±   0.2 ms    [User: 259.5 ms, System: 1.1 ms]
  Range (min … max):   260.5 ms … 261.3 ms    11 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10
  Time (mean ± σ):     256.3 ms ±   2.0 ms    [User: 254.8 ms, System: 1.1 ms]
  Range (min … max):   254.9 ms … 260.2 ms    11 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10' ran
    1.02 ± 0.01 times faster than 'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 1 1_000_000 1000 10'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10
  Time (mean ± σ):     160.4 ms ±   1.2 ms    [User: 315.2 ms, System: 1.6 ms]
  Range (min … max):   159.8 ms … 165.2 ms    18 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10
  Time (mean ± σ):     132.2 ms ±   0.1 ms    [User: 259.1 ms, System: 1.5 ms]
  Range (min … max):   131.9 ms … 132.4 ms    22 runs
 
Summary
  'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10' ran
    1.21 ± 0.01 times faster than 'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 2 1_000_000 1000 10'


Benchmark 1: kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10
  Time (mean ± σ):      87.9 ms ±   0.3 ms    [User: 336.8 ms, System: 2.5 ms]
  Range (min … max):    87.5 ms …  88.8 ms    34 runs
 
Benchmark 2: kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10
  Time (mean ± σ):      75.5 ms ±   0.4 ms    [User: 287.2 ms, System: 2.5 ms]
  Range (min … max):    74.8 ms …  76.5 ms    40 runs
 
Summary
  'kcas-this/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10' ran
    1.16 ± 0.01 times faster than 'kcas-main/_build/default/test/kcas_data/hashtbl_bench.exe 4 1_000_000 1000 10'

@polytypic polytypic requested a review from a team April 22, 2023 18:07
@polytypic polytypic mentioned this pull request Apr 24, 2023
@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch 2 times, most recently from b15a7d1 to 633bd32 Compare April 27, 2023 08:00
@polytypic polytypic mentioned this pull request Apr 27, 2023
@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch 2 times, most recently from 8554b4e to d7a18fc Compare April 28, 2023 06:44
In case only CMP operations are performed, it is not necessary to perform an
additional verify step as all the locations have already been (read once and)
verified by the determine phase.
@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch from d7a18fc to c695e83 Compare April 28, 2023 06:45
@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch from c695e83 to 92daa44 Compare April 28, 2023 06:57
@polytypic polytypic force-pushed the avoid-unnecessary-verifies branch from 92daa44 to 5a9792e Compare April 28, 2023 07:00
@polytypic polytypic merged commit 370ff1a into main Apr 28, 2023
@polytypic polytypic deleted the avoid-unnecessary-verifies branch April 28, 2023 07:40