benchmarks: add BatchSpanProcessor benchmark #791

Merged 1 commit on Oct 5, 2024

@iRevive commented Sep 27, 2024

First step towards #786. The benchmark is partially based on BatchSpanProcessorBenchmark.java, with some adjustments.

Comparison: OtelJava vs SDK

Throughput

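delayMs is how long the exporter takes to export spans. With any exporter other than LoggingExporter, an export takes at least 5 ms.

The diff column is oteljava minus sdk, so a negative value means the SDK is faster. OtelJava wins only with a zero-delay exporter (delayMs=0). As soon as the export takes 1 ms or more, the SDK outperforms OtelJava: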

throughput                           oteljava                     sdk                        diff   
doExport delayMs=0 spanCount=1000    8819.323 ± 3730.817 ops/s    2233.270 ± 63.993 ops/s    6586.05
doExport delayMs=0 spanCount=2000    3797.170 ± 1309.373 ops/s    1188.510 ± 55.421 ops/s    2608.66
doExport delayMs=0 spanCount=5000    1136.669 ± 299.909 ops/s     458.029 ± 48.821 ops/s     678.64 
doExport delayMs=1 spanCount=1000    623.318 ± 36.452 ops/s       1428.995 ± 45.032 ops/s    -805.68 
doExport delayMs=1 spanCount=2000    323.403 ± 15.881 ops/s       742.236 ± 40.051 ops/s     -418.83 
doExport delayMs=1 spanCount=5000    96.303 ± 12.706 ops/s        313.245 ± 11.265 ops/s     -216.94 
doExport delayMs=5 spanCount=1000    150.901 ± 3.176 ops/s        371.031 ± 6.427 ops/s      -220.13 
doExport delayMs=5 spanCount=2000    87.160 ± 4.792 ops/s         212.888 ± 3.589 ops/s      -125.73 
doExport delayMs=5 spanCount=5000    21.992 ± 1.154 ops/s         92.194 ± 0.913 ops/s       -70.20  
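
To picture what delayMs controls, here is a minimal sketch of an exporter with an artificial delay. `BatchExporter` and `DelayedExporter` are hypothetical names used only for illustration; they are not the otel4s API or the benchmark's actual code, and the cats-effect version in the directive is an assumption.

```scala
//> using dep "org.typelevel::cats-effect:3.5.4"

import cats.effect.IO
import scala.concurrent.duration._

// Hypothetical stand-in for a span exporter (not the otel4s SpanExporter).
trait BatchExporter {
  def exportBatch(spans: List[String]): IO[Int]
}

object DelayedExporter {
  // Builds an exporter that spends `delayMs` milliseconds on every batch
  // and then reports how many spans it "exported".
  def apply(delayMs: Int): BatchExporter =
    new BatchExporter {
      def exportBatch(spans: List[String]): IO[Int] = {
        // delayMs = 0 models an exporter that returns immediately.
        val delay = if (delayMs > 0) IO.sleep(delayMs.millis) else IO.unit
        delay.as(spans.size)
      }
    }
}
```

With delayMs=0 the processor is never held up by the exporter, so the numbers mostly reflect the overhead of the processor itself; with a non-zero delay, how the processor behaves while an export is in flight starts to matter.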

Memory allocation

OtelJava is the clear winner here: it allocates far less than the SDK in every configuration (diff is again oteljava minus sdk).

gc.alloc.rate                        oteljava                   sdk                          diff   
doExport delayMs=0 spanCount=1000    318.201 ± 31.342 MB/sec    4530.668 ± 144.344 MB/sec    -4212.47
doExport delayMs=0 spanCount=2000    297.138 ± 26.793 MB/sec    4378.296 ± 178.813 MB/sec    -4081.16
doExport delayMs=0 spanCount=5000    288.212 ± 25.037 MB/sec    3736.262 ± 405.653 MB/sec    -3448.05
doExport delayMs=1 spanCount=1000    17.453 ± 1.849 MB/sec      2559.874 ± 115.647 MB/sec    -2542.42
doExport delayMs=1 spanCount=2000    17.230 ± 1.628 MB/sec      2763.970 ± 109.915 MB/sec    -2746.74
doExport delayMs=1 spanCount=5000    17.907 ± 1.674 MB/sec      2836.377 ± 114.917 MB/sec    -2818.47
doExport delayMs=5 spanCount=1000    4.133 ± 0.326 MB/sec       947.881 ± 22.438 MB/sec      -943.75 
doExport delayMs=5 spanCount=2000    4.472 ± 0.331 MB/sec       1103.280 ± 38.004 MB/sec     -1098.81
doExport delayMs=5 spanCount=5000    4.191 ± 0.302 MB/sec       1435.295 ± 38.201 MB/sec     -1431.10

gc.alloc.rate.norm                   oteljava                       sdk                               diff       
doExport delayMs=0 spanCount=1000    52915.177 ± 8608.267 B/op      2140142.041 ± 77257.926 B/op      -2087226.86 
doExport delayMs=0 spanCount=2000    104583.377 ± 15086.649 B/op    3879006.608 ± 113885.063 B/op     -3774423.23 
doExport delayMs=0 spanCount=5000    305166.733 ± 25885.235 B/op    8602132.998 ± 318167.400 B/op     -8296966.27 
doExport delayMs=1 spanCount=1000    30927.168 ± 775.856 B/op       1883736.183 ± 40232.099 B/op      -1852809.02 
doExport delayMs=1 spanCount=2000    59071.491 ± 606.298 B/op       3975272.476 ± 314899.888 B/op     -3916200.99 
doExport delayMs=1 spanCount=5000    210928.159 ± 9407.215 B/op     9568701.045 ± 147724.457 B/op     -9357772.89 
doExport delayMs=5 spanCount=1000    30484.471 ± 528.565 B/op       2700314.456 ± 58483.251 B/op      -2669829.99 
doExport delayMs=5 spanCount=2000    57898.335 ± 2105.221 B/op      5499651.384 ± 170906.364 B/op     -5441753.05 
doExport delayMs=5 spanCount=5000    217313.213 ± 6233.238 B/op     16553477.005 ± 356207.002 B/op    -16336163.79

The memory usage issues

Improvements

The changes:

  • Replace whenA, ifM with if statement
  • Replace mapN with a sequential map

The baseline:

Benchmark                     (backend)  (delayMs)  (spanCount)   Mode  Cnt        Score       Error   Units
doExport                           sdk          1         2000  thrpt   50      728.505 ±    23.237   ops/s
doExport:gc.alloc.rate             sdk          1         2000  thrpt   50     2903.986 ±   101.548  MB/sec
doExport:gc.alloc.rate.norm        sdk          1         2000  thrpt   50  4204583.353 ± 67836.192    B/op
doExport:gc.count                  sdk          1         2000  thrpt   50      478.000              counts
doExport:gc.time                   sdk          1         2000  thrpt   50      420.000                  ms

This PR:

Benchmark                     (backend)  (delayMs)  (spanCount)   Mode  Cnt        Score        Error   Units
doExport                           sdk          1         2000  thrpt   50      741.273 ±     20.493   ops/s
doExport:gc.alloc.rate             sdk          1         2000  thrpt   50     2827.580 ±     84.112  MB/sec
doExport:gc.alloc.rate.norm        sdk          1         2000  thrpt   50  4035678.292 ± 177243.181    B/op
doExport:gc.count                  sdk          1         2000  thrpt   50      460.000               counts
doExport:gc.time                   sdk          1         2000  thrpt   50      381.000                   ms

Well, let's be honest: the improvements lie within the margin of error (hehe), so nothing crazy.

Bonus point - use Queue.bounded

Throughput is ~30% higher, and the normalized (per-op) memory allocation is ~3-3.5 times lower.

Benchmark                     (backend)  (delayMs)  (spanCount)   Mode  Cnt        Score       Error   Units
doExport                           sdk          1         2000  thrpt   50      953.312 ±    13.238   ops/s
doExport:async                     sdk          1         2000  thrpt               NaN                 ---
doExport:gc.alloc.rate             sdk          1         2000  thrpt   50     1103.294 ±    19.214  MB/sec
doExport:gc.alloc.rate.norm        sdk          1         2000  thrpt   50  1221963.623 ± 14936.236    B/op
doExport:gc.count                  sdk          1         2000  thrpt   50      249.000              counts
doExport:gc.time                   sdk          1         2000  thrpt   50      165.000                  ms
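
For readers unfamiliar with `Queue.bounded` from cats-effect, here is a small sketch of how a bounded queue behaves when it fills up. It is not the processor's implementation: `BoundedQueueSketch` and its parameters are hypothetical, the use of `tryOffer` to reject elements when the queue is full is an assumption made for the illustration, and a real processor would drain the queue from a background fiber rather than sequentially.

```scala
//> using dep "org.typelevel::cats-effect:3.5.4"

import cats.effect.IO
import cats.effect.std.Queue
import cats.syntax.all._

object BoundedQueueSketch {
  // Offers `spanCount` placeholder spans to a queue that holds at most
  // `capacity` elements, then drains whatever was accepted in one batch.
  // Returns (number of rejected spans, drained batch).
  def run(spanCount: Int, capacity: Int): IO[(Int, List[Int])] =
    for {
      queue    <- Queue.bounded[IO, Int](capacity)
      // tryOffer never blocks: it returns false when the queue is full.
      accepted <- (1 to spanCount).toList.traverse(span => queue.tryOffer(span))
      // tryTakeN(None) takes everything currently in the queue.
      batch    <- queue.tryTakeN(None)
    } yield (accepted.count(ok => !ok), batch)
}
```

The fixed capacity caps how many spans can be buffered at once, which bounds the memory held by pending spans regardless of how slow the exporter is.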

Next steps

@iRevive added the module:sdk and performance labels Sep 27, 2024
@iRevive merged commit 0ac353d into typelevel:main Oct 5, 2024
10 checks passed
@iRevive deleted the benchmarks/bsp branch October 5, 2024 09:36