Description
I just upgraded SpanJson to BenchmarkDotNet v0.11, and the new UnrollFactor logic does not work for codegen-heavy benchmarks.
Actual Behaviour
// Benchmark: ModelBenchmark.DeserializeAccessTokenWithJilSerializer: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.1.1 (CoreCLR 4.6.26606.02, CoreFX 4.6.26606.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob
OverheadJitting 1: 1 op, 327424.34 ns, 327.4243 us/op
WorkloadJitting 1: 1 op, 2934831805.27 ns, 2.9348 s/op
WorkloadWarmup 1: 1 op, 16384.02 ns, 16.3840 us/op
WorkloadWarmup 2: 1 op, 10496.01 ns, 10.4960 us/op
WorkloadWarmup 3: 1 op, 9728.01 ns, 9.7280 us/op
WorkloadWarmup 4: 1 op, 9216.01 ns, 9.2160 us/op
WorkloadWarmup 5: 1 op, 8960.01 ns, 8.9600 us/op
WorkloadWarmup 6: 1 op, 23040.02 ns, 23.0400 us/op
WorkloadWarmup 7: 1 op, 7936.01 ns, 7.9360 us/op
WorkloadWarmup 8: 1 op, 8704.01 ns, 8.7040 us/op
WorkloadWarmup 9: 1 op, 9472.01 ns, 9.4720 us/op
WorkloadWarmup 10: 1 op, 8448.01 ns, 8.4480 us/op
// BeforeActualRun
WorkloadActual 1: 1 op, 10240.01 ns, 10.2400 us/op
WorkloadActual 2: 1 op, 25088.03 ns, 25.0880 us/op
The WorkloadActual lines then repeat around 90 times, with quite a large variation in the timings.
Expected Behaviour
I expect a much larger number of ops per iteration with more consistent timings; the 1 op measurements are off by a large margin. With UnrollFactor=2 the same benchmark behaves as expected:
// **************************
// Benchmark: ModelBenchmark.DeserializeAccessTokenWithJilSerializer: Job-GXPFMQ(UnrollFactor=2)
// *** Execute ***
// Launch: 1 / 1
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.1.1 (CoreCLR 4.6.26606.02, CoreFX 4.6.26606.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: Job-CZJWNW(UnrollFactor=2)
OverheadJitting 1: 2 op, 352000.36 ns, 176.0002 us/op
WorkloadJitting 1: 2 op, 2947211977.95 ns, 1.4736 s/op
WorkloadPilot 1: 4 op, 13568.01 ns, 3.3920 us/op
WorkloadPilot 2: 8 op, 14848.02 ns, 1.8560 us/op
WorkloadPilot 3: 16 op, 15872.02 ns, 992.0010 ns/op
WorkloadPilot 4: 32 op, 25344.03 ns, 792.0008 ns/op
WorkloadPilot 5: 64 op, 44288.05 ns, 692.0007 ns/op
WorkloadPilot 6: 128 op, 84480.09 ns, 660.0007 ns/op
WorkloadPilot 7: 256 op, 152576.16 ns, 596.0006 ns/op
WorkloadPilot 8: 512 op, 300032.31 ns, 586.0006 ns/op
WorkloadPilot 9: 1024 op, 574208.59 ns, 560.7506 ns/op
WorkloadPilot 10: 2048 op, 1207041.24 ns, 589.3756 ns/op
WorkloadPilot 11: 4096 op, 2299650.35 ns, 561.4381 ns/op
WorkloadPilot 12: 8192 op, 4912389.03 ns, 599.6569 ns/op
WorkloadPilot 13: 16384 op, 9578249.81 ns, 584.6100 ns/op
WorkloadPilot 14: 32768 op, 18317842.76 ns, 559.0162 ns/op
WorkloadPilot 15: 65536 op, 36165925.03 ns, 551.8482 ns/op
WorkloadPilot 16: 131072 op, 72334922.07 ns, 551.8717 ns/op
WorkloadPilot 17: 262144 op, 143300498.74 ns, 546.6480 ns/op
WorkloadPilot 18: 524288 op, 285817892.68 ns, 545.1544 ns/op
WorkloadPilot 19: 1048576 op, 572262474.00 ns, 545.7520 ns/op
OverheadWarmup 1: 1048576 op, 3304963.38 ns, 3.1519 ns/op
OverheadWarmup 2: 1048576 op, 3340035.42 ns, 3.1853 ns/op
OverheadWarmup 3: 1048576 op, 3255811.33 ns, 3.1050 ns/op
OverheadWarmup 4: 1048576 op, 3256067.33 ns, 3.1052 ns/op
OverheadWarmup 5: 1048576 op, 3302403.38 ns, 3.1494 ns/op
OverheadWarmup 6: 1048576 op, 3288835.37 ns, 3.1365 ns/op
OverheadActual 1: 1048576 op, 3316483.40 ns, 3.1628 ns/op
OverheadActual 2: 1048576 op, 3276291.35 ns, 3.1245 ns/op
OverheadActual 3: 1048576 op, 3233283.31 ns, 3.0835 ns/op
OverheadActual 4: 1048576 op, 3339267.42 ns, 3.1846 ns/op
OverheadActual 5: 1048576 op, 3338755.42 ns, 3.1841 ns/op
OverheadActual 6: 1048576 op, 3284739.36 ns, 3.1326 ns/op
OverheadActual 7: 1048576 op, 3252483.33 ns, 3.1018 ns/op
OverheadActual 8: 1048576 op, 3229443.31 ns, 3.0798 ns/op
OverheadActual 9: 1048576 op, 3227139.30 ns, 3.0776 ns/op
OverheadActual 10: 1048576 op, 3257347.34 ns, 3.1064 ns/op
OverheadActual 11: 1048576 op, 3340035.42 ns, 3.1853 ns/op
OverheadActual 12: 1048576 op, 3311619.39 ns, 3.1582 ns/op
OverheadActual 13: 1048576 op, 3286019.36 ns, 3.1338 ns/op
OverheadActual 14: 1048576 op, 3352835.43 ns, 3.1975 ns/op
OverheadActual 15: 1048576 op, 3325187.40 ns, 3.1711 ns/op
WorkloadWarmup 1: 1048576 op, 571615561.33 ns, 545.1351 ns/op
WorkloadWarmup 2: 1048576 op, 571237192.95 ns, 544.7742 ns/op
WorkloadWarmup 3: 1048576 op, 572946506.70 ns, 546.4044 ns/op
WorkloadWarmup 4: 1048576 op, 571277128.99 ns, 544.8123 ns/op
WorkloadWarmup 5: 1048576 op, 571070280.78 ns, 544.6151 ns/op
WorkloadWarmup 6: 1048576 op, 571714377.44 ns, 545.2293 ns/op
WorkloadWarmup 7: 1048576 op, 571312713.02 ns, 544.8463 ns/op
// BeforeActualRun
WorkloadActual 1: 1048576 op, 571627337.35 ns, 545.1463 ns/op
WorkloadActual 2: 1048576 op, 571613257.33 ns, 545.1329 ns/op
I think that's because the WorkloadJitting step takes that long (yes, it's codegen-heavy); as soon as I change the UnrollFactor to 2, it works as expected (at least as far as I've run the benchmarks).
The benchmark is that codegen-heavy because I initialize ~480 serializers statically in that class (the whole benchmark class is generated from one large T4 template, see https://github.com/Tornhoof/SpanJson/blob/master/SpanJson.Benchmarks/ModelBenchmark.cs).
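For reference, this is roughly how I force UnrollFactor=2 (a minimal sketch; UnrollFactorConfig is just an illustrative name, and the exact config API surface may differ slightly between BenchmarkDotNet versions):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Sketch of the workaround: add a job with UnrollFactor=2 via a custom config.
public class UnrollFactorConfig : ManualConfig
{
    public UnrollFactorConfig()
    {
        // With UnrollFactor > 1 the pilot stage runs and scales up the op count
        // instead of sticking to 1 op per iteration.
        Add(Job.Default.WithUnrollFactor(2));
    }
}

// Applied to the benchmark class:
// [Config(typeof(UnrollFactorConfig))]
// public class ModelBenchmark { ... }
```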
Repro
- Check out the BenchmarkUpdates branch of https://github.com/Tornhoof/SpanJson
- cd SpanJson.Benchmarks
- dotnet run -c Release and select 2 (see the entry-point sketch below)
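For context, "select 2" refers to the interactive benchmark chooser. The entry point is roughly of this shape (a sketch assuming a standard BenchmarkSwitcher Program.cs; the actual file in the repo may differ):

```csharp
using BenchmarkDotNet.Running;

// Hypothetical sketch of the entry point the "select 2" step assumes:
// BenchmarkSwitcher lists the benchmark classes in the assembly and lets
// you pick one by number on the console.
public class Program
{
    public static void Main(string[] args)
    {
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
    }
}
```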