Skip to content

Problems with default UnrollFactor in V0.11.0 #837

Closed
@Tornhoof

Description

@Tornhoof

Description
I just upgraded SpanJson to v0.11 of Benchmark.NET and the new UnrollFactor logic does not work for codegen-heavy benchmarks.
Actual Behaviour

// Benchmark: ModelBenchmark.DeserializeAccessTokenWithJilSerializer: DefaultJob
// *** Execute ***
// Launch: 1 / 1
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.1.1 (CoreCLR 4.6.26606.02, CoreFX 4.6.26606.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: DefaultJob
OverheadJitting  1: 1 op, 327424.34 ns, 327.4243 us/op
WorkloadJitting  1: 1 op, 2934831805.27 ns, 2.9348 s/op
WorkloadWarmup   1: 1 op, 16384.02 ns, 16.3840 us/op
WorkloadWarmup   2: 1 op, 10496.01 ns, 10.4960 us/op
WorkloadWarmup   3: 1 op, 9728.01 ns, 9.7280 us/op
WorkloadWarmup   4: 1 op, 9216.01 ns, 9.2160 us/op
WorkloadWarmup   5: 1 op, 8960.01 ns, 8.9600 us/op
WorkloadWarmup   6: 1 op, 23040.02 ns, 23.0400 us/op
WorkloadWarmup   7: 1 op, 7936.01 ns, 7.9360 us/op
WorkloadWarmup   8: 1 op, 8704.01 ns, 8.7040 us/op
WorkloadWarmup   9: 1 op, 9472.01 ns, 9.4720 us/op
WorkloadWarmup  10: 1 op, 8448.01 ns, 8.4480 us/op
// BeforeActualRun
WorkloadActual   1: 1 op, 10240.01 ns, 10.2400 us/op
WorkloadActual   2: 1 op, 25088.03 ns, 25.0880 us/op

The workloadActual lines then repeat around 90 times with quite a large variation in timings.
Expected behaviour
I get a larger amount of ops with a more consistent timing, the 1 op benchmark is wrong on a large large scale.

// **************************
// Benchmark: ModelBenchmark.DeserializeAccessTokenWithJilSerializer: Job-GXPFMQ(UnrollFactor=2)
// *** Execute ***
// Launch: 1 / 1
// BeforeAnythingElse
// Benchmark Process Environment Information:
// Runtime=.NET Core 2.1.1 (CoreCLR 4.6.26606.02, CoreFX 4.6.26606.05), 64bit RyuJIT
// GC=Concurrent Workstation
// Job: Job-CZJWNW(UnrollFactor=2)
OverheadJitting  1: 2 op, 352000.36 ns, 176.0002 us/op
WorkloadJitting  1: 2 op, 2947211977.95 ns, 1.4736 s/op
WorkloadPilot    1: 4 op, 13568.01 ns, 3.3920 us/op
WorkloadPilot    2: 8 op, 14848.02 ns, 1.8560 us/op
WorkloadPilot    3: 16 op, 15872.02 ns, 992.0010 ns/op
WorkloadPilot    4: 32 op, 25344.03 ns, 792.0008 ns/op
WorkloadPilot    5: 64 op, 44288.05 ns, 692.0007 ns/op
WorkloadPilot    6: 128 op, 84480.09 ns, 660.0007 ns/op
WorkloadPilot    7: 256 op, 152576.16 ns, 596.0006 ns/op
WorkloadPilot    8: 512 op, 300032.31 ns, 586.0006 ns/op
WorkloadPilot    9: 1024 op, 574208.59 ns, 560.7506 ns/op
WorkloadPilot   10: 2048 op, 1207041.24 ns, 589.3756 ns/op
WorkloadPilot   11: 4096 op, 2299650.35 ns, 561.4381 ns/op
WorkloadPilot   12: 8192 op, 4912389.03 ns, 599.6569 ns/op
WorkloadPilot   13: 16384 op, 9578249.81 ns, 584.6100 ns/op
WorkloadPilot   14: 32768 op, 18317842.76 ns, 559.0162 ns/op
WorkloadPilot   15: 65536 op, 36165925.03 ns, 551.8482 ns/op
WorkloadPilot   16: 131072 op, 72334922.07 ns, 551.8717 ns/op
WorkloadPilot   17: 262144 op, 143300498.74 ns, 546.6480 ns/op
WorkloadPilot   18: 524288 op, 285817892.68 ns, 545.1544 ns/op
WorkloadPilot   19: 1048576 op, 572262474.00 ns, 545.7520 ns/op
OverheadWarmup   1: 1048576 op, 3304963.38 ns, 3.1519 ns/op
OverheadWarmup   2: 1048576 op, 3340035.42 ns, 3.1853 ns/op
OverheadWarmup   3: 1048576 op, 3255811.33 ns, 3.1050 ns/op
OverheadWarmup   4: 1048576 op, 3256067.33 ns, 3.1052 ns/op
OverheadWarmup   5: 1048576 op, 3302403.38 ns, 3.1494 ns/op
OverheadWarmup   6: 1048576 op, 3288835.37 ns, 3.1365 ns/op
OverheadActual   1: 1048576 op, 3316483.40 ns, 3.1628 ns/op
OverheadActual   2: 1048576 op, 3276291.35 ns, 3.1245 ns/op
OverheadActual   3: 1048576 op, 3233283.31 ns, 3.0835 ns/op
OverheadActual   4: 1048576 op, 3339267.42 ns, 3.1846 ns/op
OverheadActual   5: 1048576 op, 3338755.42 ns, 3.1841 ns/op
OverheadActual   6: 1048576 op, 3284739.36 ns, 3.1326 ns/op
OverheadActual   7: 1048576 op, 3252483.33 ns, 3.1018 ns/op
OverheadActual   8: 1048576 op, 3229443.31 ns, 3.0798 ns/op
OverheadActual   9: 1048576 op, 3227139.30 ns, 3.0776 ns/op
OverheadActual  10: 1048576 op, 3257347.34 ns, 3.1064 ns/op
OverheadActual  11: 1048576 op, 3340035.42 ns, 3.1853 ns/op
OverheadActual  12: 1048576 op, 3311619.39 ns, 3.1582 ns/op
OverheadActual  13: 1048576 op, 3286019.36 ns, 3.1338 ns/op
OverheadActual  14: 1048576 op, 3352835.43 ns, 3.1975 ns/op
OverheadActual  15: 1048576 op, 3325187.40 ns, 3.1711 ns/op
WorkloadWarmup   1: 1048576 op, 571615561.33 ns, 545.1351 ns/op
WorkloadWarmup   2: 1048576 op, 571237192.95 ns, 544.7742 ns/op
WorkloadWarmup   3: 1048576 op, 572946506.70 ns, 546.4044 ns/op
WorkloadWarmup   4: 1048576 op, 571277128.99 ns, 544.8123 ns/op
WorkloadWarmup   5: 1048576 op, 571070280.78 ns, 544.6151 ns/op
WorkloadWarmup   6: 1048576 op, 571714377.44 ns, 545.2293 ns/op
WorkloadWarmup   7: 1048576 op, 571312713.02 ns, 544.8463 ns/op
// BeforeActualRun
WorkloadActual   1: 1048576 op, 571627337.35 ns, 545.1463 ns/op
WorkloadActual   2: 1048576 op, 571613257.33 ns, 545.1329 ns/op

I think that's bececause the WorkloadJitting is taking that long (yeah it's codegen heavy), as soon as I change the Unrollfactor to 2 it works as expected (atleast as far as I've run the benchmarks).
The benchmark is that codegen heavy, because I initialize ~480 serializers statically in that class (the whole benchmark class is one large t4 script, see https://github.com/Tornhoof/SpanJson/blob/master/SpanJson.Benchmarks/ModelBenchmark.cs)

Repro

  1. Checkout Branch BenchmarkUpdates on https://github.com/Tornhoof/SpanJson
  2. cd SpanJson.Benchmarks
  3. dotnet run -c Release and select 2

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions