Make benchmarks more precise and complete? #440

Hi,

I was interested in the Java vs C# comparison.

I was quite surprised by the published results, in particular the bintree one where .NET looks insanely slow, so I tested it locally.
I don't get the same results (basically my measurements fall within the margin of error).
Here is what I did:

1. take 2.java and copy/paste it into a 1.cs (translating the language but keeping the same branching/code path)
2. compile both (in Release mode, and in AOT mode using GraalVM for Java)
3. run both with `time`

Here are my results (I did multiple runs; timings vary by about 20% since the execution is very fast):

```
/tmp/test $ time java app # java 22 standard mode
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,070s
user	0m0,073s
sys	0m0,018s
rmannibucau@rmannibucau-yupiik:/tmp/test $ time ./bin/Release/net8.0/test # dotnet standard mode
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,072s
user	0m0,039s
sys	0m0,012s
rmannibucau@rmannibucau-yupiik:/tmp/test $ time ./app # java native mode
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,008s
user	0m0,000s
sys	0m0,009s

rmannibucau@rmannibucau-yupiik:/tmp/test $ time /tmp/test/bin/Release/net8.0/linux-x64/publish/test # dotnet native mode (aot)
stretch tree of depth 7	 check: 255
64	 trees of depth 4	 check: 1984
16	 trees of depth 6	 check: 2032
long lived tree of depth 6	 check: 127

real	0m0,006s
user	0m0,004s
sys	0m0,004s
```

What's important to note is that we can't conclude that .NET is faster than Java in native mode: if you run it 100 times, .NET will statistically be slower in some runs yet can be faster overall. This is why I think the program runs for too short a time: .NET does so much runtime adjustment that, without tuning, such a short-lived execution shows this instability.
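One way to back such a conclusion with statistics rather than a single `time` call is to launch each binary many times and summarize the wall-clock durations. The following is a minimal sketch, assuming Java 11+; `RepeatedRuns` and its methods are hypothetical names for illustration, and, like the `real` value printed by `time`, each measured duration includes process startup.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: launches a benchmark binary several times and summarizes
// the wall-clock durations, roughly like calling `time` in a loop.
public class RepeatedRuns {

    // Runs `command` `runs` times and returns each wall-clock duration in milliseconds.
    static double[] timeRuns(List<String> command, int runs) throws IOException, InterruptedException {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            ProcessBuilder pb = new ProcessBuilder(command)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // benchmark output is not needed
                    .redirectError(ProcessBuilder.Redirect.DISCARD);
            long start = System.nanoTime();
            int exit = pb.start().waitFor();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
            if (exit != 0) {
                throw new IllegalStateException("run " + i + " exited with code " + exit);
            }
        }
        return millis;
    }

    // Returns {min, max, mean, sample standard deviation} of the samples.
    static double[] summarize(double[] samples) {
        double min = Arrays.stream(samples).min().orElse(Double.NaN);
        double max = Arrays.stream(samples).max().orElse(Double.NaN);
        double mean = Arrays.stream(samples).average().orElse(Double.NaN);
        double squares = 0;
        for (double s : samples) {
            squares += (s - mean) * (s - mean);
        }
        double stddev = samples.length > 1 ? Math.sqrt(squares / (samples.length - 1)) : 0.0;
        return new double[] {min, max, mean, stddev};
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java RepeatedRuns.java <runs> <command> [args...]");
            return;
        }
        int runs = Integer.parseInt(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        double[] s = summarize(timeRuns(command, runs));
        System.out.printf("%d runs: min %.2f ms, max %.2f ms, mean %.2f ms, stddev %.2f ms%n",
                runs, s[0], s[1], s[2], s[3]);
    }
}
```

For example, `java RepeatedRuns.java 100 ./app` followed by the same call on the .NET binary would give two comparable summaries; a standard deviation that is large relative to the mean suggests the run is too short for a meaningful comparison.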

Note: I ran all the benchmarks on Ubuntu with a 16-core i9 and 64 GB of RAM (indeed way too much for these apps ;)) and no particular tuning.
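Since results like these depend heavily on the machine, a report could record the environment automatically next to each result. A minimal sketch using only standard JDK system properties and `Runtime`; `EnvInfo` is a hypothetical name, and the .NET side would need its own equivalent (for example via `System.Runtime.InteropServices.RuntimeInformation`).

```java
// Hypothetical sketch: collects OS, CPU and runtime details that a benchmark
// report could display alongside its timings. Only standard JDK APIs are used.
public class EnvInfo {

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return String.format("os=%s %s (%s), cpus=%d, maxHeap=%d MB, java=%s (%s)",
                System.getProperty("os.name"),
                System.getProperty("os.version"),
                System.getProperty("os.arch"),
                rt.availableProcessors(),          // logical processors available to this process
                rt.maxMemory() / (1024 * 1024),    // maximum heap the runtime will attempt to use
                System.getProperty("java.version"),
                System.getProperty("java.vm.name"));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Note that `availableProcessors()` can be lower than the physical core count when the process is restricted, for example inside a container, which is one more reason to print it rather than assume it.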

To me, the important points are:

1. I guess the OS/machine is key and should be shown on the HTML pages
2. the difference is likely not that huge, so something may be fishy in the setup (unless you tested it on Windows, where, from my experience, it can be that large)
3. AOT benchmarking could be a nice addition
4. doing 100 runs and computing statistics over them could be worth it
5. it could be worth ensuring all mains can loop, to get longer durations
6. once 4. is done, it could be interesting to include the error % on the min/max/mean duration (or percentiles) in the report
7. maybe realign the implementations to ensure they are comparable (C# vs Java was not 1:1 for bintree, and it had a slight but noticeable impact locally)
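Points 4 to 6 above could look roughly like the following on the Java side: the workload loops many times inside one process, warm-up iterations are discarded, and the report shows min/max/mean plus percentiles. This is a minimal sketch, assuming Java 16+ (for the nested record); `LoopBench`, its methods and the tiny tree workload are hypothetical stand-ins rather than the benchmark's actual code, and the C# version would need the same loop structure to stay comparable. Established harnesses such as JMH (JVM) and BenchmarkDotNet (.NET) handle warm-up and dead-code elimination far more rigorously.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical in-process harness: loops a workload many times (point 5), discards
// warm-up iterations, then reports min/max/mean and percentiles (points 4 and 6).
public class LoopBench {

    // Results are published here so the JIT is less likely to treat the workload as dead code.
    static volatile long blackhole;

    // Minimal binary-tree node used as a stand-in workload; check() counts the nodes.
    record Node(Node left, Node right) {
        int check() {
            return 1 + (left == null ? 0 : left.check() + right.check());
        }
    }

    static Node build(int depth) {
        return depth == 0 ? new Node(null, null) : new Node(build(depth - 1), build(depth - 1));
    }

    // Nearest-rank percentile of an ascending-sorted array, p in (0, 100].
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    // Runs `warmup` unmeasured iterations, then `iterations` timed ones; returns sorted durations in ns.
    static long[] measure(LongSupplier workload, int warmup, int iterations) {
        long sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += workload.getAsLong();
        }
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            sink += workload.getAsLong();
            nanos[i] = System.nanoTime() - start;
        }
        blackhole = sink;
        Arrays.sort(nanos);
        return nanos;
    }

    public static void main(String[] args) {
        int depth = 6; // same depth as the long-lived tree in the runs above
        long[] nanos = measure(() -> build(depth).check(), 1_000, 10_000);
        double mean = Arrays.stream(nanos).average().orElse(0);
        System.out.printf("min %d ns, p50 %d ns, p99 %d ns, max %d ns, mean %.0f ns%n",
                nanos[0], percentile(nanos, 50), percentile(nanos, 99),
                nanos[nanos.length - 1], mean);
    }
}
```

Reporting p50/p99 next to min/max makes outliers (GC pauses, JIT activity) visible instead of hiding them in a single mean.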

That said, I also want to say a big thank you, because it is a lot of work and always a very good source to get started when working on these topics.
