Diagnostic harnesses for continuous-adaptive and BIPS #32

chrisseaton · 2021-08-04T10:17:09Z

These harnesses help for diagnosing issues with warmup and performance.

The continuous one runs continuous-adaptive benchmarking, running forever and printing the iterations-per-second. You can see how performance changes over time.

The BIPs one uses the industry-standard benchmark-ips gem.

noahgibbs · 2021-08-04T10:28:46Z

The adaptive harness looks like it tries to run about one chunk of benchmarks per second, and to keep that true-ish by changing the number of iterations per chunk. That seems... fine? I'm not sure it's a problem we're currently addressing, but the code for it looks reasonable. It also won't help cases where a single iteration is more than a second, which is where we're having a lot of our worst TruffleRuby problems. Though of course we could configure the time units if we wanted to.

I'm using a more complicated harness for a lot of my testing (in https://github.com/Shopify/yjit-metrics). I could adopt an adaptive approach there, but I'm not 100% confident in some of the reporting and analysis consequences of doing so. Running a continually-varying number of iterations doesn't, on the surface, fix our "it's hard to compare runs" problem.

Basically: this looks like a high-quality implementation, but I'm not sure when we'd use it.

eregon · 2021-08-04T11:21:02Z

It also won't help cases where a single iteration is more than a second, which is where we're having a lot of our worst TruffleRuby problems.

If there is any issue feel free to report an issue or contact TruffleRuby devs at https://github.com/oracle/truffleruby/blob/master/README.md#contact.
I might also be able to run the benchmarks from this repo on TruffleRuby, e.g. if you would like some numbers to validate the results you get (any specific benchmark?).

Regarding warmup, the default harness already prints each iteration, so if there is too much variance between successive iterations it likely means not enough warmup.
In general for any non-trivial benchmark it should probably run for minimum a minute.

noahgibbs · 2021-08-04T11:35:47Z

@chrisseaton is looking into some problems we're having with psych-load, which runs around 2.5 seconds on CRuby, and similar on MJIT/YJIT. After the early warmup iterations, we're still seeing quite long times -- around 9.5 seconds/iter after 200 warmup iterations, and much slower on earlier iterations.

Railsbench and liquid-render are still getting LLVM load errors, so no results there.

Basically, I'm trying to figure out if 21.1.0 (latest stable on ruby-install when I started) is just a bad version for some reason. Certainly it's having a lot of trouble on benchmarks that use C native extensions much, but I know TruffleRuby has trouble with that in general.

The current results on Lee, however, are fantastic. So it's not slow across the board.

noahgibbs · 2021-08-04T11:38:24Z

Here's a breakdown of what we were seeing for warmup initially with TruffleRuby:

Warmup for Truffleruby:

      bench  samples     iter #1    iter #5   iter #10   iter #50  iter #100  iter #200
-----------  -------  ----------  ---------  ---------  ---------  ---------  ---------
        lee       24   17920.7ms   1576.7ms   1110.0ms   1308.1ms    519.9ms    358.9ms
 30k_ifelse       25   89563.4ms  31984.1ms  29828.2ms   6929.4ms   4557.4ms   2789.0ms
 psych-load       25   78102.6ms  13321.6ms   9532.7ms   9709.7ms  12765.6ms   9650.6ms
30k_methods       25  107693.3ms  43789.7ms  15142.4ms   1840.3ms    279.2ms    274.2ms
  optcarrot       24   42295.2ms   4488.0ms   3459.9ms   4865.2ms   5433.5ms   7647.5ms
     jekyll       20  129788.7ms  68729.0ms  65221.4ms  62573.9ms  66925.5ms  72451.4ms

We were pinning each process to a single CPU at that point, and by removing that we've improved some of the results (e.g. Lee), but psych-load is still pretty similar to what we're seeing there.

By comparison, here's a block of (prerelease 3.1 no-JIT) CRuby results:

Warmup for Prod_ruby_no_jit:

        bench  samples   iter #1   iter #2   iter #3   iter #4   iter #5   iter #8  iter #10
-------------  -------  --------  --------  --------  --------  --------  --------  --------
   psych-load       10  2588.3ms  2586.7ms  2590.9ms  2590.9ms  2593.5ms  2591.5ms  2592.1ms
  30k_methods       10  6429.0ms  6427.1ms  6426.8ms  6426.5ms  6426.9ms  6426.8ms  6426.8ms
   30k_ifelse       10  2374.9ms  2367.1ms  2366.4ms  2366.5ms  2366.7ms  2366.7ms  2366.4ms
 activerecord       10   171.8ms   168.4ms   168.4ms   168.5ms   168.3ms   168.3ms   168.4ms
liquid-render       10   206.2ms   205.0ms   209.7ms   204.5ms   205.7ms   209.5ms   205.8ms
          lee       10  1312.9ms  1285.5ms  1312.9ms  1299.1ms  1320.4ms  1298.2ms  1297.2ms
    optcarrot       10  6450.3ms  6499.7ms  6472.6ms  6435.5ms  6461.5ms  6492.6ms  6541.5ms

-------------  -------  --------  --------  --------  --------  --------  --------  --------
Each iteration is a set of samples of that iteration in a series.
Samples is the number of runs (samples taken) for each specific iteration number.

eregon · 2021-08-04T11:39:49Z

Railsbench and liquid-render are still getting LLVM load errors, so no results there.

Which errors specifically?
BTW do you measure on Native or JVM?

Basically, I'm trying to figure out if 21.1.0 (latest stable on ruby-install when I started) is just a bad version for some reason.
Certainly it's having a lot of trouble on benchmarks that use C native extensions much, but I know TruffleRuby has trouble with that in general.

TruffleRuby should work for most popular C extensions out of the box.
Could you try with TruffleRuby 21.2.0?
There is also 21.2.0.1 which literally just got out, which fixes an issue in date (specificaly DateTime#strftime): oracle/truffleruby#2406

noahgibbs · 2021-08-04T11:44:50Z

Chris is getting me something built with a shellscript, he says, but doesn't seem to think a released version fixes this.

I think we're on native - we're using whatever ruby-install builds by default.

Here's the LLVM error we're seeing for the liquid-render benchmark after fixing the YAML-load line:

constants.c:39: warning: already initialized constant StringScanner::Version
/home/ubuntu/.rubies/truffleruby-21.1.0/lib/truffle/strscan.rb:46: warning: previous definition of Version was here
constants.c:39: warning: already initialized constant StringScanner::Id
/home/ubuntu/.rubies/truffleruby-21.1.0/lib/truffle/strscan.rb:45: warning: previous definition of Id was here
/home/ubuntu/.gem/truffleruby/2.7.2/gems/strscan-3.0.0/ext/strscan/strscan.c:214:in `strscan_s_allocate': External LLVMFunction onig_region_init cannot be found. (com.oracle.truffle.llvm.runtime.except.LLVMLinkerException) (RuntimeError)
Translated to internal error
        from /home/ubuntu/.rubies/truffleruby-21.1.0/lib/truffle/truffle/cext.rb:1203:in `__allocate__'
        from /home/ubuntu/ym/yjit-bench/benchmarks/liquid-render/lib/liquid/lexer.rb:27:in `initialize'
        from /home/ubuntu/ym/yjit-bench/benchmarks/liquid-render/lib/liquid/parser.rb:6:in `initialize'
        from /home/ubuntu/ym/yjit-bench/benchmarks/liquid-render/lib/liquid/variable.rb:64:in `strict_parse'
        from /home/ubuntu/ym/yjit-bench/benchmarks/liquid-render/lib/liquid/parser_switching.rb:34:in `strict_parse_with_error_context'

noahgibbs · 2021-08-04T11:47:40Z

Here's a fix to yjit-bench's liquid-render that should get around the :aliases keyword issue on Truffle (and other pre-3.1 Rubies):

diff --git a/benchmarks/liquid-render/performance/shopify/database.rb b/benchmarks/liquid-render/performance/shopify/database.rb
index 3e9f356..2953a32 100644
--- a/benchmarks/liquid-render/performance/shopify/database.rb
+++ b/benchmarks/liquid-render/performance/shopify/database.rb
@@ -7,7 +7,7 @@ module Database
   # to liquid as assigns. All this is based on Shopify
   def self.tables
     @tables ||= begin
-      db = YAML.load_file("#{__dir__}/vision.database.yml", aliases: true)
+      db = YAML.safe_load(File.read("#{__dir__}/vision.database.yml"), aliases: true)

       # From vision source
       db['products'].each do |product|

eregon · 2021-08-04T11:52:27Z

I think we're on native - we're using whatever ruby-install builds by default.

Yes, it's Native by default.
You would need ruby-build truffleruby+graalvm-$VERSION ~/.rubies/truffleruby+graalvm-$VERSION + pass --jvm or set it in TRUFFLERUBYOPT if you want to try in JVM mode.

That error comes from gems/strscan-3.0.0/ext/strscan/strscan.c, i.e., the strscan gem is used and not the stdlib.
So a workaround would be to make sure there is no strscan gem in the Gemfile & Gemfile.lock.

noahgibbs · 2021-08-04T11:58:23Z

Ah, okay. Looks like it's a requirement of the net-imap gem (no longer built in for Ruby 3.1 and up.) I'll try to make sure it's not installed for TruffleRuby runs. That's probably what's gone wrong for Railsbench as well.

eregon · 2021-08-04T15:06:27Z

I ran railsbench locally on truffleruby 21.2.0 JVM and CRuby 2.7.3 (no JIT).
Only change needed is #33

CRuby 2.7.3:

$ WARMUP_ITRS=10 MIN_BENCH_ITRS=10 MIN_BENCH_TIME=0 ruby -I../../harness benchmark.rb
itr #11: 2538ms
itr #12: 2541ms
itr #13: 2506ms
itr #14: 2559ms
itr #15: 2473ms
itr #16: 2501ms
itr #17: 2551ms
itr #18: 2515ms
itr #19: 2588ms
itr #20: 2539ms

truffleruby 21.2.0 CE JVM (which you can install with ruby-build truffleruby+graalvm-21.2.0 ~/.rubies/truffleruby+graalvm-21.2.0) is quite a bit faster, as expected:

TRUFFLERUBYOPT=--jvm WARMUP_ITRS=100 MIN_BENCH_ITRS=50 MIN_BENCH_TIME=0 ruby -I../../harness benchmark.rb
itr #141: 1197ms
itr #142: 1197ms
itr #143: 1201ms
itr #144: 1205ms
itr #145: 1263ms
itr #146: 1259ms
itr #147: 1218ms
itr #148: 1198ms
itr #149: 1192ms
itr #150: 1198ms

More iterations would be safer, e.g., 200 total.

On truffleruby 21.2.0 CE Native it is not as fast, most likely because the Native CE GC is less efficient:

itr #131: 3144ms
itr #132: 2233ms
itr #133: 2258ms
itr #134: 2219ms
itr #135: 3211ms
itr #136: 2251ms
itr #137: 2276ms
itr #138: 2295ms
itr #139: 3251ms
itr #140: 2330ms
itr #141: 2400ms
itr #142: 3205ms
itr #143: 2325ms
itr #144: 2302ms
itr #145: 2298ms
itr #146: 3293ms
itr #147: 2336ms
itr #148: 2334ms
itr #149: 2267ms
itr #150: 3228ms

With https://github.com/eregon/yjit-bench/blob/truffleruby/analyze.rb

ruby ../../analyze.rb results-*.csv
	speedup	median	average	[min   -   max]
ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [x86_64-linux] - ruby-2.7.3-2021-08-04-170033
	1.0x	2.539	2.532	[2.474 - 2.589]
truffleruby 21.2.0.1, like ruby 2.7.3, GraalVM CE Native [x86_64-linux] - truffleruby-21.2.0.1-2021-08-04-164200
	1.1x	2.336	2.599	[2.268 - 3.294]
truffleruby 21.2.0, like ruby 2.7.3, GraalVM CE JVM [x86_64-linux] - truffleruby-21.2.0-2021-08-04-165409
	2.1x	1.200	1.213	[1.193 - 1.263]

Those results are also somewhat consistent with https://gist.github.com/eregon/c59a896835ffdf8ad519341d78bbf8c8 although I did not touch CPU frequency/governor for those runs.

eregon · 2021-08-04T17:14:59Z

FYI I tried liquid-render, and peak on TruffleRuby is about the same as CRuby 2.7.3 on my machine, which is not expected (it should be much faster).
The issue seems to be that $~ is accessed by Date._parse which is written in C and that causes lots of deoptimizations (= a performance bug) due to iterating the stack and deoptimizing the Ruby frame above to access $~.

Diagnostic harnesses for continuous-adaptive and BIPS

2290b31

chrisseaton requested a review from noahgibbs August 4, 2021 10:18

eregon mentioned this pull request Aug 4, 2021

require "strscan.so" uses the strscan gem and does not work oracle/truffleruby#2420

Closed

maximecb merged commit 830fe00 into main Aug 4, 2021

maximecb deleted the diagnostic-harnesses branch August 4, 2021 14:42

noahgibbs mentioned this pull request Aug 9, 2021

Benchmarking should be faster Shopify/yjit-metrics#30

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Diagnostic harnesses for continuous-adaptive and BIPS #32

Diagnostic harnesses for continuous-adaptive and BIPS #32

Uh oh!

chrisseaton commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

eregon commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

eregon commented Aug 4, 2021 •

edited

Loading

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021 •

edited

Loading

Uh oh!

eregon commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

eregon commented Aug 4, 2021 •

edited

Loading

Uh oh!

eregon commented Aug 4, 2021

Uh oh!

Uh oh!

Diagnostic harnesses for continuous-adaptive and BIPS #32

Diagnostic harnesses for continuous-adaptive and BIPS #32

Uh oh!

Conversation

chrisseaton commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

eregon commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

eregon commented Aug 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eregon commented Aug 4, 2021

Uh oh!

noahgibbs commented Aug 4, 2021

Uh oh!

eregon commented Aug 4, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eregon commented Aug 4, 2021

Uh oh!

Uh oh!

eregon commented Aug 4, 2021 •

edited

Loading

noahgibbs commented Aug 4, 2021 •

edited

Loading

eregon commented Aug 4, 2021 •

edited

Loading