Diagnostic harnesses for continuous-adaptive and BIPS #32
The adaptive harness looks like it tries to run about one chunk of benchmarks per second, and to keep that true-ish by changing the number of iterations per chunk. That seems... fine? I'm not sure it's a problem we're currently addressing, but the code for it looks reasonable. It also won't help cases where a single iteration takes more than a second, which is where we're having a lot of our worst TruffleRuby problems. Though of course we could configure the time units if we wanted to. I'm using a more complicated harness for a lot of my testing (in https://github.com/Shopify/yjit-metrics). I could adopt an adaptive approach there, but I'm not 100% confident in some of the reporting and analysis consequences of doing so. Running a continually-varying number of iterations doesn't, on the surface, fix our "it's hard to compare runs" problem. Basically: this looks like a high-quality implementation, but I'm not sure when we'd use it.
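To make the "one chunk per second, adjust the iteration count" idea concrete, here is a minimal sketch of that kind of loop. It is not the code from this PR: `run_adaptive`, `monotonic_now`, and their parameters are hypothetical names chosen for illustration, and the sketch runs a fixed number of chunks rather than running forever.

```ruby
# Hypothetical sketch of an adaptive chunk loop; not the harness from this PR.

# Monotonic clock, so wall-clock adjustments do not skew the timings.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Runs `chunks` timed chunks of the given block. After each chunk, the number
# of iterations is re-estimated so the next chunk takes about `target_seconds`.
# Returns one hash per chunk with the iteration count, elapsed time, and rate.
def run_adaptive(chunks:, target_seconds: 1.0, initial_iterations: 1)
  iterations = initial_iterations
  Array.new(chunks) do
    start = monotonic_now
    iterations.times { yield }
    elapsed = [monotonic_now - start, 1e-9].max # guard against a zero interval
    rate = iterations / elapsed
    sample = { iterations: iterations, seconds: elapsed, ips: rate }
    # Next chunk: as many iterations as should fit in the target, at least one.
    iterations = [(rate * target_seconds).round, 1].max
    sample
  end
end

if __FILE__ == $PROGRAM_NAME
  run_adaptive(chunks: 5, target_seconds: 0.2) { 1000.times.sum }.each do |s|
    puts format("%d iterations in %.3fs (%.1f i/s)", s[:iterations], s[:seconds], s[:ips])
  end
end
```

Note that the iteration count is clamped to one, so a benchmark whose single iteration takes longer than the target never gets a chunk shorter than that iteration. That is the limitation described above for slow iterations.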
If you hit any issue, feel free to report it or contact the TruffleRuby devs at https://github.com/oracle/truffleruby/blob/master/README.md#contact. Regarding warmup, the default harness already prints each iteration, so too much variance between successive iterations likely means there was not enough warmup.
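One way to turn "too much variance between successive iterations" into a check is sketched below. This is an illustrative assumption, not part of any harness discussed here: `successive_changes`, `settled_from`, and the 5% tolerance are hypothetical, and the timings in the usage note are made up.

```ruby
# Hypothetical helpers for eyeballing warmup from per-iteration timings.

# Relative change between each pair of successive iteration times.
# Assumes every time is positive.
def successive_changes(times)
  times.each_cons(2).map { |a, b| (b - a).abs / a.to_f }
end

# Index of the first iteration from which every successive change stays within
# `tolerance`, or nil if the timings never settle (or there are too few).
def settled_from(times, tolerance: 0.05)
  return nil if times.size < 2

  changes = successive_changes(times)
  last_noisy = changes.rindex { |c| c > tolerance }
  return 0 if last_noisy.nil?

  settled = last_noisy + 1
  # Require at least two iterations after the last noisy step as evidence.
  settled < times.size - 1 ? settled : nil
end
```

For example, with the invented timings `[10.0, 6.0, 3.0, 2.5, 2.5, 2.5]`, `settled_from` returns `3`: the first three iterations would be treated as warmup. A `nil` result suggests running more iterations before drawing conclusions.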
@chrisseaton is looking into some problems we're having with psych-load, which runs around 2.5 seconds per iteration on CRuby, and similar on MJIT/YJIT. After the early warmup iterations, we're still seeing quite long times on TruffleRuby -- around 9.5 seconds/iter after 200 warmup iterations, and much slower on earlier iterations. Railsbench and liquid-render are still getting LLVM load errors, so no results there. Basically, I'm trying to figure out if 21.1.0 (latest stable on ruby-install when I started) is just a bad version for some reason. Certainly it's having a lot of trouble on benchmarks that lean heavily on native C extensions, but I know TruffleRuby has trouble with that in general. The current results on Lee, however, are fantastic. So it's not slow across the board.
Here's a breakdown of what we were seeing for warmup initially with TruffleRuby:
We were pinning each process to a single CPU at that point, and by removing that we've improved some of the results (e.g. Lee), but psych-load is still pretty similar to what we're seeing there. By comparison, here's a block of (prerelease 3.1 no-JIT) CRuby results:
Which errors specifically?
TruffleRuby should work for most popular C extensions out of the box.
Chris says he's getting me something built with a shell script, but he doesn't seem to think a released version fixes this. I think we're on Native - we're using whatever ruby-install builds by default. Here's the LLVM error we're seeing for the liquid-render benchmark after fixing the YAML-load line:
Here's a fix to yjit-bench's liquid-render that should get around the :aliases keyword issue on Truffle (and other pre-3.1 Rubies):
Yes, it's Native by default. That error comes from
Ah, okay. Looks like it's a requirement of the net-imap gem (no longer built in for Ruby 3.1 and up). I'll try to make sure it's not installed for TruffleRuby runs. That's probably what's gone wrong for Railsbench as well.
I ran railsbench locally on truffleruby 21.2.0 JVM and CRuby 2.7.3 (no JIT). CRuby 2.7.3:
truffleruby 21.2.0 CE JVM (which you can install with
More iterations would be safer, e.g., 200 total. On truffleruby 21.2.0 CE Native it is not as fast, most likely because the Native CE GC is less efficient:
With https://github.com/eregon/yjit-bench/blob/truffleruby/analyze.rb
Those results are also somewhat consistent with https://gist.github.com/eregon/c59a896835ffdf8ad519341d78bbf8c8, although I did not touch the CPU frequency/governor for those runs.
FYI I tried
These harnesses help with diagnosing warmup and performance issues.
The continuous one runs continuous-adaptive benchmarking: it runs forever and prints the iterations per second, so you can see how performance changes over time.
The BIPS one uses the industry-standard benchmark-ips gem.