Further optimize ripper translator by not using delegate#3868
Further optimize ripper translator by not using delegate#3868
delegate#3868Conversation
ebf61ff to
111c256
Compare
| # little easier to work with. We delegate all other methods to the array. | ||
| class Token < SimpleDelegator | ||
| # @dynamic initialize, each, [] | ||
| class Token < BasicObject |
There was a problem hiding this comment.
Any reason to not just subclass Array?
That would avoid an extra indirection and (in theory at least) be more compatible.
There was a problem hiding this comment.
I tried this but then ignores the == from the subclass. Somehow also slower on cruby
There was a problem hiding this comment.
Ah right, that's subtle, I think it's because of these semantics: https://github.com/truffleruby/truffleruby/blob/a48ecf894f19636f09235f5ac70ea6d9889be255/src/main/ruby/truffleruby/core/array.rb#L129-L132
i.e. if other is not an Array then always use other == self, but if other is an Array then just compare all elements and don't call other.==
There was a problem hiding this comment.
This change turned out to reveal a bug in TruffleRuby, as TruffleRuby wouldn't destructure/splat such "BasicObject arrays" since BasicObject doesn't have respond_to?.
Fixed in truffleruby/truffleruby#4128
Fixing/revealing that bug is good, but perf-wise having to go through that slow path coercion (rb_convert_type()) doesn't seem great, I think we should use regular arrays like ::Ripper and then maybe have some private constants so we can do things like:
token[LOCATION]
token[EVENT]
token[VALUE]
token[STATE]Any thoughts on that?
There was a problem hiding this comment.
For Token subclasses we'd still need to use those to have the custom ==, but for instances of Token we could use a plain Array instead.
It would also mean less allocations which is always helpful.
There was a problem hiding this comment.
Is this https://github.com/ruby/prism/pull/3868/changes#r2724651412? Sorry, I didn't see that comment at all. Would defining to_ary do it?
There was a problem hiding this comment.
That would still go through rb_convert_type() but more direct than without:
https://github.com/truffleruby/truffleruby/blob/4180ab960fe223d333e86b6f3263eb1e29d8f942/src/main/ruby/truffleruby/core/type.rb#L301
vs
https://github.com/truffleruby/truffleruby/blob/4180ab960fe223d333e86b6f3263eb1e29d8f942/src/main/ruby/truffleruby/core/type.rb#L303
So yeah that'd be good but even better would be to use real arrays as I'd think e.g. all method lookups in rb_convert_type() are uncached on CRuby, and likely megamorphic on TruffleRuby.
|
Great speedup and nice to the extra |
Using it seems pretty bad for performance:
```rb
require "benchmark/ips"
require "prism"
require "ripper"
codes = Dir["**/*.rb"].map { File.read(it) }
Benchmark.ips do |x|
x.report("prism") { codes.each { Prism::Translation::Ripper.lex(it) } }
x.report("ripper") { codes.each { Ripper.lex(it) } }
x.compare!
end
```
```
# Before
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [x86_64-linux]
Warming up --------------------------------------
prism 1.000 i/100ms
ripper 1.000 i/100ms
Calculating -------------------------------------
prism 0.319 (± 0.0%) i/s (3.14 s/i) - 2.000 in 6.276154s
ripper 0.647 (± 0.0%) i/s (1.54 s/i) - 4.000 in 6.182662s
Comparison:
ripper: 0.6 i/s
prism: 0.3 i/s - 2.03x slower
# After
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [x86_64-linux]
Warming up --------------------------------------
prism 1.000 i/100ms
ripper 1.000 i/100ms
Calculating -------------------------------------
prism 0.482 (± 0.0%) i/s (2.08 s/i) - 3.000 in 6.225603s
ripper 0.645 (± 0.0%) i/s (1.55 s/i) - 4.000 in 6.205636s
Comparison:
ripper: 0.6 i/s
prism: 0.5 i/s - 1.34x slower
```
`vernier` tells me it does `method_missing` even for explicitly defined methods like `location`.
111c256 to
2ea8139
Compare
| # We want to pretend that this is just an Array. | ||
| def ==(other) # :nodoc: | ||
| @array == other | ||
| end |
There was a problem hiding this comment.
For speed it might be relevant to define to_ary, since that will be checked by Array#== and more methods when using this Token as an Array (e.g. https://github.com/truffleruby/truffleruby/blob/a48ecf894f19636f09235f5ac70ea6d9889be255/src/main/ruby/truffleruby/core/array.rb#L130).
Defining to_ary will avoid going through respond_to_missing? for that case.
Using it seems pretty bad for performance:
verniertells me it doesmethod_missingeven for explicitly defined methods likelocation. Probably that is for[].