Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 22 additions & 8 deletions lib/prism/lex_compat.rb
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
# frozen_string_literal: true
# :markup: markdown

require "delegate"

module Prism
# This class is responsible for lexing the source using prism and then
# converting those tokens to be compatible with Ripper. In the vast majority
Expand Down Expand Up @@ -201,27 +199,43 @@ def deconstruct_keys(keys)
# When we produce tokens, we produce the same arrays that Ripper does.
# However, we add a couple of convenience methods onto them to make them a
# little easier to work with. We delegate all other methods to the array.
class Token < SimpleDelegator
# @dynamic initialize, each, []
class Token < BasicObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just subclass Array?
That would avoid an extra indirection and (in theory at least) be more compatible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but then ignores the == from the subclass. Somehow also slower on cruby

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, that's subtle, I think it's because of these semantics: https://github.com/truffleruby/truffleruby/blob/a48ecf894f19636f09235f5ac70ea6d9889be255/src/main/ruby/truffleruby/core/array.rb#L129-L132
i.e. if other is not an Array then always use other == self, but if other is an Array then just compare all elements and don't call other.==

Copy link
Member

@eregon eregon Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change turned out to reveal a bug in TruffleRuby, as TruffleRuby wouldn't destructure/splat such "BasicObject arrays" since BasicObject doesn't have respond_to?.
Fixed in truffleruby/truffleruby#4128

Fixing/revealing that bug is good, but perf-wise having to go through that slow path coercion (rb_convert_type()) doesn't seem great, I think we should use regular arrays like ::Ripper and then maybe have some private constants so we can do things like:

token[LOCATION]
token[EVENT]
token[VALUE]
token[STATE]

Any thoughts on that?

Copy link
Member

@eregon eregon Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Token subclasses we'd still need to use those to have the custom ==, but for instances of Token we could use a plain Array instead.
It would also mean less allocations which is always helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this https://github.com/ruby/prism/pull/3868/changes#r2724651412? Sorry, I didn't see that comment at all. Would defining to_ary do it?

Copy link
Member

@eregon eregon Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would still go through rb_convert_type() but more direct than without:
https://github.com/truffleruby/truffleruby/blob/4180ab960fe223d333e86b6f3263eb1e29d8f942/src/main/ruby/truffleruby/core/type.rb#L301
vs
https://github.com/truffleruby/truffleruby/blob/4180ab960fe223d333e86b6f3263eb1e29d8f942/src/main/ruby/truffleruby/core/type.rb#L303

So yeah that'd be good but even better would be to use real arrays as I'd think e.g. all method lookups in rb_convert_type() are uncached on CRuby, and likely megamorphic on TruffleRuby.

# Create a new token object with the given ripper-compatible array.
def initialize(array)
@array = array
end

# The location of the token in the source.
def location
self[0]
@array[0]
end

# The type of the token.
def event
self[1]
@array[1]
end

# The slice of the source that this token represents.
def value
self[2]
@array[2]
end

# The state of the lexer when this token was produced.
def state
self[3]
@array[3]
end

# We want to pretend that this is just an Array.
def ==(other) # :nodoc:
@array == other
end
Copy link
Member

@eregon eregon Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For speed it might be relevant to define to_ary, since that will be checked by Array#== and more methods when using this Token as an Array (e.g. https://github.com/truffleruby/truffleruby/blob/a48ecf894f19636f09235f5ac70ea6d9889be255/src/main/ruby/truffleruby/core/array.rb#L130).
Defining to_ary will avoid going through respond_to_missing? for that case.


def respond_to_missing?(name, include_private = false) # :nodoc:
@array.respond_to?(name, include_private)
end

def method_missing(name, ...) # :nodoc:
@array.send(name, ...)
end
end

Expand Down