Skip to content

Conversation

chrisseaton
Copy link
Collaborator

@chrisseaton chrisseaton commented Feb 16, 2021

Someone asked me how TruffleRuby optimised class variables. I went to check and was surprised to find it completely unoptimised. By pulling out a few fast-paths from the code behind boundaries and switching the data structure used to store the class variables I've managed to significantly improve performance for a few common cases.

Before:

           ivar_read    455.654M (±15.7%) i/s -      2.238B in   5.090222s
                read     48.886M (± 9.9%) i/s -    243.582M in   5.050903s
          class_read     57.385M (± 7.5%) i/s -    286.451M in   5.034053s
       constant_read     47.889M (± 5.2%) i/s -    239.445M in   5.016469s
               write     13.938M (± 4.6%) i/s -     70.387M in   5.061929s
         class_write     14.790M (± 6.1%) i/s -     74.568M in   5.063688s
                 get     30.744M (± 8.0%) i/s -    154.151M in   5.050587s
                 set     11.971M (± 5.4%) i/s -     59.695M in   5.003366s
          defined_op     49.961M (± 2.3%) i/s -    249.753M in   5.001779s
      defined_method     31.913M (± 6.0%) i/s -    159.384M in   5.014710s
     class_variables      5.048M (± 6.2%) i/s -     25.422M in   5.057317s

After:

           ivar_read    493.733M (± 9.3%) i/s -      2.345B in   4.963858s
                read    533.243M (±16.8%) i/s -      2.583B in   5.101963s
          class_read    796.643M (± 9.3%) i/s -      4.077B in   5.165160s
       constant_read    443.462M (±14.0%) i/s -      2.168B in   5.058631s
               write    654.199M (± 1.7%) i/s -      3.305B in   5.053695s
         class_write    977.600M (± 2.5%) i/s -      4.929B in   5.045495s
                 get    560.218M (± 2.9%) i/s -      2.818B in   5.034547s
                 set    355.147M (± 3.7%) i/s -      1.795B in   5.061929s
          defined_op    976.669M (± 3.2%) i/s -      4.937B in   5.060990s
      defined_method    560.041M (± 2.6%) i/s -      2.834B in   5.064155s
     class_variables      4.194M (± 1.7%) i/s -     21.012M in   5.011955s

Difference:

           ivar_read    no change
                read    > 10x
          class_read    > 10x
       constant_read    10x
               write    50x
         class_write    > 50x
                 get    20x
                 set    30x
          defined_op    20x
      defined_method    20x
     class_variables    a little slower

Compared to MRI:

ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin19]
           ivar_read     18.903M (± 1.9%) i/s -     95.796M in   5.069657s
                read     12.228M (± 2.2%) i/s -     61.593M in   5.039576s
          class_read     11.812M (± 3.1%) i/s -     59.006M in   5.001077s
       constant_read     12.110M (± 2.1%) i/s -     61.383M in   5.071081s
               write      5.226M (± 2.1%) i/s -     26.613M in   5.094473s
                 get      8.520M (± 1.7%) i/s -     43.100M in   5.059971s
                 set      4.328M (± 3.2%) i/s -     21.725M in   5.025026s
          defined_op     14.676M (± 1.9%) i/s -     74.354M in   5.068288s
      defined_method     10.210M (± 3.6%) i/s -     51.086M in   5.010694s
     class_variables      2.055M (± 2.3%) i/s -     10.320M in   5.023387s

Compared to JRuby:

jruby 9.2.14.0 (2.5.7) 2020-12-08 ebe64bafb9 OpenJDK 64-Bit Server VM 25.252-b14 on 1.8.0_252-b14 +jit [darwin-x86_64]
           ivar_read     23.438M (±10.7%) i/s -    113.363M in   5.001581s
                read     16.011M (± 2.8%) i/s -     80.148M in   5.009717s
          class_read     17.961M (± 7.4%) i/s -     89.175M in   4.998767s
       constant_read     16.231M (± 5.7%) i/s -     80.991M in   5.008429s
               write      8.229M (± 3.1%) i/s -     41.137M in   5.004191s
                 get      8.100M (± 2.0%) i/s -     40.694M in   5.025950s
                 set      5.521M (± 7.6%) i/s -     27.697M in   5.059839s
          defined_op     23.834M (± 8.7%) i/s -    116.812M in   5.011901s
      defined_method     10.295M (± 4.2%) i/s -     51.524M in   5.014946s
     class_variables      3.978M (± 3.8%) i/s -     20.043M in   5.046777s

What have I done?

  • switched the data structure from a hash map to a dynamic object (an object with a hidden class), which allows me to inline cache the location of it within the object and avoid boxing when storing a variable and unboxing when reading it
  • split operations on class variables into separate operations to find the right module for the variable, and then for reading the variable from that module
  • identified a few likely common fast paths that can be checked with guards

What does it look like?

For example, a simple class variable read in an instance method.

class Foo
  @@example = 14
  def example_read
    @@example
  end
end

This ends up as this graph. The class variable storage object can be turned into a constant, coming from the method's lexical scope at the point it was defined. We can then read the shape of the class variable storage and check it against an expected shape using an identity comparison. If that fails, we deoptimise. We can then cache the location within the storage of the class variable, so that becomes nothing more than a field read.

example_read

So a class variable read from an instance method of the class defining the class variable can be as simple as:

	0x12347d8f0:	movabs	rax, 0x61095e498
	0x12347d900:	cmp	dword ptr [rax + 0xc], 0xc212bc9b
	0x12347d907:	jne	0x12347d9c3
	0x12347d90d:	mov	r10d, dword ptr [rax + 0x18]

What else could we do:

If we find interesting cases not covered by these fast paths, such as particular patterns of using class variables from superclasses or extended modules, we can probably add those for up to a couple of levels of indirection fairly easily.

Benchmark:

require 'benchmark/ips'

class Foo

  @@example = 14
  @@example_constant = 14

  def initialize
    @example = 14
  end

  def example_ivar_read
    @example
  end

  def example_read
    @@example
  end

  def self.example_class_read
    @@example
  end

  def example_constant_read
    @@example_constant
  end

  def example_write(value)
    @@example = value
  end

  def self.example_class_write(value)
    @@example = value
  end

  def example_get
    self.class.class_variable_get(:@@example)
  end

  def example_set(value)
    self.class.class_variable_set(:@@example, value)
  end

  def example_defined_op
    defined?(@@example)
  end

  def example_defined_method
    self.class.class_variable_defined?(:@@example)
  end

  def example_class_variables
    self.class.class_variables
  end

end

foo = Foo.new
x = 100

Benchmark.ips do |x|
  x.warmup = 5
  x.time = 5

  x.report('ivar_read') { foo.example_ivar_read }
  x.report('read') { foo.example_read }
  x.report('class_read') { Foo.example_class_read }
  x.report('constant_read') { foo.example_constant_read }
  x.report('write') { foo.example_write(x) }
  x.report('class_write') { Foo.example_class_write(x) }
  x.report('get') { foo.example_get }
  x.report('set') { foo.example_set(x) }
  x.report('defined_op') { foo.example_defined_op }
  x.report('defined_method') { foo.example_defined_method }
  x.report('class_variables') { foo.example_class_variables }
end

@chrisseaton chrisseaton requested a review from eregon February 16, 2021 00:28
@chrisseaton chrisseaton added shopify Pull requests from Shopify performance labels Feb 16, 2021
Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great in general, thank you for optimizing class variables.

One trade-off of using a DynamicObject for this is if new class variables are later added on in some class, we might get some Shape polymorphism, while a given access site would always access the same variable and name. OTOH, that doesn't seem so common for class variables, because class variables are often initialized in the module body, and rarely added later on to an existing class.

Another possibility here might be to have a Map<String name, ClassVariableStorage storage> (and ClassVariableStorage would be a class with a single volatile field value), and then look that one up once, and then directly read/write the storage object (similar to GlobalVariables/GlobalVariableStorage but simpler as there are no hooks).
That would require caching on the Module instance though (which is making things for Engine caching/AST sharing more complicated), while the current approach seems to not need it.

Using a DynamicObject also loses the volatile behavior of class variables, which is described in the proposal for a Ruby memory model. I'm not sure how important it is, but it seems unfortunate and potentially problematic to lose that.
We could restore that by storing an AtomicReference instead of directly the value in the DynamicObject. I think we should do that.

this.classVariables,
key,
DynamicObjectLibrary.getUncached().getOrDefault(fromFields.classVariables, key, null));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@woess Isn't there a more direct way to do a (shallow) copy of a DynamicObject?
There used to be DynamicObject#copy, but I see nothing similar in DynamicObjectLibrary.

@chrisseaton
Copy link
Collaborator Author

Another possibility here might be to have a Map<String name, ClassVariableStorage storage> (and ClassVariableStorage would be a class with a single volatile field value), and then look that one up once, and then directly read/write the storage object (similar to GlobalVariables/GlobalVariableStorage but simpler as there are no hooks).

I thought about that but if we wanted to avoid boxing we'd have to re-invent some of what DynamicObject does for storage locations.

Using a DynamicObject also loses the volatile behavior of class variables, which is described in the proposal for a Ruby memory model.

Is there a barrier we can insert after writes or around reads to get volatile back?

@eregon
Copy link
Member

eregon commented Feb 16, 2021

I thought about that but if we wanted to avoid boxing we'd have to re-invent some of what DynamicObject does for storage locations.

Good point.

Is there a barrier we can insert after writes or around reads to get volatile back?

This might be useful:
https://github.com/graalvm/graal-jvmci-8/blob/210ecc800424aa07f4e03498efff4f4eeb9b6118/jvmci/jdk.vm.ci.code/src/jdk/vm/ci/code/MemoryBarriers.java#L110-L113

And we have a mapping of Unsafe fences here:
https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/19fb8f93c59dfd791f62d41f332db9e306bc1422/src/java.base/share/classes/jdk/internal/misc/Unsafe.java#L3275-L3315

However, I'm not sure modelling volatile accesses with fences is quite correct (e.g., https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#myth-barriers-are-sane). But probably our best option in this case to still avoid boxing.

@chrisseaton
Copy link
Collaborator Author

The object model uses Unsafe to read and write storage locations. Is there a volatile equivalent that the object model could use for us? And actually how does the use of Unsafe match up with scalar replacement of aggregates and other optimisations? How does it stop it being bypassed as an uncontrolled unsafe action?

@eregon
Copy link
Member

eregon commented Feb 16, 2021

Is there a volatile equivalent that the object model could use for us?

Yes, there is put*Volatile. But AFAIK there is no way to make the object model use that currently. cc @woess

And actually how does the use of Unsafe match up with scalar replacement of aggregates and other optimisations? How does it stop it being bypassed as an uncontrolled unsafe action?

If the DynamicObject is never allocated then it doesn't matter, nothing else can access those fields anyway.
If it's EA'd and then later on allocated, there is some STORE_STORE barrier, so anyone reading from that object should correctly see all initial values (safe initialization).

@chrisseaton
Copy link
Collaborator Author

chrisseaton commented Feb 16, 2021

Graal canonicalises the RawRead back to a LoadField - that's what I was confused about.

@eregon
Copy link
Member

eregon commented Feb 16, 2021

We have another issue: class variables might be mutated/accessed concurrently.
Before, this was handled by the ConcurrentHashMap, but DynamicObject alone is not thread-safe.

We should synchronize writes, whenever the RubyModule is shared.
In practice, most modules/classes end up being shared between threads (at least when there is a second Ruby Thread) because they are referenced by constants.
So when the RubyModule gets shared, so should the ClassVariableStorage, and this can happen simply in ModuleFields#getAdjacentObjects.

We also need write barriers like it's done in ModuleOperations#setClassVariable. PropagateSharingNode can be used for that conveniently on the fast path.

And all write accesses need to either use WriteObjectFieldNode or synchronized if ClassVariableStorage is shared.

Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, I'll take care of the synchronization part.

@eregon eregon added the in-ci The PR is being tested in CI. Do not push new commits. label Feb 17, 2021
@eregon eregon added this to the 21.1.0 milestone Feb 17, 2021
@eregon eregon self-assigned this Feb 17, 2021
graalvmbot pushed a commit that referenced this pull request Feb 17, 2021
@graalvmbot graalvmbot merged commit b34d04d into oracle:master Feb 17, 2021
@chrisseaton chrisseaton deleted the class-variables branch February 17, 2021 15:30
@chrisseaton
Copy link
Collaborator Author

chrisseaton commented Feb 21, 2021

Summing up after @eregon's fixes for synchronisation.

Reading class variables is still fine - it gets a barrier but these only stop multiple reads collapsing and so in our example they don't even change the machine code.

Writing class variables does add a lot of extra work for the synchronisation. Seems ok as it goes away in the single-threaded case.

example_write-final-multi

				;Comment 64:	[stack overflow check]
				;Comment 64:	VERIFIED_ENTRY
	0x1296e8560:	mov	dword ptr [rsp - 0x14000], eax
	0x1296e8567:	sub	rsp, 0x28
				;Comment 75:	block B0 null
				;Comment 75:	0 [rsi|QWORD[.], rdx|QWORD[.], rbp|QWORD] = LABEL numbPhis: 0 align: false label: ?
				;Comment 75:	2 stack:16|QWORD = MOVE rbp|QWORD moveKind: QWORD
				;Comment 75:	FRAME_COMPLETE
	0x1296e856b:	mov	qword ptr [rsp + 0x20], rbp
				;Comment 80:	4 [stack:32|QWORD] = HOTSPOTLOCKSTACK frameMapBuilder: org.graalvm.compiler.lir.amd64.AMD64FrameMapBuilder@29318860 slotKind: QWORD
				;Comment 80:	6 r11|QWORD[.] = HOTSPOTLOADOBJECTCONSTANT input: Object[ClassVariableStorage@1603407373]
				;Comment 80:	{Object[ClassVariableStorage@1603407373]}
	0x1296e8570:	movabs	r11, 0x616e938a0
	0x1296e857a:	nop	word ptr [rax + rax]
				;Comment 96:	8 CMPCONSTBRANCH [r11|QWORD[.] + 12] trueDestinationProbability: 1.0 condition: = trueDestination: B0 -> B1 falseDestination: B0 -> B52 y: -559030611 size: DWORD inlinedY: NarrowOop[ShapeBasic@1296236027]
				;Comment 96:	{NarrowOop[ShapeBasic@1296236027]}
	0x1296e8580:	cmp	dword ptr [r11 + 0xc], 0xc2da979a
	0x1296e8588:	jne	0x1296e88d5
				;Comment 110:	block B1 null
				;Comment 110:	10 [] = LABEL numbPhis: 0 align: false label: ?
				;Comment 110:	12 rax|DWORD[.] = MOV [rdx|QWORD[.] + 44] size: DWORD
	0x1296e858e:	mov	eax, dword ptr [rdx + 0x2c]
				;Comment 113:	14 CMPCONSTBRANCH [rax|DWORD[.] * 8 + 12] trueDestinationProbability: 1.0 condition: = trueDestination: B1 -> B2 falseDestination: B1 -> B51 y: 14 size: DWORD inlinedY: null
	0x1296e8591:	cmp	dword ptr [rax*8 + 0xc], 0xe
	0x1296e8599:	jne	0x1296e8926
				;Comment 127:	block B2 null
				;Comment 127:	16 [] = LABEL numbPhis: 0 align: false label: ?
				;Comment 127:	18 rsi|DWORD[.] = MOV [rdx|QWORD[.] + 16] size: DWORD
	0x1296e859f:	mov	esi, dword ptr [rdx + 0x10]
				;Comment 130:	20 r10|DWORD[.] = MOV [rdx|QWORD[.] + 20] size: DWORD
	0x1296e85a2:	mov	r10d, dword ptr [rdx + 0x14]
				;Comment 134:	22 rbp|DWORD[.] = MOV [rdx|QWORD[.] + 24] size: DWORD
	0x1296e85a6:	mov	ebp, dword ptr [rdx + 0x18]
				;Comment 137:	24 r8|DWORD[.] = MOV [rdx|QWORD[.] + 28] size: DWORD
	0x1296e85a9:	mov	r8d, dword ptr [rdx + 0x1c]
				;Comment 141:	26 r9|DWORD[.] = MOV [rdx|QWORD[.] + 32] size: DWORD
	0x1296e85ad:	mov	r9d, dword ptr [rdx + 0x20]
				;Comment 145:	28 rcx|DWORD[.] = MOV [rdx|QWORD[.] + 36] size: DWORD
	0x1296e85b1:	mov	ecx, dword ptr [rdx + 0x24]
				;Comment 148:	30 rdx|DWORD[.] = MOV [rdx|QWORD[.] + 40] size: DWORD
	0x1296e85b4:	mov	edx, dword ptr [rdx + 0x28]
				;Comment 151:	32 rbx|QWORD = MOV [r11|QWORD[.]] size: QWORD
	0x1296e85b7:	mov	rbx, qword ptr [r11]
				;Comment 154:	34 rdi|QWORD = STACKLEA stack:32|QWORD
	0x1296e85ba:	lea	rdi, [rsp + 0x10]
				;Comment 159:	36 r13|QWORD = HOTSPOTLOADMETASPACECONSTANT input: meta{HotSpotType<Lorg/truffleruby/language/objects/classvariables/ClassVariableStorage;, resolved>}
				;Comment 159:	{meta{HotSpotType<Lorg/truffleruby/language/objects/classvariables/ClassVariableStorage;, resolved>}}
	0x1296e85bf:	movabs	r13, 0x8402c8720
				;Comment 169:	38 r13|QWORD = MOV [r13|QWORD + 184] size: QWORD
	0x1296e85c9:	mov	r13, qword ptr [r13 + 0xb8]
				;Comment 176:	40 r14|QWORD = OR (x: r15|QWORD, y: r13|QWORD) size: QWORD
	0x1296e85d0:	mov	r14, r15
	0x1296e85d3:	or	r14, r13
				;Comment 182:	42 stack:40|QWORD = MOVE r13|QWORD moveKind: QWORD
	0x1296e85d6:	mov	qword ptr [rsp + 8], r13
				;Comment 187:	44 r13|QWORD = XOR (x: r14|QWORD, y: rbx|QWORD) size: QWORD
	0x1296e85db:	mov	r13, r14
	0x1296e85de:	xor	r13, rbx
				;Comment 193:	46 rsi|QWORD[.] = UNCOMPRESSPOINTER (input: rsi|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85e1:	shl	rsi, 3
				;Comment 197:	48 r10|QWORD[.] = UNCOMPRESSPOINTER (input: r10|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85e5:	shl	r10, 3
				;Comment 201:	50 rbp|QWORD[.] = UNCOMPRESSPOINTER (input: rbp|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85e9:	shl	rbp, 3
				;Comment 205:	52 r8|QWORD[.] = UNCOMPRESSPOINTER (input: r8|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85ed:	shl	r8, 3
				;Comment 209:	54 r9|QWORD[.] = UNCOMPRESSPOINTER (input: r9|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85f1:	shl	r9, 3
				;Comment 213:	56 rcx|QWORD[.] = UNCOMPRESSPOINTER (input: rcx|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85f5:	shl	rcx, 3
				;Comment 217:	58 rdx|QWORD[.] = UNCOMPRESSPOINTER (input: rdx|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85f9:	shl	rdx, 3
				;Comment 221:	60 rax|QWORD[.] = UNCOMPRESSPOINTER (input: rax|DWORD[.], ~baseRegister: r12|ILLEGAL) nonNull: false lirKindTool: org.graalvm.compiler.hotspot.amd64.AMD64HotSpotLIRKindTool@27106c4d encoding: base: 0 shift: 3
	0x1296e85fd:	shl	rax, 3
				;Comment 225:	62 TESTCONSTBRANCH r13|QWORD trueDestinationProbability: 0.99 condition: = trueDestination: B2 -> B28 falseDestination: B2 -> B4 y: -121 size: QWORD
	0x1296e8601:	test	r13, -0x79
	0x1296e8608:	jne	0x1296e86b4
				;Comment 238:	block B28 null
				;Comment 238:	294 [] = LABEL numbPhis: 0 align: false label: ?
				;Comment 238:	296 CMPCONSTBRANCH [r11|QWORD[.] + 12] trueDestinationProbability: 1.0 condition: = trueDestination: B28 -> B29 falseDestination: B28 -> B50 y: -559030611 size: DWORD inlinedY: NarrowOop[ShapeBasic@1296236027]
				;Comment 238:	{NarrowOop[ShapeBasic@1296236027]}
	0x1296e860e:	cmp	dword ptr [r11 + 0xc], 0xc2da979a
	0x1296e8616:	jne	0x1296e88f3
				;Comment 252:	block B29 null
				;Comment 252:	298 [] = LABEL numbPhis: 0 align: false label: ?
				;Comment 252:	300 MOV [r11|QWORD[.] + 24] y: 14 size: QWORD
	0x1296e861c:	mov	qword ptr [r11 + 0x18], 0xe
				;Comment 260:	302 r10|QWORD = MOV [r11|QWORD[.]] size: QWORD
	0x1296e8624:	mov	r10, qword ptr [r11]
				;Comment 263:	304 rax|QWORD = AND r10|QWORD y: 7 size: QWORD
	0x1296e8627:	mov	rax, r10
	0x1296e862a:	and	rax, 7
				;Comment 270:	306 r8|QWORD[.] = HOTSPOTLOADOBJECTCONSTANT input: Object[14]
				;Comment 270:	{Object[14]}
	0x1296e862e:	movabs	r8, 0x7bfb4a8d8
	0x1296e8638:	nop	dword ptr [rax + rax]
				;Comment 288:	308 CMPCONSTBRANCH rax|QWORD trueDestinationProbability: 0.9 condition: = trueDestination: B29 -> B30 falseDestination: B29 -> B31 y: 5 size: QWORD inlinedY: null
	0x1296e8640:	cmp	rax, 5
	0x1296e8644:	jne	0x1296e8663
				;Comment 298:	block B30 null
				;Comment 298:	310 [] = LABEL numbPhis: 0 align: false label: ?
				;Comment 298:	312 rax|QWORD[.] = MOVE r8|QWORD[.] moveKind: QWORD
	0x1296e864a:	mov	rax, r8
				;Comment 301:	314 RETURN (savedRbp: stack:16|QWORD, value: rax|QWORD[.]) isStub: false requiresReservedStackAccessCheck: false thread: r15 scratchForSafepointOnReturn: rcx config: org.graalvm.compiler.hotspot.GraalHotSpotVMConfig@31053be4
	0x1296e864d:	mov	rbp, qword ptr [rsp + 0x20]
	0x1296e8652:	add	rsp, 0x28
	0x1296e8656:	mov	rcx, qword ptr [r15 + 0x108]
				;Comment 317:	POLL_RETURN_FAR
	0x1296e865d:	test	dword ptr [rcx], eax
	0x1296e865f:	vzeroupper	
	0x1296e8662:	ret	

@headius
Copy link
Contributor

headius commented Mar 31, 2021

FYI since I am also looking into this...

The JRuby instance variable numbers are greatly skewed in the original benchmark since we do not (yet) inline through the proc.call from BIPS. With the loop moved into the benchmark and run on an equivalent JIT (Graal JIT on Java 15 in this case) the numbers look quite different:

JRuby:

 ivar_read      1.289B (± 4.6%) i/s -      6.531B in   5.076854s

vs MRI:

ivar_read     31.252M (± 8.2%) i/s -    155.814M in   5.027783s

vs TruffleRuby:

 ivar_read      736.681M (±20.9%) i/s -      3.350B in   5.012836s

Using BIPS as in the original benchmark (very small bodies of code, iteration outside of the block) skews this comparison tremendously, since TR optimizes the outer loop along with the benchmark bodies and the other impls do not.

Of course the class var numbers on JRuby do not change substantially with the modified benchmark, but we do no optimization of class variables yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in-ci The PR is being tested in CI. Do not push new commits. oca-signed performance shopify Pull requests from Shopify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants