Description
Reproduction steps
Consider the following jmh benchmark (on scala 2.13.8 and hotspot openjdk-19, but I believe this applies to all versions except that graal is possibly better than C2 at loop hoisting the checks):
//effectively equivalent to scala standard library array iterator
class AnyIter[@specialized A](a: Array[A]) extends Iterator[A] {
private[this] var idx = 0
private[this] val len = a.length
override def hasNext: Boolean = idx < len
override def next(): A = { val r = a(idx); idx += 1; r }
}
class AnyRefIter[A <: AnyRef](a: Array[A]) extends Iterator[A] {
private[this] var idx = 0
private[this] val len = a.length
override def hasNext: Boolean = idx < len
override def next(): A = {
val r = a(idx); idx += 1; r
}
}
@State(Scope.Benchmark)
@OperationsPerInvocation(1000)
class ArrayOps {
val items = Range(1, 1000).iterator.map { idx => s"${idx}" }.toArray
@Benchmark
def scalaIter(bh: Blackhole): Unit = {
val iter = items.iterator.filter { _ != "aaa" }
bh.consume(iter)
for (item <- iter) bh.consume(item)
}
@Benchmark
def anyIter(bh: Blackhole): Unit = {
val iter = new AnyIter(items).filter { _ != "aaa" }
bh.consume(iter)
for (item <- iter) bh.consume(item)
}
@Benchmark
def anyRefIter(bh: Blackhole): Unit = {
val iter = new AnyRefIter(items).filter { _ != "aaa" }
bh.consume(iter)
for (item <- iter) bh.consume(item)
}
}
This gives with LinuxPerfNorm profiler enabled
Benchmark Mode Cnt Score Error Units
joernBench.ArrayOps.anyIter avgt 21 5.307 ± 0.045 ns/op
joernBench.ArrayOps.anyIter:instructions:u avgt 3 115.353 ± 10.216 #/op
joernBench.ArrayOps.anyRefIter avgt 21 4.689 ± 0.034 ns/op
joernBench.ArrayOps.anyRefIter:instructions:u avgt 3 104.056 ± 25.334 #/op
joernBench.ArrayOps.scalaIter avgt 21 5.397 ± 0.119 ns/op
joernBench.ArrayOps.scalaIter:instructions:u avgt 3 116.199 ± 21.201 #/op
(the confidence intervals reported by jmh are overly wide in many cases, because it assumes the quite conservative student t distribution. In other words, one needs to very carefully design test parameters with high iteration and fork counts in order to get good confidence intervals, and I did not do that here)
Problem
The problem is that specialization conflates AnyRef
with Any
. In many cases, there is not a relevant distinction in terms of performance whether we statically know that a type T
is T <: AnyRef
.
For accessing an Array[T]
this makes a big difference, though: Accessing the array is aload
if T <: AnyRef
, and is a much slower scala.runtime.ScalaRunTime.array_apply(xs: AnyRef, idx: Int): Any
.
Ideally, specialization would stop conflating Any and AnyRef.
If that is difficult for technical reasons, then we need to very prominently document this and warn people away from naive use of @specialized
. Instead we need to point people to using custom specialization, as the standard library already does in some places like scala.collection.immutable.ArraySeq$ofRef
.
Furthermore, we need to take our own bitter medicine and manually specialize performance critical parts. It is hard to imagine more critical code than iteration over arrays, so we definitely need a class scala.collection.ArrayOps$ArrayIteratorOfRef
. But we need to look at all uses of @specialized
and consider manually specializing the AnyRef case.