
Implementation notes

jdanbrown edited this page Oct 13, 2011 · 7 revisions

This page is about the implementation details of scalala. You don't need to know any of this to be able to use scalala. Read on if you are a scalala maintainer, if you are extending scalala with your own datatypes, or if you are simply curious about the code.

Specialization

For scalala to run with acceptable performance, it must avoid boxing wherever possible and use raw arrays for storage. Scala provides the @specialized mechanism to help achieve this. However, @specialized must be treated with respect and used systematically to work well.

Avoid unqualified @specialized

The @specialized annotation accepts a list of types for which it will generate specialized code and types. If you leave it out, the compiler will generate an ungodly number of versions of the class or method. The scalala codebase is already big enough without unnecessary bloat. Please follow these simple rules:

  • Without special dispensation, never annotate a type with @specialized unless you supply the list of types for it to work with.
  • For key types, use @specialized(Int, Long).
  • For value types, use @specialized(Int, Long, Float, Double). A small number of lower-level types use @specialized(Int, Long, Float, Double, Boolean) for the value type, but this should not be necessary in your code.
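As a sketch of these rules in use (Entry is a made-up container for illustration, not part of scalala):

```scala
// A hypothetical key/value pair following the conventions above:
// keys specialize over (Int, Long); values over (Int, Long, Float, Double).
class Entry[@specialized(Int, Long) K,
            @specialized(Int, Long, Float, Double) V](val key: K, val value: V)

object Entry {
  // Factory methods carry the same explicit type lists, never a bare @specialized.
  def of[@specialized(Int, Long) K,
         @specialized(Int, Long, Float, Double) V](k: K, v: V): Entry[K, V] =
    new Entry(k, v)
}
```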

Chain @specialized

Ensure that every type parameter which ends up being passed to a @specialized parameter is itself annotated with the same type list. For the compiler to be able to use the specialized operation or constructor, the call-site must either:

  • Know the concrete type, or
  • Use a type parameter that is itself @specialized with a corresponding type list.

Also, never specialize on more types than those supported by the type you're chaining to; any extra variants are wasted.
// good - returns the DenseMatrix type specialized for Int
def mkIntMatrix() = DenseMatrix(1,2,3,4)

// bad - returns a DenseMatrix storing objects
def mkDenseMatrix[V: Scalar] = DenseMatrix.zeros(10, 10)

// good - @specialized chaining makes sure that the optimized sub-type is instantiated
def mkDenseMatrix[@specialized(Int, Long, Float, Double) V: Scalar] = DenseMatrix.zeros(10, 10)

// some function with specialized type K
def foo[@specialized(Int, Long) K](k: K): K = k
// wasted specialization on Double
def bar[@specialized(Int, Long, Double) K](k: K): K = foo(k)

As a concrete example, here is real code from the scalala source:

def zeros[V](rows: Int, cols: Int)(implicit s: Scalar[V]) = {
  implicit val mf = implicitly[Scalar[V]].manifest;
  implicit val dv = implicitly[Scalar[V]].defaultArrayValue;

  val data = new Array[SparseVectorCol[V]](cols)
  var c = 0;
  while(c < cols) {
    data(c) = SparseVectorCol.zeros(rows)
    c = c + 1
  }
  new CCSMatrix[V](rows, cols, data)
}

This typechecks and runs. CCSMatrix.zeros creates some SparseVectorCol instances and a CCSMatrix, both of which are parameterized over V. At the time this code is compiled, the concrete type of V is not known, so the compiler has no choice but to instantiate SparseVectorCol and CCSMatrix for general objects, even if V happens to be a type for which these classes are specialized. To fix this, we need to ensure that the zeros method itself is specialized for V.

  def zeros[@specialized(Int, Long, Float, Double) V](rows: Int, cols: Int)(implicit s: Scalar[V]) = {
    implicit val mf = implicitly[Scalar[V]].manifest;
    implicit val dv = implicitly[Scalar[V]].defaultArrayValue;

    val data = new Array[SparseVectorCol[V]](cols)
    var c = 0;
    while(c < cols) {
      data(c) = SparseVectorCol.zeros(rows)
      c = c + 1
    }
    new CCSMatrix[V](rows, cols, data)
  }

Now the compiler will generate one version of this method for general objects (as before), plus one each for Int, Long, Float and Double. So, if you call CCSMatrix.zeros in a context that knows V is one of these four primitive types, it will use the version specialized for that type and, in turn, produce SparseVectorCol and CCSMatrix instances specialized to that primitive type.
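One way to see the effect is to inspect the runtime class name: under the Scala 2 compiler, specialized variants are generated subclasses carrying a $mc...$sp suffix. Cell here is a made-up example, not a scalala type:

```scala
// A tiny container, specialized for Int and Double values.
class Cell[@specialized(Int, Double) V](val value: V)

// V is statically Int here, so the Int-specialized variant is instantiated.
val c = new Cell(42)

// Under Scala 2 this prints a generated class name such as "Cell$mcI$sp";
// an unspecialized container would report plain "Cell".
println(c.getClass.getName)
```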

Implicits threading

Scalala makes heavy use of a type-class-centric design, where most polymorphic operations are represented as strategies rather than as methods. For example, when multiplying two matrices, the actual operation is delegated to an instance of type BinaryOp[M1 <: Matrix[V], M2 <: Matrix[V], OpMulMatrixBy, M3 <: Matrix[V]]. This allows implementations of the operation optimized for particular matrix representations to be dropped in. For example, the strategy for multiplying two sparse matrices may differ from the one optimized for multiplying a dense row-major matrix by a dense column-major matrix.

There are type-classes for most aspects of matrix, vector and scalar manipulation. They are structured so that there is always a fall-back implementation, but performance is gained by using very specific implementations. To avoid users needing to know which strategy to use where, scalala uses Scala's implicit resolution mechanism to do much of the hard work for you. This means that for simple use, scalala "just works". For more advanced work, however, you must be aware of implicits and take them into account.
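The shape of this design can be sketched with a cut-down, self-contained analogue. BinOp, Vec and DenseVec below are invented stand-ins, not scalala's actual types:

```scala
// A strategy type class: how to combine two As into an A.
trait BinOp[A] {
  def apply(x: A, y: A): A
}

sealed trait Vec { def data: Array[Double] }
case class DenseVec(data: Array[Double]) extends Vec

object BinOp {
  // Fallback: works for any Vec, element by element via the generic interface.
  implicit def anyVecAdd: BinOp[Vec] = new BinOp[Vec] {
    def apply(x: Vec, y: Vec): Vec =
      DenseVec(x.data.zip(y.data).map { case (a, b) => a + b })
  }

  // More specific strategy for two dense vectors: a tight primitive loop.
  // scalac prefers this one when both static types are DenseVec.
  implicit def denseAdd: BinOp[DenseVec] = new BinOp[DenseVec] {
    def apply(x: DenseVec, y: DenseVec): DenseVec = {
      val out = new Array[Double](x.data.length)
      var i = 0
      while (i < out.length) { out(i) = x.data(i) + y.data(i); i += 1 }
      DenseVec(out)
    }
  }
}

// The strategy is supplied implicitly, driven by the static types at the call-site.
def add[A](x: A, y: A)(implicit op: BinOp[A]): A = op(x, y)
```

Calling add(a, b) where both arguments are statically DenseVec selects the dense strategy; calling add[Vec](a, b) falls back to the generic one. Resolution is driven entirely by static types.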

Implicit resolution

Scala resolves implicits using the statically available reference types. If a value has static type A, even though at run-time it will be an instance of B, then the implicits will be resolved against A. Consider this contrived example:

class Foo
class Bar extends Foo

implicit val fooToString = (f: Foo) => "Got a Foo"
implicit val barToString = (b: Bar) => "Got a Bar"

def withToString[A](a: A)(implicit asString: A => String): String = asString(a)

val a: Foo = new Bar
withToString(a) // returns "Got a Foo", not "Got a Bar"

This normally does minimal harm, as people tend to leave type annotations off and scalac is very good at inferring types. However, when you throw generics into the mix, this can lead to unexpected pain.

def fa[A <: Foo](a: A) = withToString(a)
fa(new Bar) // returns "Got a Foo"

In this example, fa chains to withToString. The problem is that inside fa, the compiler only knows that a is an instance of Foo. So, the call to withToString expands to withToString(a)(fooToString). When we invoke it as fa[Bar], there is no way to propagate that extra knowledge to the implicit. To do so, you need to explicitly chain the implicit through fa:

def fa[A <: Foo](a: A)(implicit aToString: A => String) = withToString(a)
fa(new Bar) // returns "Got a Bar"

Now, scalac can expand the implicits to give:

def fa[A <: Foo](a: A)(implicit aToString: A => String) = withToString(a)(aToString)
fa(new Bar)(barToString) // returns "Got a Bar"

The benefit of using implicits and type-classes is the degree of flexibility it affords, independent of the original objects. The associated cost is that you need to take responsibility for chaining implicits where that makes sense.

Threading Implicits for Matrix Operations

Here's a simple function that will compute the sum of the 0th column of a matrix.

def sumZeroColumn[M <: Matrix[Double]](m: M) = m(::, 0).sum

// apply this to a dense matrix
val dm = DenseMatrix((1.0, 2.0), (3.0, 4.0))
val sum = sumZeroColumn(dm) // 4.0

The code compiles and runs, but it may not be doing quite what you expect. The call to m(::, 0) takes implicit parameters that depend upon the type of the matrix. In this context, all the compiler knows is that M <: Matrix[Double], so it finds an implicit that satisfies this. However, we are calling it with a DenseMatrix, which supplies its own implicit that implements the column slice more efficiently. To allow sumZeroColumn to use this efficient operation, we need to make it aware of it.

def sumZeroColumn[M <: Matrix[Double], Col <: Tensor1[Int, Double]]
                 (m: M)
                 (implicit colSlice: CanSliceCol[M, Int, Int, Double, Col]) =
  colSlice(m, 0).sum

Here we've passed in colSlice as an implicit parameter and used this to perform the slicing operation. This allows us to capture the operation from the call-site, where the type of M may be known more exactly. Now when we call sumZeroColumn(dm), the compiler will look up a value for colSlice that is optimal for DenseMatrix. We also had to add an additional type parameter, Col, to the signature as we don't know up-front what the type of the column returned by the slice will be. We could have simply used Tensor1[Int, Double] in-line, but this form is preferred because a) it's more explicit, and b) it allows us to use Col in the search for other implicits e.g. those supplying division or multiplication of the column by some value.
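The same pattern can be shown in a self-contained sketch. SliceCol, Sum and Dense below are invented stand-ins for scalala's CanSliceCol machinery, showing how Col is fixed by the slice implicit and then used to find a second strategy:

```scala
// A slicing strategy: extract column `col` of an M as a Col.
trait SliceCol[M, Col] {
  def apply(m: M, col: Int): Col
}

// A reduction strategy over a column type.
trait Sum[Col] {
  def apply(c: Col): Double
}

// A column-major dense matrix: column c occupies data(c*rows) to data((c+1)*rows - 1).
case class Dense(rows: Int, cols: Int, data: Array[Double])

object SliceCol {
  // Efficient slice for the dense, column-major layout: a single arraycopy.
  implicit val denseSlice: SliceCol[Dense, Array[Double]] =
    new SliceCol[Dense, Array[Double]] {
      def apply(m: Dense, col: Int): Array[Double] = {
        val out = new Array[Double](m.rows)
        System.arraycopy(m.data, col * m.rows, out, 0, m.rows)
        out
      }
    }
}

object Sum {
  implicit val sumArray: Sum[Array[Double]] = new Sum[Array[Double]] {
    def apply(c: Array[Double]): Double = c.sum
  }
}

// Both strategies are captured at the call-site, where M's concrete type is known;
// Col is anchored by the slice implicit and then drives the search for Sum[Col].
def sumZeroColumn[M, Col](m: M)(implicit slice: SliceCol[M, Col],
                                sum: Sum[Col]): Double =
  sum(slice(m, 0))
```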

Implicit resolution proceeds from left to right

If there are several implicit parameters, scalac satisfies them from left to right, and in each position finds the single most specific implicit value that it can. This means that it is very important to put things in the right order if the resolution of one implicit should affect the resolution of another.

Here is a signature from some real code that uses scalala.

  def apply[M <: Matrix[Double], CV1 <: CV2, CV2 <: Vector[Double]]
  (m: M)
  (implicit
   csc: CanSliceCol[M, Int, Int, Double, Int, CV1],
   divVS: BinaryOp[CV2, Double, OpDiv, CV2],
   mulMS: BinaryOp[M, Double, OpMulMatrixBy, M],
   mulMM: BinaryOp[M, M, OpMulMatrixBy, M],
   subMM: BinaryOp[M, M, OpSub, M],
   cmvM: CanMapValues[M, Double, Double, M])
  : Stream[(Double, M)]

This apply method returns a stream of matrices together with an error value, allowing the client to take from the stream until the error value is small enough. You can see that it takes a bewildering array of implicit values, and the ordering is critical. CV1 is the type resulting from slicing a column from the matrix. CV2 is a type that supports division of a vector by a Double. If the first and second implicits were swapped, then CV2 would be resolved first by the most specific implicit able to provide division of any kind of Vector[Double]. That would almost certainly not be an operation suitable for the very specific kind of vector representing the column sliced from our matrix, so at that point the compiler would produce an error.

The issue arises because neither CV1 nor CV2 are directly mentioned in the parameters or return value. They are "floating" constraints that will be resolved during implicit resolution. The rule of thumb is to order the implicit parameters so that "floating" constraints are anchored to known types as early as possible. In this example, csc anchors CV1 to M. In contrast, divVS depends upon CV1 (through the CV1 <: CV2 constraint), so we should resolve divVS only after CV1 has already been picked.
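A stripped-down sketch of this anchoring, with all names hypothetical:

```scala
// A strategy producing a Col from an M, and one dividing a Col by a scalar.
trait Slice[M, Col]
trait Div[Col]

class Mat
class MatCol

implicit val sliceMat: Slice[Mat, MatCol] = new Slice[Mat, MatCol] {}
implicit val divMatCol: Div[MatCol] = new Div[MatCol] {}

// `slice` is listed first, so it is resolved first and anchors Col to MatCol;
// `div` is then searched at the concrete type Div[MatCol] and found.
// If `div` came first, Col would still be floating when it is searched, which
// in real code (with many Div instances in scope) can select the wrong one.
def normalize[M, Col](m: M)(implicit slice: Slice[M, Col], div: Div[Col]): Boolean =
  slice != null && div != null
```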

The first time I tried to implement this method, I actually had a single type variable CV used in both places. For some calls to the function the implicit resolution failed. This was because it was searching for a CanSliceCol[...] and BinaryOp[... OpDiv ...] with exactly matching types. This is overly strict - we only need the resulting column to be usable as input to the division operation, and by the normal rules of OOP, that means it can be a sub-type.
