Description
tl;dr
Surge currently provides separate implementations of each function for `Float` and `Double`, respectively. This makes Surge basically incompatible with Swift's `T: FloatingPoint` generics. By introducing a little bit of internal runtime dynamism we aim to migrate the existing function pairs to their generic equivalents for `T: FloatingPoint`.
What?
With the recent refactors we have managed to reduce the implementations of each computation to a function set consisting of a single `internal` core-implementation, acting as a single source of truth, and a bunch of thin `public` convenience wrapper functions.

Scalar division (`[Scalar] / Scalar`), for example, is currently implemented like this:
```swift
public func / <L>(lhs: L, rhs: Float) -> [Float] where L: UnsafeMemoryAccessible, L.Element == Float {
    return div(lhs, rhs)
}

public func div<L>(_ lhs: L, _ rhs: Float) -> [Float] where L: UnsafeMemoryAccessible, L.Element == Float {
    return withArray(from: lhs) { divInPlace(&$0, rhs) }
}

func divInPlace<L>(_ lhs: inout L, _ rhs: Float) where L: UnsafeMutableMemoryAccessible, L.Element == Float {
    lhs.withUnsafeMutableMemory { lm in
        var scalar = rhs
        vDSP_vsdiv(lm.pointer, numericCast(lm.stride), &scalar, lm.pointer, numericCast(lm.stride), numericCast(lm.count))
    }
}
```
… with an almost identical copy of each of these functions existing for `Double` instead of `Float`.
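For reference, the `Double` counterpart of the core-implementation presumably looks like this (same body, just a different element type and the `…D`-suffixed vDSP routine):

```swift
func divInPlace<L>(_ lhs: inout L, _ rhs: Double) where L: UnsafeMutableMemoryAccessible, L.Element == Double {
    lhs.withUnsafeMutableMemory { lm in
        var scalar = rhs
        vDSP_vsdivD(lm.pointer, numericCast(lm.stride), &scalar, lm.pointer, numericCast(lm.stride), numericCast(lm.count))
    }
}
```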
Why?
While the project's current state is quite an improvement over its previous one, it still has a couple of remaining deficits:
- We have literally everything in two near-identical flavors: `Float` and `Double`.
- One cannot currently use Surge in contexts where one is using `T: FloatingPoint` over `Float`/`Double`.
So this got me thinking: what if we migrated Surge from using `Float`/`Double` to an API based on `T: FloatingPoint` and then internally made use of some dynamic language features to roll our own polymorphism over the closed set of `Float` and `Double`, with a `fatalError(…)` on type mismatch?
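To make the idea concrete, here is a minimal, hypothetical sketch of that dispatch pattern (the free-standing `divided(_:by:)` function below is illustration only, not Surge's actual code): generic on the outside, a single branch over the closed set of `Float` and `Double` on the inside, and a `fatalError(…)` for anything else.

```swift
import Accelerate

// Illustration only: branch once on `T.self`, forward to the matching vDSP kernel,
// and trap for any other `FloatingPoint` type.
func divided<T: FloatingPoint>(_ values: [T], by divisor: T) -> [T] {
    switch T.self {
    case is Float.Type:
        let values = values as! [Float]
        var divisor = divisor as! Float
        var result = [Float](repeating: 0, count: values.count)
        vDSP_vsdiv(values, 1, &divisor, &result, 1, vDSP_Length(values.count))
        return result as! [T]
    case is Double.Type:
        let values = values as! [Double]
        var divisor = divisor as! Double
        var result = [Double](repeating: 0, count: values.count)
        vDSP_vsdivD(values, 1, &divisor, &result, 1, vDSP_Length(values.count))
        return result as! [T]
    default:
        fatalError("Accelerate-backed kernels only exist for Float and Double")
    }
}
```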
The aforementioned dynamism would add a certain amount of runtime overhead to Surge. It is important to note, however, that we would be adding a constant overhead (`O(1)` vs. `O(N)`): a single call of `Surge.divInPlace(_:_:)` over a pair of `10_000`-element arrays only adds a single branch per execution, not `10_000` branches in a loop, as would be the case for a naïve non-parallel looping implementation.
How?
So what would this look like? What would we need to change?
- We would replace every existing pair of thin `public` wrapper functions for `Float`/`Double` with a single equivalent function that is generic over `T: FloatingPoint` instead (see the sketch after the example below).
- We would merge every existing pair of `internal` `…InPlace(…)` core-implementation functions for `Float`/`Double` into a single equivalent function that is generic over `T: FloatingPoint` on the outside and then performs a `switch` on `T.self` on the inside.
- We would add `func withMemoryRebound(to:_:)` to `UnsafeMemory<T>` and `UnsafeMutableMemory<T>`, so that we can efficiently cast from `UnsafeMemory<T: FloatingPoint>` to `UnsafeMemory<Double>` without having to copy/cast any individual values.
- We would add `func withUnsafeMemory(as:…)` convenience functions for retrieving type-cast variants of `UnsafeMemory<T>` from instances of `UnsafeMemoryAccessible`/`UnsafeMutableMemoryAccessible`.
- We would refactor the `func …InPlace(…)` implementations into something like this:
```swift
func divInPlace<L, T>(_ lhs: inout L, _ rhs: T) where L: UnsafeMutableMemoryAccessible, L.Element == T, T: FloatingPoint & ExpressibleByFloatLiteral {
    let rhs = CollectionOfOne(rhs)
    withUnsafeMemory(
        &lhs,
        rhs,
        float: { lhs, rhs in
            vDSP_vsdiv(lhs.pointer, numericCast(lhs.stride), rhs.pointer, lhs.pointer, numericCast(lhs.stride), numericCast(lhs.count))
        },
        double: { lhs, rhs in
            vDSP_vsdivD(lhs.pointer, numericCast(lhs.stride), rhs.pointer, lhs.pointer, numericCast(lhs.stride), numericCast(lhs.count))
        }
    )
}
```
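For completeness, here is a rough sketch (not final code) of what the merged `public` wrappers from the first bullet might look like, assuming the generic `divInPlace(_:_:)` above and Surge's existing `withArray(from:_:)` helper:

```swift
public func div<L, T>(_ lhs: L, _ rhs: T) -> [T] where L: UnsafeMemoryAccessible, L.Element == T, T: FloatingPoint & ExpressibleByFloatLiteral {
    // Copy the input into a mutable array, then divide in place via the generic core-implementation.
    return withArray(from: lhs) { divInPlace(&$0, rhs) }
}

public func / <L, T>(lhs: L, rhs: T) -> [T] where L: UnsafeMemoryAccessible, L.Element == T, T: FloatingPoint & ExpressibleByFloatLiteral {
    return div(lhs, rhs)
}
```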
So far I have not been able to measure any noticeable performance regressions introduced by this change.
There should also be very little breakage from these changes, as `T: FloatingPoint` is for the most part a strict superset of either `Float` or `Double`.
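As a hypothetical illustration of the kind of call site this would newly enable (the `normalized(_:)` function below is not part of Surge, and it assumes the generic `div(_:_:)` sketched above):

```swift
// import Surge  // with the generic API proposed above

// Generic client code over `T: FloatingPoint & ExpressibleByFloatLiteral`;
// the accelerated Float/Double kernel is selected internally at runtime.
func normalized<T: FloatingPoint & ExpressibleByFloatLiteral>(_ values: [T]) -> [T] {
    let maximum = values.max() ?? 1.0
    return div(values, maximum == 0.0 ? 1.0 : maximum)
}

let floats: [Float] = [1.0, 2.0, 4.0]
let doubles: [Double] = [1.0, 2.0, 4.0]
let normalizedFloats = normalized(floats)    // [0.25, 0.5, 1.0], backed by vDSP_vsdiv
let normalizedDoubles = normalized(doubles)  // [0.25, 0.5, 1.0], backed by vDSP_vsdivD
```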
(I already have a proof-of-concept for this on a local branch and will push it as a PR at some point.)