Exploring structural generic programming and layer APIs #613
Conversation
This change is a first attempt at leveraging structural generic programming to implement some sugar for a higher-level API.
CC @shadaj @dabrahams @shabalind @BradLarson @pschuh @compnerd for state.

I am really excited to see the idea of building Differentiable on top of Structural panning out quite well in practice!
```swift
public var conv = Conv2D<Float>(filterShape: (5, 5, 3, 6))
public var pool = MaxPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
public var flatten = Flatten<Float>()
@ResidualConnection var denseSkipped = Dense<Float>(inputSize: 36 * 6, outputSize: 36 * 6)
```
How would we express models where two different transformations need to be applied to the same data? As far as I understand, `ResidualConnection` assumes that one of the transformations will always be the identity function, but, for example, in ResNet this may not be the case if a projection is needed (https://github.com/tensorflow/swift-models/blob/master/Models/ImageClassification/ResNet.swift#L92).

It also feels a bit awkward to write this as part of a sequential layer, since it's really something parallel. I wonder if there is some way we can use properties for the vertical axis and something else for a horizontal axis of parallel layers.
Yeah, there are definitely other ways to spell this. One thing we could do is, instead of using property wrappers, have something like:

```swift
struct ParallelLayers<Lhs: Layer, Rhs: Layer>: Layer
where Lhs.Input == Rhs.Input, Lhs.Output == Rhs.Output {
  var lhs: Lhs
  var rhs: Rhs

  // TODO: make the merge func configurable.
  @differentiable
  public func callAsFunction(_ input: Lhs.Input) -> Lhs.Output {
    return lhs(input) + rhs(input)
  }
}
```
which could represent the parallelism.
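For illustration, a usage sketch of such a `ParallelLayers` combinator (hypothetical, assuming the `Dense` initializer used elsewhere in this thread and the hard-wired `+` merge) might look like:

```swift
// Hypothetical usage sketch: two branches over the same input, merged by `+`.
let block = ParallelLayers(
  lhs: Dense<Float>(inputSize: 10, outputSize: 10),
  rhs: Dense<Float>(inputSize: 10, outputSize: 10))

// `input` is assumed to be a [batch, 10] tensor.
let output = block(input)  // equivalent to lhs(input) + rhs(input)
```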
That could work really well! It's interesting to see this combination of both structural and regular functions/structs to compose layers.

Another (slightly crazier) idea is to offer a structural `ParallelLayer` where all the properties are executed on the same input. From what I see, the biggest advantage of the structural approach is that you replace keyed access to layers with regular property accesses, since every element in the sequence has a label (the property name). Though I'm not sure how the user would define the merge function in this case, since there isn't a fixed number of parallel layers.
Yup, that's definitely another potentially interesting point in the design space. Perhaps you just require users to define a pair-wise reduction function. Alternatively, perhaps a `fold` function and initial value could be provided...
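As a sketch of what the fold-based option could look like (all names here are hypothetical, not part of any existing API), the derived parallel layer might require the user to supply a seed and a pair-wise combiner:

```swift
// Hypothetical sketch: the merge strategy a structural `ParallelLayer`
// derivation could require from the user.
protocol ParallelMerge {
  associatedtype BranchOutput
  // Initial value for the fold over branch outputs.
  static var seed: BranchOutput { get }
  // Pair-wise reduction combining the accumulator with the next branch's output.
  static func combine(_ accumulated: BranchOutput, _ next: BranchOutput) -> BranchOutput
}

// Example strategy: sum all branch outputs, starting from a scalar zero
// (which broadcasts against each branch's output).
struct SumMerge: ParallelMerge {
  static var seed: Tensor<Float> { Tensor(zeros: []) }
  static func combine(_ accumulated: Tensor<Float>, _ next: Tensor<Float>) -> Tensor<Float> {
    accumulated + next
  }
}
```

The fold shape side-steps the "no fixed number of parallel layers" problem, since the same `combine` is applied inductively at each cons cell.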
```swift
public var conv = Conv2D<Float>(filterShape: (5, 5, 3, 6))
public var pool = MaxPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
public var flatten = Flatten<Float>()
@SequentialSkip(passing: Type<Tensor<Float>>()) var denseSkipped = Dense<Float>(inputSize: 1, outputSize: 2)
```
Hmm, so `passing` is the type of the previous and following layer? I guess it can't be inferred since we're using properties.
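A minimal non-property-wrapper sketch of an identity skip (hypothetical names; the actual `SequentialSkip` in this PR may be shaped differently) helps show why the pass-through type has to be stated explicitly rather than inferred:

```swift
// Hypothetical sketch: an identity skip connection around a wrapped layer.
// The pass-through type is the wrapped layer's own Input/Output, which must
// coincide and support `+` for the merge.
struct IdentitySkip<Wrapped: Layer>
where Wrapped.Input == Wrapped.Output, Wrapped.Output: AdditiveArithmetic {
  var wrapped: Wrapped

  func callAsFunction(_ input: Wrapped.Input) -> Wrapped.Output {
    // The input flows both through `wrapped` and around it.
    return input + wrapped(input)
  }
}
```

In the property-wrapper spelling there is no such generic signature to constrain against, so the tensor type flowing past the skipped layer must be named via `passing:`.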
This PR represents a proposal to change the way Swift types are encoded into their structural representation, to enable more flexible applications of structural generic programming. This proposal was inspired by [a structural programming-based deep learning model composition API](tensorflow/swift-models#613).

> tl;dr: Drop the `StructuralEmpty` from the end of the field list.

Today, we represent a `struct` as a "cons-list" of its fields, terminated by `StructuralEmpty`. For example:

```swift
struct Point2: Structural {
  var x: Int
  var y: Float
}
```

would have the following `StructuralRepresentation` associated type generated by the compiler:

```swift
extension Point2 {
  public typealias StructuralRepresentation = StructuralStruct<
    StructuralCons<StructuralProperty<Int>,
    StructuralCons<StructuralProperty<Float>,
    StructuralEmpty>>>
  // ...
}
```

or, alternatively, written with a made-up syntax loosely inspired by Scala's `HList` type:

```swift
typealias StructuralRepresentation = StructuralStruct<
  StructuralProperty<Int> :: StructuralProperty<Float> :: StructuralEmpty>
```

This proposal suggests we modify the representation to look as follows:

```swift
extension Point2 {
  public typealias StructuralRepresentation = StructuralStruct<
    StructuralCons<StructuralProperty<Int>,
    StructuralCons<StructuralProperty<Float>>>>
  // ...
}
```

or, in the made-up syntax:

```
typealias StructuralRepresentation = StructuralStruct<
  StructuralProperty<Int> :: StructuralProperty<Float>>
```

The advantages of such a shift in representation are threefold:

1. **Simplifies common inductive cases.** When providing a structural generic programming-based automatic conformance, an extension for `StructuralEmpty` is required. In the examples in this repository, most of these are benign empty implementations (e.g. [`DecodeJSON`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/DecodeJSON.swift#L61), [`DefaultInitializable`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/DefaultInitializable.swift#L48), [`ScaleBy`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/ScaleBy.swift#L85)); however, some require more careful thought (e.g. [`CustomComparable`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/CustomComparable.swift#L102)). With this change, only conformances that want to support zero-sized types need to provide an implementation for `StructuralEmpty`. (See below.)
2. **Special handling for zero-sized types.** Some protocols might want distinct handling for zero-sized types. For example, when encoding JSON, different applications might want to either serialize or skip such a field (say, a sentinel marker field declaring a version). If this proposal is adopted, the JSON library could drop its `EncodeJSON` conformance for `StructuralEmpty`, which would remove the automatic `EncodeJSON` conformance for zero-sized types.
3. **Composition of computation based on fields.** The included motivating example is a way of preserving static information when performing lazy operations on sequences. In Swift today, escaping closures are heap-allocated and type-erased, which effectively forms an optimization boundary. Additionally, they operate with reference semantics and cannot have any additional properties/fields (unlike classes or structs), and thus cannot expose additional entry points to access their state.
By representing the transformation not as a closure but instead as a `struct`, we can realize better performance and build additional entry points to view and manipulate the stored state. This PR includes a _woefully incomplete_ step in this direction. See the tests for usage, and the benchmark numbers below for a comparison of performance. (tl;dr: At small sizes, the existing lazy transforms are equivalent or faster; at large sizes, the completely unoptimized structural generic programming implementation is about 2x faster.)

Neural networks can be thought of as differentiable functions with machine-learned state. Manipulating that state via value semantics works extremely well (as demonstrated by the existing S4TF APIs). One additional important aspect of neural networks is the need to explicitly manipulate the contained state (e.g. weight regularization, or resetting the learned parameters when doing transfer learning). Being able to access that state explicitly with a convenient name is valuable from an API-ergonomics perspective.

Most neural network architectures are described as compositions of other "layers" (bottoming out in a few common layer types, such as `Linear` (aka `Dense`), `Convolution`, and parameterless nonlinearities such as `ReLU`). The most common form of composition is sequential composition. By removing the `StructuralEmpty` "tail" of the HList of fields (and also extending structural generic programming for differentiability), we can now begin to leverage the benefits of automatic conformance implementations for this use case, such as sequential or parallel composition. For example:

```swift
struct MyModel: Structural, StructuralLayer {
  var conv: Conv2D
  var flatten: Flatten
  var dense: Dense

  // The following explicit implementation becomes unnecessary with SGP.
  func callAsFunction(_ input: Tensor) -> Tensor {
    return dense(flatten(conv(input)))
  }
}
```

Because `StructuralEmpty` must define one and only one `associatedtype Input`, and because there are many possible `Input` and `Output` types in neural networks (e.g. an attention layer takes both a key and a query tensor), the presence of `StructuralEmpty` constrains the application of structural generic programming.

There are other problems that have a similar flavor to neural networks. One can look at neural networks through the lens of a program generating a graph of operations, which is then executed with as much parallelism as possible by a runtime. This lens can also be applied to a variety of other applications, such as build systems. (e.g. Bazel has a Python-like syntax for building a graph, which is then executed in parallel by the rest of the build system. The combination of CMake and Ninja operates similarly. Hat-tip to clattner@.) In addition, the intermediate products of build systems sometimes need to be named and explicitly referenced. (Perhaps the most sophisticated example of this approach is [`sbt`](https://www.scala-sbt.org/).)

One of the non-obvious implications of this change is that it unlocks important use cases that would not be well served by other approaches to metaprogramming, the first such use case being conditional conformances. Concretely, when representing a sequential composition of computations produced and consumed by fields of a struct, a non-HList-style representation of the corresponding types forces the generic code to type-cast all the way to `Any`. For example, in some hypothetical hyper-specializing compiler that would specialize reflection from runtime to compile time, a sequential composition operation would look as follows:

```
extension MyProtocol {
  public func callAsFunction(_ input: FirstField.Input) -> LastField.Output {
    var intermediateResult: Any = input
    for field in self.allFields {
      intermediateResult = field(intermediateResult)
    }
    return intermediateResult as! LastField.Output
  }
}
```

Instead, structural generic programming represents the iteration as recursion, where the type of the carry variable is explicitly represented as an induction [type] variable, ensuring type safety (at the cognitive cost of a recursive representation).

Open questions:

- **Can we get rid of `StructuralEmpty`?** In order to handle zero-sized types, I think we must keep it around.
- **What other applications can we derive from SGP thanks to this change?** Please feel free to suggest some more! ;-)

Performance numbers comparing the explicit closure-allocating lazy operations vs a structural-based approach:
```
name                                                               time           std       iterations
------------------------------------------------------------------------------------------------------
SequentialTransformer: swift lazy transform (count: 1)                149.0 ns ± 201.72 %      1000000
SequentialTransformer: swift lazy transform (count: 10)               211.0 ns ±  87.56 %      1000000
SequentialTransformer: swift lazy transform (count: 100)              736.0 ns ±  87.69 %      1000000
SequentialTransformer: swift lazy transform (count: 1000)            5727.0 ns ±  18.60 %       251236
SequentialTransformer: swift lazy transform (count: 10000)          53680.0 ns ±  11.02 %        26456
SequentialTransformer: swift lazy transform (count: 100000)        538643.0 ns ±   6.06 %         2507
SequentialTransformer: structural lazy transform (count: 1)           151.0 ns ± 157.65 %      1000000
SequentialTransformer: structural lazy transform (count: 10)          758.0 ns ±  64.62 %      1000000
SequentialTransformer: structural lazy transform (count: 100)        1501.0 ns ±  34.70 %       907048
SequentialTransformer: structural lazy transform (count: 1000)       4562.0 ns ±  19.75 %       305737
SequentialTransformer: structural lazy transform (count: 10000)     31690.0 ns ±   8.86 %        48379
SequentialTransformer: structural lazy transform (count: 100000)   287372.0 ns ±  13.87 %         4826
```
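To make the contrast with the `Any`-based loop concrete, here is a self-contained sketch (types invented for illustration; not the library's actual representation) of how cons-list recursion threads a statically typed carry value through a sequential composition:

```swift
// Invented types for illustration: a cons-list of computations where the
// intermediate ("carry") type is tracked by the type system, not erased to `Any`.
struct Cons<Head, Tail> {
  var head: Head
  var tail: Tail
}

struct Last<Head> {
  var head: Head
}

protocol Applies {
  associatedtype In
  associatedtype Out
  func apply(_ x: In) -> Out
}

// Base case: the final element's output is the whole pipeline's output.
extension Last: Applies where Head: Applies {
  func apply(_ x: Head.In) -> Head.Out { head.apply(x) }
}

// Inductive case: the head's output type must equal the tail's input type,
// so every intermediate value is fully typed -- no `as!` casts required.
extension Cons: Applies where Head: Applies, Tail: Applies, Head.Out == Tail.In {
  func apply(_ x: Head.In) -> Tail.Out { tail.apply(head.apply(x)) }
}

// Example pipeline: Int -> Int -> String.
struct AddOne: Applies { func apply(_ x: Int) -> Int { x + 1 } }
struct Describe: Applies { func apply(_ x: Int) -> String { "value: \(x)" } }

let pipeline = Cons(head: AddOne(), tail: Last(head: Describe()))
let result = pipeline.apply(41)  // "value: 42"
```

Because the `Head.Out == Tail.In` constraint is checked at compile time, an ill-typed pipeline fails to type-check instead of crashing on an `as!` cast at runtime.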
This change explores using structural generic programming to build (at compile time) a type-safe "hyper-parameter" object that can subsequently be manipulated and used to initialize a NN with a minimum of fuss. See HParamInitExample.swift for an example usage (partially replicated here):

```swift
public struct MyInitModel {
  var conv: Conv2D<Float>
  var flatten: Flatten<Float>
  var dense: Dense<Float>
}

// Thanks to `DifferentiableStructural` conformances, we can derive these
// protocols automagically!
extension MyInitModel: HParamInitLayer, Layer, SequentialLayer {
  // Must specify typealiases because they are not inferred automatically. :-(
  public typealias Input = Tensor<Float>
  public typealias Output = Tensor<Float>
  public typealias SequentialInput = Input
  public typealias SequentialOutput = Output
  public typealias HParam = StaticStructuralRepresentation.HParam
}

// Usage:
func makeExplicitModel() -> MyInitModel {
  var hparams = MyInitModel.HParam()
  hparams.conv = .init(height: 3, width: 3, channels: 10)  // Fully typesafe!
  hparams.dense = .init(size: 10)
  return hparams.build(for: Tensor<Float>(zeros: [5, 28, 28, 1]))
}
```

Note: in order to build the full API, I needed to make some modifications to the proposed Structural APIs. (This is a quick hack and deserves much more refinement!)

Related PR: #613
Closing for now, pending future evaluation.
@dabrahams (in discussions with @shadaj and others) earlier today (yesterday) mentioned that leveraging structural generic programming might yield some interesting points in the API design space as an alternative to the progress so far in #584.

I quickly hacked together this draft to see what it might look like. I initially focused on using structural generic programming to avoid needing to spell out the `callAsFunction` implementation. To see what this looks like in action, check out SequentialExample.swift. Subsequent work can define the shape propagation information as appropriate.