This repository was archived by the owner on Apr 23, 2025. It is now read-only.

Exploring structural generic programming and layer APIs #613

Closed · wants to merge 3 commits

Conversation

@saeta (Contributor) commented Jun 23, 2020

@dabrahams (in discussions with @shadaj and others) mentioned yesterday that leveraging structural generic programming might yield some interesting points in the API design space, as an alternative to the progress so far in #584.

I quickly hacked together this draft to see what it might look like. I initially focused on using structural generic programming to avoid needing to spell out the callAsFunction implementation. To see what this looks like in action, check out SequentialExample.swift.

Subsequent work can define the shape propagation information as appropriate.

saeta added 3 commits June 22, 2020 23:50
This change is a first attempt at leveraging structural generic programming
to implement some sugar for a higher-level API.
@saeta (Contributor, Author) commented Jun 23, 2020

@shabalind (Contributor) commented Jun 23, 2020

I am really excited to see the idea of building Differentiable on top of Structural panning out quite well in practice!

```swift
public var conv = Conv2D<Float>(filterShape: (5, 5, 3, 6))
public var pool = MaxPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
public var flatten = Flatten<Float>()
@ResidualConnection var denseSkipped = Dense<Float>(inputSize: 36 * 6, outputSize: 36 * 6)
```
@shadaj (Contributor) commented Jun 23, 2020

How would we express models where two different transformations need to be applied to the same data? As far as I understand, ResidualConnection assumes that one of the transformations will always be the identity function, but for example in ResNet this may not be the case if a projection is needed (https://github.com/tensorflow/swift-models/blob/master/Models/ImageClassification/ResNet.swift#L92).

It also feels a bit awkward to write this as part of a sequential layer since it's more something parallel. I wonder if there is some way we can use properties for the vertical axis and something else for a horizontal axis of parallel layers.

@saeta (Contributor, Author)

Yeah, there are definitely other ways to spell this. One thing we could do, instead of using property wrappers, is have something like:

```swift
struct ParallelLayers<Lhs: Layer, Rhs: Layer>: Layer where Lhs.Input == Rhs.Input, ... {
  var lhs: Lhs
  var rhs: Rhs
  // TODO: make merge func configurable.
  @differentiable
  public func callAsFunction(_ input: Lhs.Input) -> Lhs.Output {
    return lhs(input) + rhs(input)
  }
}
```

which could represent the parallelism.
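For concreteness, here is a minimal, self-contained sketch of that combinator; `SimpleLayer` and `Scale` are hypothetical stand-ins for the real `Layer` protocol and layer types, and the `@differentiable` machinery and TensorFlow types are omitted:

```swift
// Hypothetical stand-in for the real Layer protocol (no differentiation).
protocol SimpleLayer {
    associatedtype Input
    associatedtype Output
    func callAsFunction(_ input: Input) -> Output
}

// A toy layer that scales its input by a constant.
struct Scale: SimpleLayer {
    var factor: Double
    func callAsFunction(_ input: Double) -> Double { input * factor }
}

// Feeds the same input to two layers and merges the outputs with `+`.
struct ParallelLayers<Lhs: SimpleLayer, Rhs: SimpleLayer>: SimpleLayer
where Lhs.Input == Rhs.Input, Lhs.Output == Rhs.Output, Lhs.Output: AdditiveArithmetic {
    var lhs: Lhs
    var rhs: Rhs
    func callAsFunction(_ input: Lhs.Input) -> Lhs.Output {
        lhs(input) + rhs(input)
    }
}

let parallel = ParallelLayers(lhs: Scale(factor: 2), rhs: Scale(factor: 3))
print(parallel(10.0))  // 50.0
```

The `Head.Input == Rhs.Input` constraint is what makes the two branches consume the same data, while the merge (here hard-coded to `+`) is the natural place for a configurable function.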

Contributor

That could work really well! It's interesting to see this combination of using both structural and regular functions/structs to compose layers.

Another (slightly crazier) idea is to offer a structural ParallelLayer where all the properties are applied to the same input. From what I see, the biggest advantage of the structural approach is that you replace keyed access to layers with regular property accesses, since every element in the sequence has a label (the property name). Though I'm not sure how the user would define the merge function in this case, since there isn't a fixed number of parallel layers.

@saeta (Contributor, Author)

Yup, that's definitely another potentially interesting point in design space.

Perhaps you just require users to define a pair-wise reduction function. Alternatively, perhaps a fold function and initial value could be provided...
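As a hedged sketch of the fold idea (hypothetical names; note this version type-erases the branches into an array of closures, whereas a structural version would preserve each branch's concrete type):

```swift
// Hypothetical: run every parallel branch on the same input, then fold the
// outputs with a user-supplied reduction function and initial value.
struct FoldedParallel<Input, Output> {
    var branches: [(Input) -> Output]
    var initial: Output
    var merge: (Output, Output) -> Output

    func callAsFunction(_ input: Input) -> Output {
        branches.reduce(initial) { partial, branch in merge(partial, branch(input)) }
    }
}

let ensemble = FoldedParallel<Double, Double>(
    branches: [{ $0 * 2 }, { $0 * 3 }, { $0 + 1 }],
    initial: 0,
    merge: +)
print(ensemble(10.0))  // 20 + 30 + 11 = 61.0
```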

```swift
public var conv = Conv2D<Float>(filterShape: (5, 5, 3, 6))
public var pool = MaxPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
public var flatten = Flatten<Float>()
@SequentialSkip(passing: Type<Tensor<Float>>()) var denseSkipped = Dense<Float>(inputSize: 1, outputSize: 2)
```
Contributor

Hmm, so passing is the type of the previous and following layer? I guess it can't be inferred since we're using properties.

saeta added a commit to saeta/swift-structural that referenced this pull request Jul 5, 2020
This PR represents a proposal to change the way Swift types are encoded into their structural
representation to enable more flexible applications of structural generic programming. This proposal
was inspired by [a structural programming-based deep learning model composition
API](tensorflow/swift-models#613).

> tl;dr: Drop the `StructuralEmpty` from the end of the field list.

Today, we represent a `struct` as a "Cons-list" of its fields, terminated by `StructuralEmpty`. For
example:

```swift
struct Point2: Structural {
	var x: Int
	var y: Float
}
```

would have the following `StructuralRepresentation` associated type generated by the compiler:

```swift
extension Point2 {
	public typealias StructuralRepresentation =
		StructuralStruct<
			StructuralCons<StructuralProperty<Int>,
			StructuralCons<StructuralProperty<Float>,
			StructuralEmpty>>>

	// ...
}
```

or, alternatively written with a made-up syntax loosely inspired by Scala's `HList` type:

```swift
typealias StructuralRepresentation =
  StructuralStruct<
  	StructuralProperty<Int> :: StructuralProperty<Float> :: StructuralEmpty>
```
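To make the induction concrete, here is a self-contained sketch (simplified, hypothetical names — `Cons`, `Property`, and `Empty` standing in for the `Structural*` types) of generic code walking such a cons-list; note the dedicated base case required for the `Empty` terminator:

```swift
// Simplified stand-ins for the Structural* representation types.
struct Property<T> { var value: T }
struct Cons<Value, Next> { var value: Value; var next: Next }
struct Empty {}

protocol FieldCount { var fieldCount: Int { get } }

extension Property: FieldCount { var fieldCount: Int { 1 } }
extension Cons: FieldCount where Value: FieldCount, Next: FieldCount {
    var fieldCount: Int { value.fieldCount + next.fieldCount }
}
// The terminator needs its own (trivial) base case today; dropping the
// terminator from the representation makes this extension unnecessary.
extension Empty: FieldCount { var fieldCount: Int { 0 } }

// Point2's fields, as a terminated cons-list:
let point2Rep = Cons(value: Property(value: 1),
                     next: Cons(value: Property(value: 2.0), next: Empty()))
print(point2Rep.fieldCount)  // 2
```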

This proposal suggests we modify the representation to look as follows:

```swift
extension Point2 {
	public typealias StructuralRepresentation =
		StructuralStruct<
			StructuralCons<StructuralProperty<Int>,
			StructuralCons<StructuralProperty<Float>>>>

	// ...
}
```

or in the made-up syntax:

```
typealias StructuralRepresentation =
  StructuralStruct<
  	StructuralProperty<Int> :: StructuralProperty<Float>>
```

The advantages of such a shift in representation are threefold:

1. **Simplifies common inductive cases.** When providing a structural generic programming-based
   automatic conformance, an extension for `StructuralEmpty` is required. In the examples in this
   repository, most of these are benign empty implementations (e.g. [`DecodeJSON`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/DecodeJSON.swift#L61),
   [`DefaultInitializable`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/DefaultInitializable.swift#L48),
   [`ScaleBy`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/ScaleBy.swift#L85));
   however, some require more careful thought (e.g. [`CustomComparable`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/CustomComparable.swift#L102)).

   With this change, only conformances that would like to support zero-sized types are required to
   provide a conformance for `StructuralEmpty`. (See below.)

2. **Special handling for zero-sized types.** Some protocols might want distinct handling for
   zero-sized types. For example, when encoding JSON, different applications might want either to
   serialize or to skip such a field (for example, a sentinel marker field used to declare a
   version).

   If this proposal is adopted, the JSON library could drop `StructuralEmpty`'s conformance to
   `EncodeJSON`, which would automatically remove the automatic `EncodeJSON` conformance for
   zero-sized types.

3. **Composition of computation based on fields.**

The included motivating example is a way of preserving static information when performing lazy
operations to sequences. In Swift today, escaping closures are heap allocated and type erased, which
effectively forms an optimization boundary. Additionally, they operate with reference semantics
and cannot have any additional properties / fields (unlike classes or structs) and thus cannot have
additional entry points to access the state.

By representing the transformation not as a closure but as a `struct`, we can realize
better performance and build additional entry points to view and manipulate the stored state. This
PR includes a _woefully incomplete_ step in this direction. See the tests for the usage, and the
benchmark numbers below for a comparison of performance. (tl;dr: At small sizes, the existing lazy
transforms are equivalent or faster; at large sizes, the completely unoptimized structural generic
programming implementation is about 2x faster.)
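A hedged, self-contained sketch of this struct-based transform idea (hypothetical names; the actual implementation in this PR differs): the transform is a named struct whose state stays visible and mutable from the outside, unlike a variable captured by an escaping closure:

```swift
// Hypothetical protocol for struct-based transforms.
protocol StructTransform {
    associatedtype In
    associatedtype Out
    func callAsFunction(_ input: In) -> Out
}

// The transform's state is a named, externally addressable property.
struct Times: StructTransform {
    var factor: Int
    func callAsFunction(_ x: Int) -> Int { x * factor }
}

// A lazy map whose transform is a struct rather than an escaping closure.
struct StructMapped<Base: Sequence, T: StructTransform>: Sequence
where Base.Element == T.In {
    var base: Base
    var transform: T
    func makeIterator() -> AnyIterator<T.Out> {
        var iterator = base.makeIterator()
        return AnyIterator { iterator.next().map { transform($0) } }
    }
}

var mapped = StructMapped(base: [1, 2, 3], transform: Times(factor: 10))
print(Array(mapped))           // [10, 20, 30]
mapped.transform.factor = 100  // the stored state is addressable by name
print(Array(mapped))           // [100, 200, 300]
```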

Neural networks can be thought of as differentiable functions with machine learned state.
Manipulating that state via value semantics works extremely well (as demonstrated by the existing
S4TF APIs). One additional important aspect of neural networks is a need to explicitly manipulate
the contained state (e.g. weight regularization, resetting the learned parameters when doing
transfer learning). Being able to access that state explicitly with a convenient name is valuable
from an API ergonomic perspective.

Most neural network architectures are described as compositions of other "layers" (bottoming out in
a few common layer types, such as `Linear` (aka `Dense`), `Convolution`, and parameterless
nonlinearities such as `ReLU`). The most common form of composition is sequential composition.

By removing the `StructuralEmpty` "tail" of the HList of fields (and also extending structural
generic programming for differentiability), we can now begin to leverage the benefits of automatic
conformance implementations for this use case, such as sequential or parallel composition.

For example:

```swift
struct MyModel: Structural, StructuralLayer {
	var conv: Conv2D
	var flatten: Flatten
	var dense: Dense

	// The following explicit implementation becomes unnecessary with SGP.
	func callAsFunction(_ input: Tensor) -> Tensor {
		return dense(flatten(conv(input)))
	}
}
```

Because `StructuralEmpty` must define one and only one `associatedtype Input`, and because there are
many possible `Input` and `Output` types in neural networks (e.g. an Attention layer takes both a
key and query tensor), the presence of `StructuralEmpty` constrains the application of structural
generic programming.
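A self-contained sketch (hypothetical, simplified names; no differentiation) of how sequential composition can recurse over an un-terminated cons-list, with the carry type pinned at each step so no `Any` casts are needed:

```swift
// Hypothetical stand-in for a layer-like protocol.
protocol Apply {
    associatedtype In
    associatedtype Out
    func apply(_ input: In) -> Out
}

// Terminal element: with the proposal, the last field needs no Empty tail,
// so no single `Input` type has to be invented for a terminator.
struct Last<L: Apply>: Apply {
    var layer: L
    func apply(_ input: L.In) -> L.Out { layer.apply(input) }
}

// Sequential cons cell: the carry type is pinned by `Head.Out == Tail.In`.
struct Seq<Head: Apply, Tail: Apply>: Apply where Head.Out == Tail.In {
    var head: Head
    var tail: Tail
    func apply(_ input: Head.In) -> Tail.Out { tail.apply(head.apply(input)) }
}

struct AddOne: Apply { func apply(_ input: Int) -> Int { input + 1 } }
struct Describe: Apply { func apply(_ input: Int) -> String { "value: \(input)" } }

let pipeline = Seq(head: AddOne(), tail: Last(layer: Describe()))
print(pipeline.apply(41))  // value: 42
```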

There are other problems that have a similar flavor to neural networks. One can look at neural
networks through the lens of a program generating a graph of operations which are then executed with
as much parallelism as possible by a runtime. This lens can also be reapplied to a variety of other
applications, such as build systems. (E.g., Bazel has a Python-like syntax for building a graph, which
is then executed in parallel by the rest of the build system. The combination of CMake and Ninja
operates similarly. Hat-tip to clattner@.) In addition, the intermediate products of build systems
sometimes need to be named and explicitly referenced. (Perhaps the most sophisticated example of
this approach is [`sbt`](https://www.scala-sbt.org/).)

One of the non-obvious implications of this change is that it unlocks important use cases that would
not be well served by other approaches to metaprogramming. (The first use case is conditional
conformances.)

Concretely, when representing a sequential composition of computations produced and consumed by
fields of a struct, a non-HList-style representation of the corresponding types forces the generic
code to type-cast all the way to `Any`. For example, in some hypothetical hyper-specializing
compiler that would specialize reflection from runtime to compile time, a sequential composition
operation would look as follows:

```swift
extension MyProtocol {
	public func callAsFunction(_ input: FirstField.Input) -> LastField.Output {
		var intermediateResult: Any = input
		for field in self.allFields {
		    intermediateResult = field(intermediateResult)
		}
		return intermediateResult as! LastField.Output
	}
}
```

Instead, structural generic programming represents the iteration as recursion where the type of the
carry variable is explicitly represented as an induction [type] variable, ensuring type safety (at
the cognitive cost of a recursive representation).

Open questions:

 - **Can we get rid of `StructuralEmpty`?** In order to handle zero-sized types, I think we must
   keep it around.
 - **What other applications can we derive from SGP thanks to this change?** Please feel free to
   suggest some more! ;-)

Performance numbers comparing the explicit closure-allocating lazy operations vs a structural-based
approach.

```
name                                                             time        std        iterations
--------------------------------------------------------------------------------------------------
SequentialTransformer: swift lazy transform (count: 1)              149.0 ns ± 201.72 %    1000000
SequentialTransformer: swift lazy transform (count: 10)             211.0 ns ±  87.56 %    1000000
SequentialTransformer: swift lazy transform (count: 100)            736.0 ns ±  87.69 %    1000000
SequentialTransformer: swift lazy transform (count: 1000)          5727.0 ns ±  18.60 %     251236
SequentialTransformer: swift lazy transform (count: 10000)        53680.0 ns ±  11.02 %      26456
SequentialTransformer: swift lazy transform (count: 100000)      538643.0 ns ±   6.06 %       2507
SequentialTransformer: structural lazy transform (count: 1)         151.0 ns ± 157.65 %    1000000
SequentialTransformer: structural lazy transform (count: 10)        758.0 ns ±  64.62 %    1000000
SequentialTransformer: structural lazy transform (count: 100)      1501.0 ns ±  34.70 %     907048
SequentialTransformer: structural lazy transform (count: 1000)     4562.0 ns ±  19.75 %     305737
SequentialTransformer: structural lazy transform (count: 10000)   31690.0 ns ±   8.86 %      48379
SequentialTransformer: structural lazy transform (count: 100000) 287372.0 ns ±  13.87 %       4826
```
saeta added a commit that referenced this pull request Jul 27, 2020
This change explores using structural generic programming to build (at compile time)
a type-safe "hyper-parameter" object that can subsequently be manipulated and
used to initialize a NN with a minimum of fuss.

See HParamInitExample.swift for an example usage (partially replicated here):

```swift

public struct MyInitModel {
    var conv: Conv2D<Float>
    var flatten: Flatten<Float>
    var dense: Dense<Float>
}

// Thanks to `DifferentiableStructural` conformances, we can derive these protocols automagically!
extension MyInitModel: HParamInitLayer, Layer, SequentialLayer {
    // Must specify typealiases because they are not inferred automatically. :-(
    public typealias Input = Tensor<Float>
    public typealias Output = Tensor<Float>
    public typealias SequentialInput = Input
    public typealias SequentialOutput = Output
    public typealias HParam = StaticStructuralRepresentation.HParam
}

// Usage:
func makeExplicitModel() -> MyInitModel {
    var hparams = MyInitModel.HParam()
    hparams.conv = .init(height: 3, width: 3, channels: 10)  // Fully typesafe!
    hparams.dense = .init(size: 10)

    return hparams.build(for: Tensor<Float>(zeros: [5, 28, 28, 1]))
}
```

Note: in order to build the full API, I needed to make some modifications to the proposed
Structural APIs. (Note: this is a quick-hack, and deserves much more refinement!)

Related PR: #613
@BradLarson (Contributor)
Closing for now, pending future evaluation.

@BradLarson closed this Oct 21, 2020
4 participants