This repository was archived by the owner on Jan 10, 2023. It is now read-only.

Proposal: change structural representation to avoid StructuralEmpty. #5

Closed
wants to merge 1 commit into from

Conversation

saeta
Contributor

@saeta saeta commented Jul 5, 2020

This PR represents a proposal to change the way Swift types are encoded into their structural
representation to enable more flexible applications of structural generic programming. This proposal
was inspired by [a structural programming-based deep learning model composition
API](tensorflow/swift-models#613).

> tl;dr: Drop the `StructuralEmpty` from the end of the field list.

Today, we represent a `struct` as a "Cons-list" of its fields, terminated by `StructuralEmpty`. For
example:

```swift
struct Point2: Structural {
	var x: Int
	var y: Float
}
```

would have the following `StructuralRepresentation` associated type generated by the compiler:

```swift
extension Point2 {
	public typealias StructuralRepresentation =
		StructuralStruct<
			StructuralCons<StructuralProperty<Int>,
			StructuralCons<StructuralProperty<Float>,
			StructuralEmpty>>>

	// ...
}
```

or, alternatively written with a made-up syntax loosely inspired by Scala's `HList` type:

```swift
typealias StructuralRepresentation =
  StructuralStruct<
  	StructuralProperty<Int> :: StructuralProperty<Float> :: StructuralEmpty>
```

This proposal suggests we modify the representation to look as follows:

```swift
extension Point2 {
	public typealias StructuralRepresentation =
		StructuralStruct<
			StructuralCons<StructuralProperty<Int>,
			StructuralCons<StructuralProperty<Float>>>>

	// ...
}
```

or in the made-up syntax:

```swift
typealias StructuralRepresentation =
  StructuralStruct<
  	StructuralProperty<Int> :: StructuralProperty<Float>>
```

The advantages of such a shift in representation are threefold:

1. **Simplifies common inductive cases**: when providing a structural generic programming-based
   automatic conformance, an extension for `StructuralEmpty` is currently required. In the examples in
   this repository, most of these extensions are benign empty implementations (e.g. [`DecodeJSON`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/DecodeJSON.swift#L61),
   [`DefaultInitializable`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/DefaultInitializable.swift#L48),
   [`ScaleBy`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/ScaleBy.swift#L85)); however,
   some require more careful thought (e.g. [`CustomComparable`](https://github.com/google/swift-structural/blob/043713b88913efe79bc5041af5ba3de3c1d74517/Sources/StructuralExamples/CustomComparable.swift#L102)).

   With this change, only conformances that want to support zero-sized types need to provide a
   conformance for `StructuralEmpty`. (See below.)

2. **Special handling for zero-sized types.** Some protocols might want distinct handling for
   zero-sized types. For example, when encoding JSON, different applications might want either to
   serialize or to skip such a field (for example, a sentinel marker field declaring a version).

   If this proposal is adopted, the JSON library could drop `StructuralEmpty`'s conformance to
   `EncodeJSON`, which would in turn remove the automatic `EncodeJSON` conformance for zero-sized
   types.

3. **Composition of computation based on fields.**

The included motivating example is a way of preserving static information when performing lazy
operations on sequences. In Swift today, escaping closures are heap-allocated and type-erased, which
effectively forms an optimization boundary. Additionally, closures operate with reference semantics
and, unlike classes or structs, cannot carry additional properties or fields, and thus cannot expose
additional entry points to access their state.

By representing the transformation not as a closure but instead as a `struct`, we can realize
better performance and build additional entry points to view and manipulate the stored state. This
PR includes a _woefully incomplete_ step in this direction. See the tests for the usage, and the
benchmark numbers below for a comparison of performance. (tl;dr: At small sizes, the existing lazy
transforms are equivalent or faster; at large sizes, the completely unoptimized structural generic
programming implementation is about 2x faster.)
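
To make this contrast concrete, here is a minimal sketch, using hypothetical names such as `Scale`
(this is not the PR's actual implementation), of replacing a captured closure with a value-semantic
struct:

```swift
// Closure form: the scale factor is captured and type-erased behind
// (Double) -> Double; nothing about it can be inspected after the fact.
let scaleByTwo: (Double) -> Double = { $0 * 2 }
let viaClosure = Array([1.0, 2.0, 3.0].lazy.map(scaleByTwo))

// Struct form: the factor is a named, value-semantic stored property, and the
// type can grow additional entry points (e.g. an inverse) without erasure.
struct Scale {
    var factor: Double
    func callAsFunction(_ input: Double) -> Double { input * factor }
    func inverse() -> Scale { Scale(factor: 1 / factor) }
}

let viaStruct = Array([1.0, 2.0, 3.0].lazy.map(Scale(factor: 2).callAsFunction))
// viaClosure == viaStruct == [2.0, 4.0, 6.0]
```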

Neural networks can be thought of as differentiable functions with machine-learned state.
Manipulating that state via value semantics works extremely well (as demonstrated by the existing
S4TF APIs). One additional important aspect of neural networks is the need to explicitly manipulate
the contained state (e.g. weight regularization, or resetting the learned parameters when doing
transfer learning). Being able to access that state explicitly, with a convenient name, is valuable
from an API ergonomics perspective.

Most neural network architectures are described as compositions of other "layers" (bottoming out in
a few common layer types, such as `Linear` (aka `Dense`), `Convolution`, and parameterless
nonlinearities such as `ReLu`). The most common form of composition is sequential composition.

By removing the `StructuralEmpty` "tail" of the HList of fields (and also extending structural
generic programming for differentiability), we can now begin to leverage the benefits of automatic
conformance implementations for this use case, such as sequential or parallel composition.

For example:

```swift
struct MyModel: Structural, StructuralLayer {
	var conv: Conv2D
	var flatten: Flatten
	var dense: Dense

	// The following explicit implementation becomes unnecessary with SGP.
	func callAsFunction(_ input: Tensor) -> Tensor {
		return dense(flatten(conv(input)))
	}
}
```

Because `StructuralEmpty` must define one and only one `associatedtype Input`, and because there are
many possible `Input` and `Output` types in neural networks (e.g. an Attention layer takes both a
key and query tensor), the presence of `StructuralEmpty` constrains the application of structural
generic programming.
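
As a minimal sketch of the constraint involved (using hypothetical `Layer`, `Last`, and `Cons` types
rather than the library's or the PR's actual API): the inductive step needs
`Head.Output == Tail.Input`, and a terminal `StructuralEmpty` has no natural `Input`/`Output` of its
own with which to satisfy it.

```swift
// Hypothetical stand-ins; names and shapes are illustrative only.
protocol Layer {
    associatedtype Input
    associatedtype Output
    func callAsFunction(_ input: Input) -> Output
}

// The last element of the composition: its Input/Output are simply those of
// the wrapped layer, so no arbitrary choice of Input is forced on it.
struct Last<L: Layer>: Layer {
    var layer: L
    func callAsFunction(_ input: L.Input) -> L.Output { layer(input) }
}

// The inductive step: the head's Output must line up with the tail's Input.
// A StructuralEmpty terminator would have to invent an Input/Output pair here.
struct Cons<Head: Layer, Tail: Layer>: Layer where Head.Output == Tail.Input {
    var head: Head
    var tail: Tail
    func callAsFunction(_ input: Head.Input) -> Tail.Output { tail(head(input)) }
}
```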

There are other problems that have a similar flavor to neural networks. One can look at neural
networks through the lens of a program generating a graph of operations, which are then executed with
as much parallelism as possible by a runtime. This lens can also be applied to a variety of other
applications, such as build systems. (E.g. Bazel has a Python-like syntax for building a graph, which
is then executed in parallel by the rest of the build system. The combination of CMake and Ninja
operates similarly. Hat-tip to clattner@.) In addition, the intermediate products of build systems
sometimes need to be named and explicitly referenced. (Perhaps the most sophisticated example of
this approach is [`sbt`](https://www.scala-sbt.org/).)

One of the non-obvious implications of this change is that it unlocks important use cases that would
not be well served by different approaches to metaprogramming. (The first use case is conditional
conformances.)

Concretely, when representing a sequential composition of computations produced and consumed by
fields of a struct, a non-HList-style representation of the corresponding types forces the generic
code to type-cast all the way to `Any`. For example, in some hypothetical hyper-specializing
compiler that would specialize reflection from runtime to compile time, a sequential composition
operation would look as follows:

```swift
extension MyProtocol {
	public func callAsFunction(_ input: FirstField.Input) -> LastField.Output {
		var intermediateResult: Any = input
		for field in self.allFields {
		    intermediateResult = field(intermediateResult)
		}
		return intermediateResult as! LastField.Output
	}
}
```

Instead, structural generic programming represents the iteration as recursion where the type of the
carry variable is explicitly represented as an induction [type] variable, ensuring type safety (at
the cognitive cost of a recursive representation).
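
For contrast, here is a minimal, self-contained sketch of that recursive formulation (hypothetical
names, closure-valued fields for brevity), where the carry type is threaded through the generic
signature instead of being erased to `Any`:

```swift
protocol FieldList {
    associatedtype Input
    associatedtype Output
    func apply(_ input: Input) -> Output
}

// Base case: a single field, applied directly.
struct LastField<I, O>: FieldList {
    var f: (I) -> O
    func apply(_ input: I) -> O { f(input) }
}

// Inductive case: the carry type is Tail.Input, checked at compile time.
struct ConsField<I, Tail: FieldList>: FieldList {
    var head: (I) -> Tail.Input
    var tail: Tail
    func apply(_ input: I) -> Tail.Output { tail.apply(head(input)) }
}

// Int -> Double -> String, with no casts through Any anywhere.
let pipeline = ConsField(head: { (x: Int) in Double(x) * 1.5 },
                         tail: LastField { (y: Double) in "result: \(y)" })
let s: String = pipeline.apply(4)  // "result: 6.0"
```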

Open questions:

- **Can we get rid of `StructuralEmpty`?** In order to handle zero-sized types, I think we must
  keep it around.
- **What other applications can we derive from SGP thanks to this change?** Please feel free to
  suggest some more! ;-)

Performance numbers comparing the existing closure-allocating lazy operations with the
structural-based approach:

```
name                                                             time        std        iterations
--------------------------------------------------------------------------------------------------
SequentialTransformer: swift lazy transform (count: 1)              149.0 ns ± 201.72 %    1000000
SequentialTransformer: swift lazy transform (count: 10)             211.0 ns ±  87.56 %    1000000
SequentialTransformer: swift lazy transform (count: 100)            736.0 ns ±  87.69 %    1000000
SequentialTransformer: swift lazy transform (count: 1000)          5727.0 ns ±  18.60 %     251236
SequentialTransformer: swift lazy transform (count: 10000)        53680.0 ns ±  11.02 %      26456
SequentialTransformer: swift lazy transform (count: 100000)      538643.0 ns ±   6.06 %       2507
SequentialTransformer: structural lazy transform (count: 1)         151.0 ns ± 157.65 %    1000000
SequentialTransformer: structural lazy transform (count: 10)        758.0 ns ±  64.62 %    1000000
SequentialTransformer: structural lazy transform (count: 100)      1501.0 ns ±  34.70 %     907048
SequentialTransformer: structural lazy transform (count: 1000)     4562.0 ns ±  19.75 %     305737
SequentialTransformer: structural lazy transform (count: 10000)   31690.0 ns ±   8.86 %      48379
SequentialTransformer: structural lazy transform (count: 100000) 287372.0 ns ±  13.87 %       4826
```
@shabalind
Contributor

Closing this experimental PR for now similarly to #10. We might revisit it later.

@shabalind shabalind closed this Sep 23, 2020