This repository was archived by the owner on Apr 23, 2025. It is now read-only.

Structural generic layers 2 #650

Closed
saeta wants to merge 6 commits

Conversation

@saeta (Contributor) commented Jul 27, 2020

Use structural generic programming to build an HParam object.

This change explores using structural generic programming to build (at compile time)
a type-safe "hyper-parameter" object that can be manipulated and subsequently
used to initialize a neural network with a minimum of fuss.

See HParamInitExample.swift for an example usage (partially replicated, and simultaneously idealized, here; there are a few bugs and issues that must be resolved before we can achieve this):

```swift
public struct MyModel: Structural, HParamInitLayer, SequentialLayer {
    var conv: Conv2D<Float>
    var flatten: Flatten<Float>
    var dense: Dense<Float>
}

// Usage:
func makeModel() -> MyModel {
    var hparams = MyModel.HParam()
    hparams.conv = .init(height: 3, width: 3, channels: 10)  // Fully typesafe!
    hparams.dense = .init(size: 10)

    return hparams.build(for: Tensor(zeros: [5, 28, 28, 1]))
}
```

Note: in order to build the full API, I needed to make some modifications to the proposed
Structural APIs. (This is a quick hack and deserves much more refinement!)

For comparison, writing this out using the existing APIs would look something like the following, and would require (1) explicit shape calculations, (2) a custom definition of hyperparameters, and (3) redundant specification of the forward pass and initialization logic:

```swift
public struct MyModel: Layer {
  var conv: Conv2D<Float>
  var flatten: Flatten<Float>
  var dense: Dense<Float>

  // TODO: Pass through the other hyperparameters to `conv` and `dense`!
  /// - Precondition: The input size of the dense layer must exactly correspond to the size of the
  ///   tensor fed to it after running through the convolution. In this case, it will be
  ///   `(inputTensor.shape[1] - convShape.0 + 1) * (inputTensor.shape[2] - convShape.1 + 1) * convShape.3`,
  ///   as no padding is being used and the strides are (1, 1). (Note: this is checked only when
  ///   actually using the model to perform inference or training.)
  /// - Precondition: `convShape.2` must be exactly equal to `input.shape[-1]` from the
  ///   `callAsFunction` parameter.
  public init(convShape: (Int, Int, Int, Int), denseShape: (Int, Int)) {
    conv = Conv2D(filterShape: convShape)
    flatten = Flatten()
    dense = Dense(inputSize: denseShape.0, outputSize: denseShape.1)
  }

  @differentiable
  public func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
    return input.sequenced(through: conv, flatten, dense)
  }
}
```
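To make the precondition concrete, here is a hypothetical worked example of the shape arithmetic (the values are illustrative and not taken from the PR): for a `[5, 28, 28, 1]` input and a `(3, 3, 1, 10)` convolution filter with valid padding and `(1, 1)` strides:

```swift
// Worked example of the precondition arithmetic (illustrative values).
let inputShape = [5, 28, 28, 1]   // [batch, height, width, channels]
let convShape = (3, 3, 1, 10)     // (height, width, inChannels, outChannels)

let outHeight = inputShape[1] - convShape.0 + 1   // 28 - 3 + 1 = 26
let outWidth = inputShape[2] - convShape.1 + 1    // 28 - 3 + 1 = 26
let denseInputSize = outHeight * outWidth * convShape.3
// 26 * 26 * 10 = 6760: the `inputSize` the Dense layer must be given by hand.
```

Getting this number wrong is only caught when the model is first run, which is exactly the kind of bookkeeping the HParam approach eliminates.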

Note: for an accurate comparison, the initializer should provide a variety of additional hyperparameters, such as strides. The issue with struct-and-init composition is that default arguments don't compose: they must instead be re-specified at every level of composition. By representing the information within an explicit alternative type, we sidestep this issue.
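A minimal sketch of the default-argument problem (the `ConvConfigInit`, `ConvHParam`, and related types below are hypothetical, invented for illustration; they are not from the PR):

```swift
// With initializers, every wrapping init must restate the defaults it
// wants to expose:
struct ConvConfigInit {
  init(height: Int = 3, width: Int = 3, channels: Int,
       strides: (Int, Int) = (1, 1)) { /* ... */ }
}
struct ModelInit {
  // To let callers control `strides`, this init must redeclare its
  // default too, and so must every further level of wrapping:
  init(channels: Int, strides: (Int, Int) = (1, 1)) {
    _ = ConvConfigInit(channels: channels, strides: strides)
  }
}

// With an explicit hparam value type, defaults live on stored properties
// and compose for free; callers override only what they care about:
struct ConvHParam {
  var height = 3
  var width = 3
  var channels = 10
  var strides = (1, 1)
}
struct ModelHParam {
  var conv = ConvHParam()
}

var hparams = ModelHParam()
hparams.conv.strides = (2, 2)   // everything else keeps its default
```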

Further ideas to explore include:

  1. Using extensions to add convenience initializers for the HParam objects, such as:

```swift
extension MyModel.HParam {
  public enum Flavor {
    case mnist
    case imageNet
  }
  public init(_ flavor: Flavor) {
    self.init()
    self.conv = .init(height: 3, width: 3, channels: 10)
    switch flavor {
    case .mnist:
      self.dense = .init(size: 10)
    case .imageNet:
      self.dense = .init(size: 1000)
    }
  }
}
```

with example usage:

```swift
var model = MyModel.HParam(.mnist).build(for: Tensor(zeros: [10, 28, 28, 1]))
```
  2. Using a combination of data structures & types to represent skip connections, repeated layers, etc.

Note: of course, we can always fall back to explicitly writing `init` and `func callAsFunction` as desired for arbitrarily complicated control flow.
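One way the skip-connection idea in item 2 might look (an entirely hypothetical sketch; `Residual` and `MyResNetBlock` do not exist in the PR): a generic wrapper type whose structure encodes the architecture, so that HParam derivation could follow the same shape.

```swift
// Hypothetical sketch (not from the PR): a skip connection encoded as a
// generic wrapper type.
public struct Residual<Body: Layer>: Layer
where Body.Input == Tensor<Float>, Body.Output == Tensor<Float> {
  public var body: Body

  @differentiable
  public func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
    // The wrapped layer's output is added back onto its input.
    return input + body(input)
  }
}

// A model could then spell out the skip connection in its stored-property
// types, and structural derivation would see it:
// public struct MyResNetBlock: Structural, HParamInitLayer, SequentialLayer {
//   var conv: Conv2D<Float>
//   var residual: Residual<Conv2D<Float>>
// }
```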

  3. Using structural generic programming on the HParam types themselves (e.g. for easy serialization / deserialization, checkpointing, flag parsing, etc.).

  4. Taking an explicit random number generator when building the model from the HParam type, to allow random number generation to be made deterministic.
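Item 4 might look something like the following (these signatures are hypothetical and not part of the PR; `ARC4RandomNumberGenerator` is assumed from Swift for TensorFlow):

```swift
// Hypothetical API sketch (not from the PR): `build` could thread an
// explicit RNG so that parameter initialization is reproducible.
//
// extension HParamInitLayer {
//   public func build<RNG: RandomNumberGenerator>(
//     for input: SequentialInput, using rng: inout RNG
//   ) -> Self { ... }
// }
//
// Usage sketch, with a seeded generator for determinism:
// var rng = ARC4RandomNumberGenerator(seed: 42)
// let model = MyModel.HParam()
//   .build(for: Tensor(zeros: [5, 28, 28, 1]), using: &rng)
```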

Related PR: #613

saeta added 6 commits June 22, 2020 23:50
This change is a first attempt at leveraging structural generic programming
to implement some sugar for a higher-level API.
This change explores using structural generic programming to build (at compile time)
a type-safe "hyper-parameter" object that can be manipulated and subsequently
used to initialize a NN with the minimum of fuss.

See HParamInitExample.swift for an example usage (partially replicated here):

```swift

public struct MyInitModel {
    var conv: Conv2D<Float>
    var flatten: Flatten<Float>
    var dense: Dense<Float>
}

// Thanks to `DifferentiableStructural` conformances, we can derive these protocols automagically!
extension MyInitModel: HParamInitLayer, Layer, SequentialLayer {
    // Must specify typealiases because they are not inferred automatically. :-(
    public typealias Input = Tensor<Float>
    public typealias Output = Tensor<Float>
    public typealias SequentialInput = Input
    public typealias SequentialOutput = Output
    public typealias HParam = StaticStructuralRepresentation.HParam
}

// Usage:
func makeExplicitModel() -> MyInitModel {
    var hparams = MyInitModel.HParam()
    hparams.conv = .init(height: 3, width: 3, channels: 10)  // Fully typesafe!
    hparams.dense = .init(size: 10)

    return hparams.build(for: Tensor<Float>(zeros: [5, 28, 28, 1]))
}
```

Note: in order to build the full API, I needed to make some modifications to the proposed
Structural APIs. (This is a quick hack and deserves much more refinement!)

Related PR: #613
@saeta (Contributor, Author) commented Jul 27, 2020

CC @shadaj @shabalind @dabrahams

saeta added a commit to google/swift-structural that referenced this pull request Jul 31, 2020
Based on explorations of structural generic programming in
tensorflow/swift-models#650 we would like to support
smooth interactions between KeyPaths and structural generic programming.
Based on discussions with @shabalind, I have put together a quick
proof-of-concept demonstrating how to unify a "static structural"
implementation with the "instance"-level structural generic programming we
have been exploring so far.

This is definitely an incomplete implementation, but this would allow us to
easily implement KeyPathIterable and its ilk on top of structural generic
programming (something not possible previously), in addition to unifying with
the existing KeyPath world.

There are a couple further extensions:

 1. **Solution for `HNil`** To use a proper cons-list construction, this will
    require a bottom type (`Never`) that conforms to every protocol, and
    multiple conformances. Alternatively, an encoding that appears to work
    is a cons-list construction that doesn't use a special end type.
 2. **Typealias inference**: Swift does not currently infer typealiases for
    these kinds of derived conformances.
 3. **Enum support**: This current implementation doesn't include enum support,
    although it is straightforward to add.
 4. **Simpler structural representations**: With the single static structural
    derived conformance, we can easily implement multiple simpler "structural"
    encodings (e.g. a simple cons-list of values in addition to the current
    structural representation).
 5. Extensions to KeyPaths could enable StaticStructural to simply be the key
    paths themselves. (i.e. if you could retrieve the field name corresponding
    to the key path.)
@dabrahams (Contributor) commented:
Closing until structural GP is available in the upstream compiler.

@dabrahams dabrahams closed this Sep 2, 2020
@saeta saeta deleted the structural-generic-layers2 branch September 2, 2020 17:17