This repository was archived by the owner on Jul 1, 2023. It is now read-only.

[WIP]: Idea exploring easier layer init. #883

Closed · wants to merge 1 commit

Conversation

@saeta (Contributor) commented Apr 24, 2020

The problem: when constructing a model, you often need to do some calculations
"out-of-band" / "by hand" to initialize the layers in the model correctly.
For example, in building a simple convolutional network with a dense layer on
top, you need to keep track of how the image shape changes through the
convolutions to ensure you set the input size of the dense layer correctly. If
it is set incorrectly, you get an immediate shape mismatch.

The proposed solution: layers are given an extra "shape-based initializer" that
allows the layer's initializer to propagate shape information forward.

Alternatives considered: a number of libraries (e.g. Keras, Haiku) don't
initialize the parameters of layers until the first run through. This is
cumbersome in Swift, as it would require `func callAsFunction` to be marked
as `mutating`, or for layers to become `class`es.

Problems of the current design: I just put this together quickly and
specialized everything to `Scalar == Float`. This is obviously suboptimal.

Extensions:
 - *No-op device*: Right now, we perform computations on zeros, just to avoid
   writing duplicated shape-propagation rules. Instead, if we had a no-op
   device, then we could trivially reuse the layer implementations. (Or
   alternatively, if we somehow made `func callAsFunction` generic over the
   Tensor type.)
 - *VMap*: Proper vmap support would simplify these shape-heavy computations.
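The no-op-device idea can be illustrated outside Swift: a tensor stand-in that carries only a shape, so a layer's existing forward code doubles as its shape-transfer function. A toy Python sketch (all names here are hypothetical, not part of any library):

```python
# Toy "shape-only tensor": ops compute result shapes but no values,
# so forward code can be reused for shape propagation with no real work.
class ShapeTensor:
    def __init__(self, shape):
        self.shape = tuple(shape)

    def matmul(self, other):
        assert self.shape[-1] == other.shape[-2], "inner dimensions must agree"
        return ShapeTensor(self.shape[:-1] + other.shape[-1:])

    def __add__(self, other):  # assume simple bias-style broadcasting
        assert self.shape[-1] == other.shape[-1]
        return ShapeTensor(self.shape)

def dense_forward(x, weight, bias):
    # The same code path a real Dense layer would run on real tensors.
    return x.matmul(weight) + bias

out = dense_forward(ShapeTensor((32, 400)), ShapeTensor((400, 120)), ShapeTensor((120,)))
print(out.shape)  # (32, 120)
```

Running the layer on such values propagates shapes "for free", which is what a true no-op device would give without even this wrapper type.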
@saeta marked this pull request as draft April 24, 2020 20:36
@dabrahams (Contributor)

Initializing parameters on first run can be done without making layers classes or making them appear to be non-pure functions (i.e., marking `callAsFunction` as `mutating`), but you would have to put the mutation somewhere other than in the directly stored properties of the layer, such as an attached class instance. I'm not sure why that would be the wrong answer for us.

I am a little weak conceptually; why should one have to "set the input size" on the Dense layer, as opposed to simply having it come from the size of the input that is actually passed to it?

@saeta (Contributor, Author) commented Apr 24, 2020

> I am a little weak conceptually; why should one have to "set the input size" on the Dense layer, as opposed to simply having it come from the size of the input that is actually passed to it?

Here's what we do today:

```swift
var classifier = Sequential {
    Conv2D<Float>(filterShape: (5, 5, 1, 6), padding: .same, activation: relu)
    AvgPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
    Conv2D<Float>(filterShape: (5, 5, 6, 16), activation: relu)
    AvgPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
    Flatten<Float>()
    Dense<Float>(inputSize: 400, outputSize: 120, activation: relu)
    Dense<Float>(inputSize: 120, outputSize: 84, activation: relu)
    Dense<Float>(inputSize: 84, outputSize: 10)
}
```

In particular, we know a priori that, given our known (fixed) image input size, after the first five layers (the Conv2Ds, AvgPool2Ds, and Flatten) we will have tensors of shape `[batchSize, 400]`. Further, while `outputSize` is an orthogonal hyperparameter, each subsequent `inputSize` (e.g. 120 and 84) is just "copied" from the previous layer's `outputSize`. Does that help clarify?
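The 400 can be derived mechanically. A minimal sketch of that by-hand calculation, assuming the usual 28×28 single-channel (MNIST-style) input for this model (the helper functions are illustrative, not library code):

```python
# Propagate an (H, W, C) shape through the first five layers by hand.
def conv2d(shape, kernel, out_channels, same_padding=False):
    h, w, _ = shape
    if same_padding:
        return (h, w, out_channels)
    return (h - kernel + 1, w - kernel + 1, out_channels)  # 'valid' padding

def avg_pool(shape, stride):
    h, w, c = shape
    return (h // stride, w // stride, c)

shape = (28, 28, 1)                            # assumed input image
shape = conv2d(shape, 5, 6, same_padding=True) # (28, 28, 6)
shape = avg_pool(shape, 2)                     # (14, 14, 6)
shape = conv2d(shape, 5, 16)                   # (10, 10, 16)
shape = avg_pool(shape, 2)                     # (5, 5, 16)
flattened = shape[0] * shape[1] * shape[2]
print(flattened)  # 400
```

This is exactly the bookkeeping the proposed shape-based initializers would do for you.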

@dan-zheng (Member) left a comment


To support Keras-style Sequential shape inference, I think a layer protocol requirement for shape propagation is also necessary?

```swift
protocol Layer: Differentiable {
  associatedtype Input: Differentiable
  associatedtype Output: Differentiable
  func callAsFunction(_ input: Input) -> Output

  /// Returns the output shape of this layer given an input shape.
  ///
  /// This implements shape propagation.
  func outputShape(for inputShape: TensorShape) -> TensorShape
}
```

I'm not sure it's easy for layers to implement shape propagation unless shape propagation is also defined for primitive tensor operations (like matmul and conv2d).

@saeta (Contributor, Author) commented Apr 24, 2020

> I'm not sure it's easy for layers to implement shape propagation unless shape propagation is also defined for primitive tensor operations (like matmul and conv2d).

Right, I think in the design you proposed, it's not so easy. My goal with exploring this direction is to re-use the implicit shape transfer functions inherent in the tensor operations themselves. This approach also conveniently avoids re-writing the logic from `callAsFunction` inside `outputShape(for:)`. Does that make sense?

@dan-zheng (Member) commented Apr 26, 2020

> My goal with exploring this direction is to re-use the implicit shape transfer functions inherent in the tensor operations themselves. This approach also conveniently avoids needing to re-write the logic inside callAsFunction inside outputShape(for:) as well. Does that make sense?

I thought about this a bit. I think this PR (adding layer initialization based on hyperparameters and input shapes) is orthogonal to shape propagation. But shape propagation still seems necessary to implement Keras-style shape-inferring Sequential.


I wanted to explore shape propagation a bit further, so I ended up implementing a shape-inferring Sequential! Here's a Gist:

```swift
let input = Tensor<Float>(randomNormal: [10000, 784])
let model = Sequential(inputShape: input.shape) {
  Dense<Float>.make(.init(outputSize: 784))
  Dense<Float>.make(.init(outputSize: 400, useBias: true))
  Dense<Float>.make(.init(outputSize: 100))
  Dense<Float>.make(.init(outputSize: 10, activation: relu))
}
print(model(input).shape) // [10000, 10]
```

The Gist adds layer initialization based on hyperparameters and input shapes too. But layer initializers are curried, unlike the approach in this PR:

```swift
// Curried:
(Layer.Hyperparameters) -> (Layer.Input.Shape) -> Layer
// Uncurried:
(Layer.Hyperparameters, Layer.Input.Shape) -> Layer
```

Curried initializers seem necessary to easily support a `ShapedLayerBuilder` function builder for `Sequential`: each value in the `Sequential` trailing closure effectively has type `(Layer.Input.Shape) -> Layer`.

I wonder what others think about the Gist's approach? I could turn it into a separate issue or PR for discussion. The use case of "shape-inferring Sequential" influences the design of layer initialization.
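The curried scheme can be sketched in a few lines of Python, which may make the mechanics clearer: each entry supplies only hyperparameters and returns a `(input_shape) -> layer` function, and `Sequential` threads the shape through. The names below are illustrative, not the Gist's actual API:

```python
# Toy shape-inferring Sequential built from curried layer makers.
class Dense:
    def __init__(self, input_size, output_size):
        self.input_size, self.output_size = input_size, output_size
    def output_shape(self, input_shape):
        return input_shape[:-1] + (self.output_size,)

def make_dense(output_size):        # (hyperparameters) ->
    def from_shape(input_shape):    #   (input shape) -> layer
        return Dense(input_shape[-1], output_size)
    return from_shape

def sequential(input_shape, *makers):
    layers, shape = [], input_shape
    for make in makers:
        layer = make(shape)             # inputSize inferred from `shape`
        shape = layer.output_shape(shape)
        layers.append(layer)
    return layers, shape

layers, out = sequential((10000, 784), make_dense(400), make_dense(10))
print(out)  # (10000, 10)
```

The key point is that `make_dense(400)` closes over only the hyperparameter, so the builder can defer the shape-dependent half of initialization until the input shape is known.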

@dabrahams (Contributor) commented Apr 26, 2020

> Does that help clarify?

@saeta I'm afraid not. I see no reason the same model couldn't be written like this, given the right library pieces (I'm leaving out other niceties, like avoiding writing <Float> everywhere, for now):

```swift
var classifier = Sequential {
    Conv2D<Float>(filterShape: (5, 5, 1, 6), padding: .same, activation: relu)
    AvgPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
    Conv2D<Float>(filterShape: (5, 5, 6, 16), activation: relu)
    AvgPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
    Flatten<Float>()
    Dense<Float>(outputSize: 120, activation: relu)
    Dense<Float>(outputSize: 84, activation: relu)
    Dense<Float>(outputSize: 10)
}
```

IIUC this is essentially what @dan-zheng's gist proves out, with different syntax. Am I missing something?

@dabrahams (Contributor)

@dan-zheng It has always seemed clear to me that everything (not just Sequential) should be “shape-inferring.”

@saeta (Contributor, Author) commented Apr 27, 2020

> (I'm leaving out other niceties, like avoiding writing `<Float>` everywhere, for now)

I've been meaning to play around more based on your suggestion a while back. I put together a quick PR: tensorflow/swift-models#465. It appears that this trick doesn't work cross-module for some reason. Do you know what I'm doing wrong, or should these be filed as issues?

@dabrahams (Contributor)

@saeta Can you be specific about what doesn't work, or tell me how to use your PR to reproduce a compilation failure?

@brettkoonce (Contributor)

@dabrahams believe this is the error in question: tensorflow/swift-models#465 (comment)

@saeta (Contributor, Author) commented Apr 27, 2020

> Can you be specific about what doesn't work, or tell me how to use your PR to reproduce a compilation failure?

> believe this is the error in question: tensorflow/swift-models#465

Yup, @brettkoonce is exactly right. I played around with it more today, and it turns out that this was a SwiftPM non-determinism bug. I minimized it down to a trivial example: https://bugs.swift.org/browse/SR-12688

@dabrahams: You alluded to "given the right library pieces"; do you think you could share a bit of what you have in mind? (Either in textual or executable form?) :-)

@dabrahams (Contributor) commented Apr 28, 2020

@saeta I don't have anything very specific in mind. Maybe it's just a matter of perspective. If you view the model code as the use of an EDSL for describing the computation it performs, and there are places where the output shapes can be deduced at runtime from input shapes, it's clear you don't need to specify them in the model. The problem of translating a given specification (model) into code/data structures for actually performing the computation is separable, and it is not bounded, for example, by the idea that constructing the thing called `Conv2D` above has to allocate the memory that the corresponding part of the model needs.

@shabalind (Contributor)

We are closing this one for now!

@shabalind closed this Jun 10, 2020
@saeta (Contributor, Author) commented Jun 10, 2020

Tagging @shadaj who will be pushing on this.

@saeta deleted the easy-layer-init branch June 10, 2020 17:17
@saeta (Contributor, Author) commented Jun 10, 2020

For those following along at home, check out @shadaj 's work in: tensorflow/swift-models#584
