Missing activation for the time embedding inside ResidualBlock for DDPM? #165

In the DDPM U-Net implementation, each residual block incorporates the time embedding by passing it through a linear layer alone, with no activation applied first:
https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L130
However, the positionally encoded time embedding is itself already the output of a linear layer:
https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L80
Hence, these two consecutive linear layers collapse into a single linear (affine) map, so no residual block applies a non-linear mapping of its own to the time embedding.
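
To make the collapse concrete, here is a minimal sketch in plain Python (lists instead of tensors, so it has no dependencies). The weights are made-up toy values, not taken from the repository, and `affine`, `compose` and `swish` are illustrative helper names. It checks that two stacked linear layers equal one precomputed affine map, and that putting a Swish between them breaks that equivalence.

```python
import math

def affine(weight, bias, x):
    """Apply y = weight @ x + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def compose(w2, b2, w1, b1):
    """Return (W, b) with W @ x + b == w2 @ (w1 @ x + b1) + b2 for every x."""
    cols = list(zip(*w1))  # columns of w1
    weight = [[sum(a * c for a, c in zip(row, col)) for col in cols] for row in w2]
    bias = [sum(a * c for a, c in zip(row, b1)) + b for row, b in zip(w2, b2)]
    return weight, bias

def swish(x):
    """Element-wise Swish: v * sigmoid(v)."""
    return [v / (1.0 + math.exp(-v)) for v in x]

# Toy values (assumptions for illustration only).
w1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.0, 1.0]  # stands in for the last layer of the time embedding
w2, b2 = [[1.0, 1.0]], [0.5]                    # stands in for the per-block linear layer
x = [1.0, 1.0]

# Linear followed by linear, as in the implementation linked above.
stacked = affine(w2, b2, affine(w1, b1, x))

# The same result from a single precomputed affine map.
w, b = compose(w2, b2, w1, b1)
single = affine(w, b, x)

# With an activation in between, the result is no longer that affine map.
with_act = affine(w2, b2, swish(affine(w1, b1, x)))

print(stacked, single, with_act)
```

The first two printed values agree, while the third differs, which is the extra expressiveness the per-block layer currently does not have.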

In the original TensorFlow implementation by the paper's author, the time embedding is first passed through a nonlinearity and only then through a linear layer:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L49
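
For reference, a possible PyTorch counterpart of that ordering could look like the sketch below. It is not code from either repository: `TimeProjection` is a hypothetical helper name, and `nn.SiLU` is used on the assumption that Swish is the intended nonlinearity.

```python
import torch
from torch import nn

class TimeProjection(nn.Module):
    """Hypothetical helper: activation first, then the per-block linear layer."""

    def __init__(self, time_channels: int, out_channels: int):
        super().__init__()
        self.act = nn.SiLU()  # Swish; assumed to match the intended nonlinearity
        self.proj = nn.Linear(time_channels, out_channels)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: [batch, time_channels] -> [batch, out_channels, 1, 1],
        # so the result broadcasts over the spatial dimensions of the feature map.
        return self.proj(self.act(t))[:, :, None, None]

# Usage with made-up sizes: add the projected time embedding to a feature map.
h = torch.zeros(2, 8, 4, 4)   # [batch, out_channels, height, width]
t = torch.randn(2, 16)        # [batch, time_channels]
h = h + TimeProjection(16, 8)(t)
```

Inside the existing `ResidualBlock`, the same effect should follow from applying the activation to `t` before the `self.time_emb` linear layer.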
