
Missing activation for the time embedding inside ResidualBlock for DDPM? #165

Open
@EliasNehme

Description


In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying only a linear layer, with no preceding activation:


However, the positionally encoded time embedding is already the result of a linear layer:

Hence, these two linear layers collapse into a single linear layer, leaving no non-linear mapping of the time embedding per residual block.
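The collapse can be checked numerically: two stacked linear layers with no activation in between are exactly equivalent to one linear layer with the composed weight and bias. A minimal sketch (the layer sizes here are arbitrary, chosen only for illustration):

```python
import torch

# Two linear maps with no activation in between:
# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), i.e. a single linear map.
torch.manual_seed(0)
lin1 = torch.nn.Linear(64, 128)
lin2 = torch.nn.Linear(128, 32)
x = torch.randn(4, 64)

stacked = lin2(lin1(x))

# Build the single equivalent linear layer with the composed parameters.
merged = torch.nn.Linear(64, 32)
with torch.no_grad():
    merged.weight.copy_(lin2.weight @ lin1.weight)
    merged.bias.copy_(lin2.weight @ lin1.bias + lin2.bias)

assert torch.allclose(stacked, merged(x), atol=1e-5)
```

So without an activation between them, the per-block projection adds no expressive power beyond the shared embedding layer.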

In the author's original TensorFlow implementation, the time embedding is first passed through a nonlinearity and only then through a linear layer:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L49
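A minimal PyTorch sketch of the suggested fix, mirroring the TensorFlow reference: apply a Swish/SiLU nonlinearity to the time embedding before the per-block linear projection. The class and parameter names here are hypothetical, not the repository's actual code:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Hypothetical residual block with the missing time-embedding activation added."""

    def __init__(self, channels: int, time_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.time_act = nn.SiLU()  # the activation this issue argues is missing
        self.time_emb = nn.Linear(time_channels, channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = self.conv1(self.act(x))
        # Nonlinearity *before* the linear projection prevents the shared
        # embedding MLP and this per-block layer from collapsing into one
        # linear map.
        h = h + self.time_emb(self.time_act(t))[:, :, None, None]
        # Simplified: assumes matching in/out channels for the skip connection.
        return self.conv2(self.act(h)) + x
```

With this change, each residual block applies its own non-linear transformation of the time embedding, as in the reference implementation.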
