Description
In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying only a linear layer, with no preceding activation:
However, the positionally encoded time embedding is itself already the output of a linear layer:
Hence, the two consecutive linear layers collapse into a single affine map, so no residual block applies any nonlinear transformation to the time embedding.
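A quick numerical check of the collapse: for any weights, W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), so the stacked layers equal one affine layer. The sketch below uses NumPy with random weights; the dimensions and variable names are illustrative assumptions, not taken from the UNet code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) sizes: embedding MLP input -> time embedding -> block channels
d_in, d_emb, d_out = 8, 16, 4

# Last linear layer of the time-embedding MLP (no activation after it)
W1, b1 = rng.standard_normal((d_emb, d_in)), rng.standard_normal(d_emb)
# Per-block projection applied directly to that output, with no activation in between
W2, b2 = rng.standard_normal((d_out, d_emb)), rng.standard_normal(d_out)

x = rng.standard_normal(d_in)  # input to the last embedding layer

# Two stacked linear layers, as in the block without a prior activation
two_layers = W2 @ (W1 @ x + b1) + b2

# One equivalent affine map: W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

Because the collapse holds for every choice of weights, the extra per-block layer adds parameters but no expressive power on its own.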
In the authors' original TensorFlow implementation, the time embedding is first passed through a nonlinearity and only then through a linear layer:
https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L49
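A minimal NumPy sketch of the per-block projection with the activation restored, mirroring the `dense(nonlinearity(temb))` pattern at the linked line. It assumes the nonlinearity is Swish (x · sigmoid(x)); `project_temb` is a hypothetical helper name, not a function from either codebase. The check uses the fact that any affine map f satisfies f(u + v) = f(u) + f(v) − f(0), which this projection violates.

```python
import numpy as np

def swish(x):
    # Swish / SiLU: x * sigmoid(x), written as x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))

def project_temb(temb, W, b):
    # Hypothetical per-block projection: activation first, then the dense layer
    return W @ swish(temb) + b

rng = np.random.default_rng(1)
W, b = rng.standard_normal((4, 16)), rng.standard_normal(4)
u, v = rng.standard_normal(16), rng.standard_normal(16)

# An affine f would satisfy f(u + v) == f(u) + f(v) - f(0); this projection does not
lhs = project_temb(u + v, W, b)
rhs = project_temb(u, W, b) + project_temb(v, W, b) - project_temb(np.zeros(16), W, b)
print(np.allclose(lhs, rhs))  # False
```

With the activation in place, each residual block can learn its own nonlinear function of the shared time embedding instead of a mere re-weighting of it.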