research management
lucidrains committed Oct 17, 2022
1 parent 3689ecf commit 577a2a6
Showing 1 changed file with 3 additions and 2 deletions.
README.md (5 changes: 3 additions & 2 deletions)
@@ -21,7 +21,7 @@ The gist of the paper comes down to, take a SOTA text-to-image model (here they
## Install

```bash
-$ pip install make-a-video
+$ pip install make-a-video-pytorch
```

## Usage
@@ -49,7 +49,7 @@ conv_out = conv(video) # (1, 256, 8, 16, 16)
attn_out = attn(video) # (1, 256, 8, 16, 16)
```
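The matching input/output shapes above come from the paper's factored space-time design. As a hedged sketch (the class name `PseudoConv3d` and the exact reshaping here are assumptions, not the library's actual implementation), a pseudo-3d convolution can be built from a 2d spatial conv followed by a 1d temporal conv that is initialized to the identity and skipped entirely for image (4-d) inputs:

```python
import torch
from torch import nn

class PseudoConv3d(nn.Module):
    # illustrative sketch: 2d conv over space, then 1d conv over time
    def __init__(self, dim, kernel_size = 3):
        super().__init__()
        pad = kernel_size // 2
        self.spatial = nn.Conv2d(dim, dim, kernel_size, padding = pad)
        self.temporal = nn.Conv1d(dim, dim, kernel_size, padding = pad)

        # temporal conv starts as the identity, as described in the paper
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x):
        is_video = x.ndim == 5  # video is (b, c, t, h, w); image is (b, c, h, w)

        if is_video:
            b, c, t, h, w = x.shape
            # fold time into batch so the spatial conv sees ordinary images
            x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)

        x = self.spatial(x)

        if not is_video:
            return x  # temporal conv is skipped for images

        # fold space into batch so the temporal conv sees (batch, channels, time)
        x = x.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        x = self.temporal(x)
        return x.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)
```

With this factorization, image pretraining touches only the spatial weights, and video training starts from a network whose temporal path is a no-op.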

-Passing in images (if one were to pretrain on images first, both temporal convolution and attention will be automatically skipped)
+When passing in images (if one were to pretrain on images first), both temporal convolution and attention will be automatically skipped. In other words, you can use this straightforwardly in your 2d Unet and then port it over to a 3d Unet once that phase of the training is done. The temporal modules are initialized to output the identity, as the paper had done.

```python
import torch
@@ -103,6 +103,7 @@ attn_out = attn(video, attend_across_time = False) # (1, 256, 8, 16, 16)
- [ ] give attention the best positional embeddings research has to offer
- [ ] soup up the attention
- [ ] offer a function, similar to MosaicML's approach, that automatically rigs a 2d-unet from dalle2-pytorch to be 3d
+- [ ] consider learned exponential moving average across time from https://github.com/lucidrains/Mega-pytorch
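The newly added todo item can be illustrated with a minimal sketch of a learned, damped EMA over the time axis, in the spirit of Mega. Everything here is hypothetical: the class name `LearnedEMA` and its parameters are assumptions, and the real Mega layer is multi-dimensional and computed efficiently (via FFT) rather than with this explicit loop:

```python
import torch
from torch import nn

class LearnedEMA(nn.Module):
    # hypothetical sketch of a learned damped EMA over time:
    #   h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1}
    # with per-channel alpha, delta learned and squashed into (0, 1)
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.randn(dim))
        self.delta = nn.Parameter(torch.randn(dim))

    def forward(self, x):
        # x: (batch, time, dim)
        alpha = self.alpha.sigmoid()
        decay = 1 - alpha * self.delta.sigmoid()

        h = torch.zeros_like(x[:, 0])
        out = []
        for t in range(x.shape[1]):
            h = alpha * x[:, t] + decay * h
            out.append(h)
        return torch.stack(out, dim = 1)
```

Such a layer mixes information causally across frames with far fewer parameters than attention, which is presumably the appeal for the temporal axis.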

## Citations
