From 92cad23b88ba566ad96afb491976ae9352fa4b52 Mon Sep 17 00:00:00 2001
From: Phil Wang <lucidrains@gmail.com>
Date: Thu, 29 Sep 2022 11:02:05 -0700
Subject: [PATCH] readme

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 8a077c3..e496be2 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ Implementation of <a href="https://makeavideo.studio/">Make-A-Video</a>, new SOT
 
 The pseudo-3d convolutions isn't a new concept. It has been explored before in other contexts, say for protein contact prediction as <a href="https://www.biorxiv.org/content/10.1101/2022.08.04.502748v2.full">"dimensional hybrid residual networks"</a>.
 
-The gist of the paper comes down to, take a SOTA image model (here they use DALL-E2, but the same learning points would easily apply to Imagen), make a few minor modifications for <a href="https://arxiv.org/abs/2204.03458">attention across time</a> and other ways to skimp on the compute cost, do frame interpolation correctly, get a great video model out.
+The gist of the paper comes down to this: take a SOTA text-to-image model (here they use DALL-E2, but the same learning points would easily apply to Imagen), make a few minor modifications for <a href="https://arxiv.org/abs/2204.03458">attention across time</a> and other ways to skimp on the compute cost, do frame interpolation correctly, and get a great video model out.
 
 ## Citations