From 92cad23b88ba566ad96afb491976ae9352fa4b52 Mon Sep 17 00:00:00 2001
From: Phil Wang
Date: Thu, 29 Sep 2022 11:02:05 -0700
Subject: [PATCH] readme

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8a077c3..e496be2 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ Implementation of Make-A-Video, new SOT

 The pseudo-3d convolutions isn't a new concept. It has been explored before in other contexts, say for protein contact prediction as "dimensional hybrid residual networks".

-The gist of the paper comes down to, take a SOTA image model (here they use DALL-E2, but the same learning points would easily apply to Imagen), make a few minor modifications for attention across time and other ways to skimp on the compute cost, do frame interpolation correctly, get a great video model out.
+The gist of the paper comes down to, take a SOTA text-to-image model (here they use DALL-E2, but the same learning points would easily apply to Imagen), make a few minor modifications for attention across time and other ways to skimp on the compute cost, do frame interpolation correctly, get a great video model out.

 ## Citations