commit message

osanseviero · Feb 7, 2023 · commit 1a8e09e

Showing 5 changed files with 62 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 pix2pixzero
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,41 @@
+# pix2pix-zero [diffusers]
+
+### [website](https://pix2pixzero.github.io/)
+
+## Code and Demo coming soon
+
+
+<br>
+<div class="gif">
+<p align="center">
+<img src='assets/main.gif' align="center">
+</p>
+</div>
+
+
+Our method, [pix2pix-zero](https://pix2pixzero.github.io/), enables the use of text-to-image diffusion models, such as [Stable Diffusion](https://github.com/CompVis/stable-diffusion), for editing images without the need for finetuning. This is achieved through cross-attention guidance during the sampling process, ensuring adherence of the output image's structure to the input. Additionally, our approach allows for the editing of images through pre-computed edit directions, eliminating the requirement for sentence modifications.
+
+## Results
+All our results are based on the [stable-diffusion-v1-4](https://github.com/CompVis/stable-diffusion) model. Please see the website for more results.
+
+<div>
+<p align="center">
+<img src='assets/results_teaser.jpg' align="center">
+</p>
+</div>
+
+
+## Method Details
+
+Given an input image, we first generate text captions using [BLIP](https://github.com/salesforce/LAVIS) and apply regularized DDIM inversion to obtain our inverted noise map.
+Then, we obtain reference cross-attention maps that correspond to the structure of the input image by denoising, guided with the CLIP embeddings
+of our generated text (c). Next, we denoise with edited text embeddings, while enforcing a loss to match current cross-attention maps with the
+reference cross-attention maps.
+
+<div>
+<p align="center">
+<img src='assets/method.jpg' align="center" width=900>
+</p>
+</div>
+
+
diff --git a/assets/main.gif b/assets/main.gif
diff --git a/assets/method.jpg b/assets/method.jpg
diff --git a/assets/results_teaser.jpg b/assets/results_teaser.jpg
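
The README's Method Details section describes a loss that matches the current cross-attention maps to the reference maps, but this commit ships no code yet. The following is a minimal sketch of what such a matching term could look like, not the authors' implementation. The squared-error form, the function name `attention_matching_loss`, and the `(heads, image tokens, text tokens)` array layout are all assumptions made for illustration.

```python
import numpy as np


def attention_matching_loss(current_maps, reference_maps):
    """Mean squared difference between two sets of cross-attention maps.

    Assumption: the README only says the loss "matches" the maps; the
    squared-error form used here is an illustrative choice.
    """
    current = np.asarray(current_maps, dtype=np.float64)
    reference = np.asarray(reference_maps, dtype=np.float64)
    if current.shape != reference.shape:
        raise ValueError("attention maps must have identical shapes")
    return float(np.mean((current - reference) ** 2))


# Toy maps with a hypothetical layout: (heads, image tokens, text tokens).
rng = np.random.default_rng(0)
reference = rng.random((2, 16, 4))
edited = reference + 0.1  # uniformly shifted copy of the reference

print(attention_matching_loss(reference, reference))  # identical maps give 0.0
print(attention_matching_loss(edited, reference))     # grows as the maps diverge
```

In a real sampler, a term like this would presumably be differentiated with respect to the noisy latent at each denoising step, so that the edited generation keeps the structure of the input image. That part depends on the diffusion model and is left out of the sketch.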