Memory-efficient attention and gradio mask fixed #117

neonsecret · 2022-09-04T14:39:20Z

No description provided.

(cherry picked from commit ddde264)

MrLavender · 2022-09-04T17:09:53Z

Nice work. Applying the attention.py change to the original SD lets me do 512x512 on 8GB, previously could only do 448x448.

But (on the original SD anyway) the size of sim is 16, so sim[8:] and sim[:8] is more memory efficient (it makes the difference between working and failing with out-of-memory). A more general way to do this would be:

half = sim.size(dim=0) // 2
sim[:half] = sim[:half].softmax(dim=-1)
sim[half:] = sim[half:].softmax(dim=-1)

or, for maximum memory efficiency (with about 1% performance difference for me):

for i in range(sim.size(dim=0)):
    sim[i] = sim[i].softmax(dim=-1)
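
These variants are safe because softmax along the last dimension treats every row independently, so slicing along dim 0 changes peak memory but not the result. A minimal sketch of that property follows; it uses nested Python lists as a stand-in for the `sim` tensor (an assumption, so that it runs without torch), and the helper names are made up for illustration.

```python
import math

def softmax_row(row):
    """Numerically stable softmax over one row (the last dimension)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_all(sim):
    """Softmax over the last dimension of a (batch, rows, cols) nested list."""
    return [[softmax_row(r) for r in mat] for mat in sim]

def softmax_in_chunks(sim, chunk):
    """Same computation, overwriting `sim` one slice of dim 0 at a time."""
    for start in range(0, len(sim), chunk):
        sim[start:start + chunk] = softmax_all(sim[start:start + chunk])
    return sim

def copy3(sim):
    """Deep-copy the toy tensor so each variant starts from the same data."""
    return [[row[:] for row in mat] for mat in sim]

# Toy stand-in for `sim`: 4 batch*head entries, each a 2x3 score matrix.
sim = [[[float(b + i * j) for j in range(3)] for i in range(2)] for b in range(4)]

expected = softmax_all(sim)                      # everything at once
halves = softmax_in_chunks(copy3(sim), chunk=2)  # like sim[:half] / sim[half:]
singles = softmax_in_chunks(copy3(sim), chunk=1) # like the per-index loop
print(halves == expected, singles == expected)
```

In torch the saving comes from the temporary that `.softmax()` allocates, which is only as large as the slice it is called on.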

Doggettx · 2022-09-04T17:50:39Z

I've found a way to split up the einsum too; I can go to insane resolutions on my card now... There might be a better way to do this, my knowledge of torch and python is very limited (meaning almost 0 ;)

Also not quite sure if all the deletes are really needed, no idea when the garbage collector triggers for unused tensors, but guess can't hurt to force it.

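# Assumes attention.py's existing imports: torch, einsum (from torch),
# rearrange (from einops) and the module's own default() helper.
def forward(self, x, context=None, mask=None):
    h = self.heads

    q = self.to_q(x)
    context = default(context, x)
    k = self.to_k(context)
    v = self.to_v(context)
    del context, x

    # Fold the heads into the batch dimension: (b, n, h*d) -> (b*h, n, d)
    q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))

    # Preallocate the output, then fill it 4 batch*head entries at a time
    r1 = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device)
    for i in range(0, q.shape[0], 4):
        end = i + 4
        # Attention scores for this slice only
        s1 = einsum('b i d, b j d -> b i j', q[i:end], k[i:end])
        s1 *= self.scale

        s2 = s1.softmax(dim=-1)
        del s1

        r1[i:end] = einsum('b i j, b j d -> b i d', s2, v[i:end])
        del s2

    # Unfold the heads again: (b*h, n, d) -> (b, n, h*d)
    r2 = rearrange(r1, '(b h) n d -> b n (h d)', h=h)
    del r1

    # Note: mask is accepted but not applied in this version
    return self.to_out(r2)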
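
As for whether the `del` statements matter: in CPython an object is released as soon as its last reference disappears, without waiting for the cyclic garbage collector. A `del` inside the loop therefore frees the intermediate before the next one is created, instead of only when the name is rebound. A small sketch of that behaviour, using a hypothetical `Buffer` class as a stand-in for a tensor (this is CPython-specific, other interpreters may differ):

```python
import weakref

class Buffer:
    """Stand-in for a large tensor (hypothetical class for this demo)."""
    def __init__(self, name):
        self.name = name

s1 = Buffer("scores")
probe = weakref.ref(s1)          # observes the object without keeping it alive

alive_before = probe() is not None
del s1                           # drop the only strong reference
alive_after = probe() is not None

print(alive_before, alive_after)
```

For a CUDA tensor, the freed block goes back to torch's caching allocator, which can then reuse it for the next intermediate.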

neonsecret · 2022-09-04T17:52:10Z

It won't work: you are only multiplying parts, not the whole tensor. The tensor for einsum shouldn't be split.

Doggettx · 2022-09-04T17:52:48Z

Seems to work fine, gives same results, I have no idea how einsum works though, but as far as I can see there are no side effects
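
One way to see why the split gives identical results: in 'b i d, b j d -> b i j' the index b appears in both inputs and in the output and is never summed over, so every b is an independent computation, and the softmax only normalises within a row. The sketch below checks this with nested Python lists standing in for tensors (an assumption, so it runs without torch); the function names are made up for illustration.

```python
import math

def attend_one(q, k, v, scale):
    """Attention for a single batch*head entry: softmax(q k^T * scale) v."""
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj / z * vj[d] for wj, vj in zip(w, v))
                    for d in range(len(v[0]))])
    return out

def attend_full(q, k, v, scale):
    """Process every batch*head entry in one go."""
    return [attend_one(qb, kb, vb, scale) for qb, kb, vb in zip(q, k, v)]

def attend_chunked(q, k, v, scale, step):
    """Process `step` batch*head entries at a time, like the loop in forward()."""
    out = [None] * len(q)
    for i in range(0, len(q), step):
        end = i + step
        out[i:end] = attend_full(q[i:end], k[i:end], v[i:end], scale)
    return out

def toy(b, n, d, seed):
    """Deterministic toy data of shape (b, n, d)."""
    return [[[math.sin(seed + 7 * x + 3 * y + z) for z in range(d)]
             for y in range(n)] for x in range(b)]

q, k, v = toy(6, 3, 2, 0.1), toy(6, 4, 2, 0.5), toy(6, 4, 2, 0.9)
scale = 2 ** -0.5
full = attend_full(q, k, v, scale)
matches = {step: attend_chunked(q, k, v, scale, step) == full for step in (1, 2, 4)}
print(matches)
```

A step of 4 on 6 entries also exercises the case where the last chunk is shorter than the step; slicing past the end simply truncates.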

neonsecret · 2022-09-04T17:55:06Z

and memory?

Doggettx · 2022-09-04T17:57:04Z

I went from being able to do 1920x640 to 1920x832; the einsum needs about a quarter of the memory now. I don't have any other optimizations though, only this one (on the CompVis version).
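
A rough back-of-the-envelope for why the step size matters. The assumptions here are mine, not measurements from this thread: the latent is 1/8 of the image size per side, the largest self-attention layer attends over every latent position, there are 16 batch*head entries (the size of sim reported earlier), and elements take 4 bytes at full precision. Only the score tensor is counted, not the softmax copy or other activations.

```python
def sim_bytes(width, height, entries=16, bytes_per_el=4, downscale=8):
    """Approximate size of the attention score tensor for one einsum call."""
    tokens = (width // downscale) * (height // downscale)
    return entries * tokens * tokens * bytes_per_el

GIB = 1024 ** 3

full_512 = sim_bytes(512, 512)               # all 16 entries at once
chunk_512 = sim_bytes(512, 512, entries=4)   # loop step of 4

print(f"full:    {full_512 / GIB:.2f} GiB")
print(f"step 4:  {chunk_512 / GIB:.2f} GiB")
```

Under these assumptions the score tensor scales with the square of the pixel count, so doubling both image sides multiplies it by 16, while the loop step divides it linearly.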

neonsecret · 2022-09-04T18:27:14Z

hmm very weird

neonsecret · 2022-09-04T18:37:36Z

fucking hell it works

Doggettx · 2022-09-04T18:47:15Z

It actually works with steps of 2 as well, I can go to 1920x1024 then, it breaks at steps of 1, no idea how this stuff works hehe

Doggettx · 2022-09-04T18:53:30Z

Does seem to make it slower though

Doggettx · 2022-09-04T18:59:58Z

For comparison, I tested the same prompt/seed/settings etc. at different step sizes:

8 - 7.0 it/s
4 - 6.2 it/s
2 - 4.7 it/s

The drop from 8 to 4 isn't too bad, but I'm not sure going to 2 is worth it, unless you want to render at really high resolutions.
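
To put those three readings in relative terms, here is a small sketch that converts them into a percentage slowdown against the step-8 run. The numbers are the ones reported above; nothing else is measured.

```python
# loop step -> iterations per second, as reported above
rates = {8: 7.0, 4: 6.2, 2: 4.7}
baseline = rates[8]

# Percentage slowdown of each setting relative to a step of 8
slowdown_pct = {step: round(100 * (1 - rate / baseline), 1)
                for step, rate in rates.items()}

for step, pct in slowdown_pct.items():
    print(f"step {step}: {pct}% slower than step 8")
```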

neonsecret · 2022-09-04T19:04:40Z

4 doesn't seem to make any difference for me.
I'm going to add both options.

victorbessa96 · 2022-09-04T19:33:03Z

It would be great to have an option to choose between faster renders and really high resolution, so perhaps an option to switch between 8 and 2?

JohnAlcatraz · 2022-09-04T21:17:48Z

@Doggettx Wow, your code works amazingly well!

I cannot see any significant slowdown; it works great even using a step amount of 1 in the for loop. I also checked that the output from the same seed is fully identical.

This is the speed I'm getting when generating a 512x512 image on an RTX 2070 Super:

Default SD: 5.0 it/s | 0.39 Megapixels Max Res
Your modified def forward with loop steps of 8: 4.94 it/s | Didn't test Max Res
Your modified def forward with loop steps of 4: 4.87 it/s | 0.79 Megapixels Max Res
Your modified def forward with loop steps of 2: 4.78 it/s | 1.14 Megapixels Max Res
Your modified def forward with loop steps of 1: 4.46 it/s | 1.5 Megapixels Max Res

The resolution I can do with a for-loop steps amount of 1 is incredible. It's fully worth the very small reduction in speed. But ideally, the amount of loop steps would be made a command line option that can be set.
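
A minimal sketch of what such a command line option could look like. The flag name `--attn-step`, its default and its allowed values are hypothetical and do not exist in any of the scripts discussed here.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="hypothetical wrapper script")
    parser.add_argument(
        "--attn-step",          # hypothetical flag, not an existing option
        type=int,
        default=2,              # assumed default, chosen only for illustration
        choices=[1, 2, 4, 8],
        help="batch*head entries per attention chunk (smaller = less VRAM, slower)",
    )
    return parser

# Parse an explicit argument list so the sketch runs outside a real CLI call
opt = build_parser().parse_args(["--attn-step", "4"])
print(opt.attn_step)
```

The loop in forward() would then use `range(0, q.shape[0], step)` with `end = i + step`; how the value reaches the attention module is left open here.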

So this is the code I'm using for a loop step amount of 1:

    def forward(self, x, context=None, mask=None):
        h = self.heads

        q = self.to_q(x)
        context = default(context, x)
        k = self.to_k(context)
        v = self.to_v(context)
        del context, x

        # Fold the heads into the batch dimension: (b, n, h*d) -> (b*h, n, d)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))

        # Preallocate the output, then fill it one batch*head entry at a time
        r1 = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device)
        for i in range(0, q.shape[0], 1):
            end = i + 1
            s1 = einsum('b i d, b j d -> b i j', q[i:end], k[i:end])
            s1 *= self.scale

            s2 = s1.softmax(dim=-1)
            del s1

            r1[i:end] = einsum('b i j, b j d -> b i d', s2, v[i:end])
            del s2

        r2 = rearrange(r1, '(b h) n d -> b n (h d)', h=h)
        del r1

        return self.to_out(r2)

With this code, I can do 1216x1216 on 8 GB VRAM. That is roughly 3.8 times as many pixels as the maximum I can do with default SD (about 1.5 vs 0.39 megapixels). It's amazing!

To be clear, I did my testing above with default SD at half precision, not with the "optimized" version from this repo, so I was comparing default SD at half precision vs only the changed attention.py. With the other optimizations from this repo, I could surely go even higher than 1216x1216 on 8 GB VRAM now. But the other optimizations from this repo hurt speed a lot more, so I think they are not really worth doing any more now.

TheEnhas · 2022-09-04T21:51:15Z

How does this translate into doing batches of images though? One thing I tend to do is 20 512x512 50-step generations with turbo mode; how is VRAM use with half precision + the "loop step 1" code above on base SD compared to that? Because if it's much better or even comparable, then yeah, the old optimizations shouldn't really be used anymore, except maybe as an option to save even more on VRAM-limited (i.e. 4GB or less) GPUs, or for really big images.

JohnAlcatraz · 2022-09-04T22:01:44Z

I noticed that the "step 1" version does not actually work for me either - I didn't pay attention to what exactly the log showed. I thought it ran through to 100% and succeeded, but what it actually does is run through to 100% and then crash with an out-of-memory error at high resolutions. Lower resolutions work fine with the "step 1" code without crashes, but then I can also use the "step 2" version with a slightly higher speed.

There's probably some other code somewhere that needs to be optimized more for the "step 1" version to make sense and not crash at 100%.

So what I said above regarding "step 1" clearly being the best is not true. It's "step 2" that's the best because that actually works. The table I showed above is still accurate, just ignore the "loop steps of 1" row.

The maximum I can do now with 8 GB VRAM, using the "step 2" code, is 1.14 Megapixels, as mentioned in my previous comment. A factor of 2.91 improvement over default SD.

So this code:

    def forward(self, x, context=None, mask=None):
        h = self.heads

        q = self.to_q(x)
        context = default(context, x)
        k = self.to_k(context)
        v = self.to_v(context)
        del context, x

        # Fold the heads into the batch dimension: (b, n, h*d) -> (b*h, n, d)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))

        # Preallocate the output, then fill it two batch*head entries at a time
        r1 = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device)
        for i in range(0, q.shape[0], 2):
            end = i + 2
            s1 = einsum('b i d, b j d -> b i j', q[i:end], k[i:end])
            s1 *= self.scale

            s2 = s1.softmax(dim=-1)
            del s1

            r1[i:end] = einsum('b i j, b j d -> b i d', s2, v[i:end])
            del s2

        r2 = rearrange(r1, '(b h) n d -> b n (h d)', h=h)
        del r1

        return self.to_out(r2)

JohnAlcatraz · 2022-09-04T22:06:09Z

Not sure what exactly you're asking about? I can of course still set --n_iter 20 and then it generates 20 images, the amount of images that are generated does not affect VRAM usage. What does affect VRAM usage is --n_samples, but I think there is no reason to ever have that higher than 1.

Doggettx · 2022-09-04T22:10:50Z

@JohnAlcatraz So weird to me that you see almost no difference; your step 1 is actually faster than on my 3090. Just wondering, what OS are you using, and which version of torch?
I'm running it on Windows 11 with torch 1.12.1+cu116. I wonder if that can make a difference. I'm just running the default SD as well, with some custom modifications, but those have nothing to do with the rendering part.

JohnAlcatraz · 2022-09-04T22:20:58Z

@Doggettx I'm on Windows 10, 21H1. If I knew which version of torch I'm using I'd tell you, but I have no idea how to check that, I'm a C++ programmer with no clue about Python ;) I'm not usually doing anything with torch, I only installed it for Stable Diffusion. So probably a very new version.

Maybe you are not running at half precision? That is one difference in how I run it compared to fully default SD: just adding that model.half(). Most forks by now do that by default.
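
For anyone else unsure how to check the installed torch version, here is a small sketch using only the standard library (Python 3.8 or newer). The helper name `installed_version` is made up for illustration; if torch imports fine, `print(torch.__version__)` gives the same information.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

torch_version = installed_version("torch")
print("torch:", torch_version or "not installed")
```

Run it with the same Python environment that runs Stable Diffusion, otherwise it reports on a different installation.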

Doggettx · 2022-09-04T22:37:54Z

I checked that to be sure; my model was still running at full precision. I set it to half, but that doesn't really affect speed, it just allows me to render at even higher res now (1920x1536 with only this change).

Think I'll just make it configurable in my version, for higher resolutions the speed difference seems to get less, but at low resolutions it's more than twice as slow and not really needed.

My workflow is usually first rendering with one dimension at 512 (so 512x768 or something) with a normal upscaler and then img2img the upscaled version at native res. Keeps coherence high while still allowing to render native at high resolutions. But then it's nicer if you can pump out those low res images fast to find a good one ;)

MrLavender · 2022-09-05T00:03:08Z

The optimization work done here in the last few hours really is awesome. Thank you all!

I know nothing about Machine Learning and never heard of an einsum before today but looking at the pytorch docs I see this interesting note;

This function does not optimize the given expression, so a different formula for the same computation may run faster or consume less memory. Projects like opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/) can optimize the formula for you.

https://pytorch.org/docs/stable/generated/torch.einsum.html

So maybe there are further improvements to be had in this forward() function (in speed if not memory)?

willlllllio · 2022-09-05T01:45:42Z

This is crazy, with step=2 I can do 1088x1024 on a 6GB card with no noticeable extra slowdown, though I do need the cuda max_split arg for that res.

7flash · 2022-09-05T02:22:26Z

The only noticeable optimization in this PR is in these lines, the halving of attention, but what does it actually mean?

        sim[4:] = sim[4:].softmax(dim=-1)
        sim[:4] = sim[:4].softmax(dim=-1)

Seems like applying softmax separately to each half of array? Does it make it faster?

CaptnSeraph · 2022-09-05T08:28:53Z

The second step works for me, helped me push my 8gb 1070 to 896x896

@willlllllio Where do you specify the max_split? I assume you mean PYTORCH_CUDA_ALLOC_CONF, but which file should that go into, or do I need to type it each time as an environment variable?

Also, what would be the ideal max size to set for a card with 8192 MB?
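
The thread does not answer this, but one way to avoid typing the variable each time is to set it at the very top of the script that launches generation. This is a sketch under assumptions: the value 128 is purely illustrative and is not a recommendation for any particular card.

```python
import os

# Must be set before the first CUDA allocation; doing it before importing
# torch is the simplest way to be sure. 128 is an illustrative value only.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

# import torch  # import torch (and load the model) only after this point

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

`setdefault` leaves an existing value untouched, so a value exported in the shell still takes precedence.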

JohnAlcatraz · 2022-09-05T17:57:04Z

It seems like the original PR version was merged, which gives a lot less VRAM savings than the new optimization code by @Doggettx later figured out in this thread.

basujindal · 2022-09-05T18:02:10Z

Is there a PR request for the optimization discussed here?

JohnAlcatraz · 2022-09-05T18:05:14Z

No, no one made a new PR for it yet.

You can see the exact changes in the best way implemented in this branch by @Doggettx : https://github.com/Doggettx/stable-diffusion/commits/main

I don't know if he intends to open a PR himself with them?

ryudrigo · 2022-09-05T18:14:35Z

I just opened a PR, but it was just about my comment -- there might be other optimizations I didn't look at

camenduru · 2022-09-05T18:15:19Z

1 step 1216x1216 on 8 GB VRAM with 1070 O8G 🎉 Thank You, Everyone.

JohnAlcatraz · 2022-09-05T18:21:48Z

If you mean you are using the code shown here with 1 step, you likely see it crash at 100%. But with the newest version of the optimization from @Doggettx, you will likely be able to successfully go that high or even higher.

camenduru · 2022-09-05T19:02:29Z

@JohnAlcatraz Yes, step 1

Now I just changed these two

https://raw.githubusercontent.com/Doggettx/stable-diffusion/main/ldm/modules/diffusionmodules/model.py
https://raw.githubusercontent.com/Doggettx/stable-diffusion/main/ldm/modules/attention.py

1920x1088 with 1070 O8G 1034.58s/it https://i.imgur.com/CbIfbHp.png 🎉🎉🎉

JohnAlcatraz · 2022-09-05T19:04:02Z

1920x1088 on 8 GB VRAM is certainly impressive!

ryudrigo · 2022-09-05T20:09:08Z

There, polished it a little bit more. Now 1024px in turbo mode takes 8117 MB and 90 seconds (total) for me.

jimovonz · 2022-09-05T20:10:18Z

Anyone else finding that with increased resolution, the images are losing coherence, with multiple random occurrences of the subject elements?

JohnAlcatraz · 2022-09-05T20:11:07Z

That is a known issue with stable diffusion, yes. The model was trained at 512x512 so that's the only resolution it can do very well.

jimovonz · 2022-09-05T20:13:48Z

Unfortunately this seems to make most of these higher resolution images useless - unless of course you are specifically after something more abstract....

JohnAlcatraz · 2022-09-05T20:23:12Z

These optimizations are not just about being able to generate larger resolutions, but also about being able to generate the same resolution on a lower amount of VRAM, making Stable Diffusion more accessible to people with low VRAM GPUs.

ryudrigo · 2022-09-05T20:34:31Z

Indeed! I should've talked about the normal setting. Least memory usage I can get with PR #122 for 512x512 is just under 3GB VRAM

CaptnSeraph · 2022-09-06T11:06:05Z

As the img2img uses the txt2img sequence (I think) you can use lower res within txt2img to get a good seed and a good "thumbnail" and then refine larger with img2img before running through goBig and gfpgan for serious high quality and sizes (I've got photorealism at DSLR resolutions)

jimovonz · 2022-09-06T19:12:19Z

Cheers - I have been doing something similar with great results. I have been creating lower res images at 768x448 which seem to be mostly free of any obvious duplication/unwanted artifacts and then upscaling using ESRGAN up to 1920x1088 before adding more detail back in using img2img. The strength parameter needs fine tuning to get the right balance of detail - too high and you reintroduce all the same issues you were trying to avoid in the first place. 0.5 is mostly ok but sometimes you need to go lower and sometimes you can go higher with good results.

GordonFreeeman · 2022-09-07T08:25:09Z

Holy crap, this is actually working! I'm only a casual when it comes to python, or coding in general, but after fiddling with the above tweaks/fixes, I can generate incredibly high resolutions on my measly 6GB 1660 Ti (laptop card). Plus I have to run at full precision, because fp16 is broken exclusively on 1660 series cards.

512x512: 1.55 it/s
1024x576: 2.21 s/it
1024x1024: 6.36 s/it
1280x768: 6.51 s/it
1408x768: 7.59 s/it
1920x576: 7.74 s/it
1536x960 was working (13.80 s/it), but crashed during image export, when VRAM usage went from 5.4 GB to >6 GB

I ran all tests with 50 ddim_steps. Had to restart twice because for some reason, VRAM wasn't cleared up completely sometimes, and at higher res, it's going a little crazy with the iteration time. But it's still pretty mindblowing as a proof of concept.
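
The list above mixes it/s and s/it, which makes the rows hard to compare at a glance. A small sketch that normalises a few of the reported readings to seconds per iteration and to total sampling time for the 50 steps used; the helper name is made up, and the totals cover only the sampling loop, not model loading or image export.

```python
def seconds_per_iteration(value, unit):
    """Normalise a reading given as 'it/s' or 's/it' to seconds per iteration."""
    if unit == "it/s":
        return 1.0 / value
    if unit == "s/it":
        return value
    raise ValueError(f"unknown unit: {unit}")

# A few of the readings reported above
readings = {
    "512x512": (1.55, "it/s"),
    "1024x576": (2.21, "s/it"),
    "1024x1024": (6.36, "s/it"),
}
steps = 50  # ddim_steps used for the tests above

sampling_seconds = {
    res: round(steps * seconds_per_iteration(value, unit), 1)
    for res, (value, unit) in readings.items()
}
for res, secs in sampling_seconds.items():
    print(f"{res}: about {secs} s for {steps} steps")
```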

basujindal/stable-diffusion#117

@Doggettx

…aboration incorporating a lot of people's contributions -- including for example @Doggettx and the original code from @neonsecret on which the Doggetx optimizations were based (see invoke-ai/InvokeAI#431, https://github.com/sd-webui/stable-diffusion-webui/pull/771#issuecomment-1239716055). Takes exactly the same amount of time to run 8 steps as original CompVis code does (10.4 secs, ~1.25s/it).

@neonsecret

* start refactoring -not yet functional
* first phase of refactor done - not sure weighted prompts working
* Second phase of refactoring. Everything mostly working.
* The refactoring has moved all the hard-core inference work into ldm.dream.generator.*, where there are submodules for txt2img and img2img. inpaint will go in there as well.
* Some additional refactoring will be done soon, but relatively minor work.
* fix -save_orig flag to actually work
* add @neonsecret attention.py memory optimization
* remove unneeded imports
* move token logging into conditioning.py
* add placeholder version of inpaint; porting in progress
* fix crash in img2img
* inpainting working; not tested on variations
* fix crashes in img2img
* ported attention.py memory optimization basujindal#117 from basujindal branch
* added @torch_no_grad() decorators to img2img, txt2img, inpaint closures
* Final commit prior to PR against development
* fixup crash when generating intermediate images in web UI
* rename ldm.simplet2i to ldm.generate
* add backward-compatibility simplet2i shell with deprecation warning
* add back in mps exception, addresses @Vargol comment in CompVis#354
* replaced Conditioning class with exported functions
* fix wrong type of with_variations attribute during intialization
* changed "image_iterator()" to "get_make_image()"
* raise NotImplementedError for calling get_make_image() in parent class
* Update ldm/generate.py better error message
  Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
* minor stylistic fixes and assertion checks from code review
* moved get_noise() method into img2img class
* break get_noise() into two methods, one for txt2img and the other for img2img
* inpainting works on non-square images now
* make get_noise() an abstract method in base class
* much improved inpainting
  Co-authored-by: Kevin Gibbons <bakkot@gmail.com>

* .gitignore WebUI temp files Co-authored-by: Håvard Gulldahl <havard@lurtgjort.no> commit 5c6b612 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Wed Sep 7 22:50:55 2022 -0400 fix bug that caused same seed to be redisplayed repeatedly commit 56f155c Author: Johan Roxendal <johan@roxendal.com> Date: Thu Sep 8 04:50:06 2022 +0200 added support for parsing run log and displaying images in the frontend init state (CompVis#410) Co-authored-by: Johan Roxendal <johan.roxendal@litteraturbanken.se> Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> commit 4168774 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Wed Sep 7 20:24:35 2022 -0400 added missing initialization of latent_noise to None commit 171f8db Author: Denis Olshin <me@denull.ru> Date: Thu Sep 8 03:15:20 2022 +0300 saving full prompt to metadata when using web ui commit d7e67b6 Author: Denis Olshin <me@denull.ru> Date: Thu Sep 8 01:51:47 2022 +0300 better logic for clicking to make variations commit d1d044a Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Wed Sep 7 17:56:59 2022 -0400 actual image seed now written into web log rather than -1 (CompVis#428) commit edada04 Author: Arturo Mendivil <60411196+artmen1516@users.noreply.github.com> Date: Wed Sep 7 10:42:26 2022 -0700 Improve notebook and add requirements file (CompVis#422) commit 29ab3c2 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Wed Sep 7 13:28:11 2022 -0400 disable neonpixel optimizations on M1 hardware (CompVis#414) * disable neonpixel optimizations on M1 hardware * fix typo that was causing random noise images on m1 commit 7670ecc Author: cody <cnmizell@gmail.com> Date: Wed Sep 7 12:24:41 2022 -0500 add more keyboard support on the web server (CompVis#391) add ability to submit prompts with the "enter" key add ability to cancel generations with the "escape" key commit dd2aeda Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Wed Sep 7 13:23:53 2022 -0400 report VRAM usage stats during initial model loading 
(CompVis#419) commit f628477 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Tue Sep 6 17:12:39 2022 -0400 Squashed commit of the following: commit 7d1344282d942a33dcecda4d5144fc154ec82915 Merge: caf4ea3 ebeb556 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Mon Sep 5 10:07:27 2022 -0400 Merge branch 'development' of github.com:WebDev9000/stable-diffusion into WebDev9000-development commit ebeb556 Author: Web Dev 9000 <rirath@gmail.com> Date: Sun Sep 4 18:05:15 2022 -0700 Fixed unintentionally removed lines commit ff2c4b9 Author: Web Dev 9000 <rirath@gmail.com> Date: Sun Sep 4 17:50:13 2022 -0700 Add ability to recreate variations via image click commit c012929 Author: Web Dev 9000 <rirath@gmail.com> Date: Sun Sep 4 14:35:33 2022 -0700 Add files via upload commit 02a6018 Author: Web Dev 9000 <rirath@gmail.com> Date: Sun Sep 4 14:35:07 2022 -0700 Add files via upload commit eef7889 Author: Olivier Louvignes <olivier@mg-crea.com> Date: Tue Sep 6 12:41:08 2022 +0200 feat(txt2img): allow from_file to work with len(lines) < batch_size (CompVis#349) commit 720e5cd Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Mon Sep 5 20:40:10 2022 -0400 Refactoring simplet2i (CompVis#387) * start refactoring -not yet functional * first phase of refactor done - not sure weighted prompts working * Second phase of refactoring. Everything mostly working. * The refactoring has moved all the hard-core inference work into ldm.dream.generator.*, where there are submodules for txt2img and img2img. inpaint will go in there as well. * Some additional refactoring will be done soon, but relatively minor work. 
* fix -save_orig flag to actually work * add @neonsecret attention.py memory optimization * remove unneeded imports * move token logging into conditioning.py * add placeholder version of inpaint; porting in progress * fix crash in img2img * inpainting working; not tested on variations * fix crashes in img2img * ported attention.py memory optimization basujindal#117 from basujindal branch * added @torch_no_grad() decorators to img2img, txt2img, inpaint closures * Final commit prior to PR against development * fixup crash when generating intermediate images in web UI * rename ldm.simplet2i to ldm.generate * add backward-compatibility simplet2i shell with deprecation warning * add back in mps exception, addresses @Vargol comment in CompVis#354 * replaced Conditioning class with exported functions * fix wrong type of with_variations attribute during intialization * changed "image_iterator()" to "get_make_image()" * raise NotImplementedError for calling get_make_image() in parent class * Update ldm/generate.py better error message Co-authored-by: Kevin Gibbons <bakkot@gmail.com> * minor stylistic fixes and assertion checks from code review * moved get_noise() method into img2img class * break get_noise() into two methods, one for txt2img and the other for img2img * inpainting works on non-square images now * make get_noise() an abstract method in base class * much improved inpainting Co-authored-by: Kevin Gibbons <bakkot@gmail.com> commit 1ad2a8e Author: thealanle <35761977+thealanle@users.noreply.github.com> Date: Mon Sep 5 17:35:04 2022 -0700 Fix --outdir function for web (CompVis#373) * Fix --outdir function for web * Removed unnecessary hardcoded path commit 52d8bb2 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Mon Sep 5 10:31:59 2022 -0400 Squashed commit of the following: commit 0cd48e932f1326e000c46f4140f98697eb9bdc79 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Mon Sep 5 10:27:43 2022 -0400 resolve conflicts with development commit 
d7bc8c12e05535a363ac7c745a3f3abc2773bfcf Author: Scott McMillin <scott@scottmcmillin.com> Date: Sun Sep 4 18:52:09 2022 -0500 Add title attribute back to img tag commit 5397c89184ebfb8260bc2d8c3f23e73e103d24e6 Author: Scott McMillin <scott@scottmcmillin.com> Date: Sun Sep 4 13:49:46 2022 -0500 Remove temp code commit 1da080b50972696db2930681a09cb1c14e524758 Author: Scott McMillin <scott@scottmcmillin.com> Date: Sun Sep 4 13:33:56 2022 -0500 Cleaned up HTML; small style changes; image click opens image; add seed to figcaption beneath image commit caf4ea3 Author: Adam Rice <adam@askadam.io> Date: Mon Sep 5 10:05:39 2022 -0400 Add a 'Remove Image' button to clear the file upload field (CompVis#382) * added "remove image" button * styled a new "remove image" button * Update index.js commit 95c088b Author: Kevin Gibbons <bakkot@gmail.com> Date: Sun Sep 4 19:04:14 2022 -0700 Revert "Add CORS headers to dream server to ease integration with third-party web interfaces" (CompVis#371) This reverts commit 91e826e. commit a20113d Author: Kevin Gibbons <bakkot@gmail.com> Date: Sun Sep 4 18:59:12 2022 -0700 put no_grad decorator on make_image closures (CompVis#375) commit 0f93dad Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sun Sep 4 21:39:15 2022 -0400 fix several dangling references to --gfpgan option, which no longer exists commit f4004f6 Author: tildebyte <337875+tildebyte@users.noreply.github.com> Date: Sun Sep 4 19:43:04 2022 -0400 TOIL(requirements): Split requirements to per-platform (CompVis#355) * toil(reqs): split requirements to per-platform Signed-off-by: Ben Alkov <ben.alkov@gmail.com> * toil(reqs): fix for Win and Lin... 
...allow pip to resolve latest torch, numpy Signed-off-by: Ben Alkov <ben.alkov@gmail.com> * toil(install): update reqs in Win install notebook Signed-off-by: Ben Alkov <ben.alkov@gmail.com> Signed-off-by: Ben Alkov <ben.alkov@gmail.com> commit 4406fd1 Merge: 5116c81 fd7a72e Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sun Sep 4 08:23:53 2022 -0400 Merge branch 'SebastianAigner-main' into development Add support for full CORS headers for dream server. commit fd7a72e Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sun Sep 4 08:23:11 2022 -0400 remove debugging message commit 3a2be62 Merge: 91e826e 5116c81 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sun Sep 4 08:15:51 2022 -0400 Merge branch 'development' into main commit 5116c81 Author: Justin Wong <1584142+wongjustin99@users.noreply.github.com> Date: Sun Sep 4 07:17:58 2022 -0400 fix save_original flag saving to the same filename (CompVis#360) * Update README.md with new Anaconda install steps (CompVis#347) pip3 version did not work for me and this is the recommended way to install Anaconda now it seems * fix save_original flag saving to the same filename Before this, the `--save_orig` flag was not working. The upscaled/GFPGAN would overwrite the original output image. 
Co-authored-by: greentext2 <112735219+greentext2@users.noreply.github.com> commit 91e826e Author: Sebastian Aigner <SebastianAigner@users.noreply.github.com> Date: Sun Sep 4 10:22:54 2022 +0200 Add CORS headers to dream server to ease integration with third-party web interfaces commit 6266d9e Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 15:45:20 2022 -0400 remove stray debugging message commit 138956e Author: greentext2 <112735219+greentext2@users.noreply.github.com> Date: Sat Sep 3 13:38:57 2022 -0500 Update README.md with new Anaconda install steps (CompVis#347) pip3 version did not work for me and this is the recommended way to install Anaconda now it seems commit 60be735 Author: Cora Johnson-Roberson <cora.johnson.roberson@gmail.com> Date: Sat Sep 3 14:28:34 2022 -0400 Switch to regular pytorch channel and restore Python 3.10 for Macs. (CompVis#301) * Switch to regular pytorch channel and restore Python 3.10 for Macs. Although pytorch-nightly should in theory be faster, it is currently causing increased memory usage and slower iterations: invoke-ai/InvokeAI#283 (comment) This changes the environment-mac.yaml file back to the regular pytorch channel and moves the `transformers` dep into pip for now (since it cannot be satisfied until tokenizers>=0.11 is built for Python 3.10). * Specify versions for Pip packages as well. 
commit d0d95d3 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 14:10:31 2022 -0400 make initimg appear in web log commit b90a215 Merge: 1eee811 6270e31 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 13:47:15 2022 -0400 Merge branch 'prixt-seamless' into development commit 6270e31 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 13:46:29 2022 -0400 add credit to prixt for seamless circular tiling commit a01b7bd Merge: 1eee811 9d88abe Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 13:43:04 2022 -0400 add web interface for seamless option commit 1eee811 Merge: 64eca42 fb857f0 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 12:33:39 2022 -0400 Merge branch 'development' of github.com:lstein/stable-diffusion into development commit 64eca42 Merge: 9130ad7 21a1f68 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 12:33:05 2022 -0400 Merge branch 'main' into development * brings in small documentation fixes that were added directly to main during release tweaking. commit fb857f0 Author: Lincoln Stein <lincoln.stein@gmail.com> Date: Sat Sep 3 12:07:07 2022 -0400 fix typo in docs commit 9d88abe Author: prixt <paraxite@naver.com> Date: Sat Sep 3 22:42:16 2022 +0900 fixed typo commit a61e49b Author: prixt <paraxite@naver.com> Date: Sat Sep 3 22:39:35 2022 +0900 * Removed unnecessary code * Added description about --seamless commit 02bee4f Author: prixt <paraxite@naver.com> Date: Sat Sep 3 16:08:03 2022 +0900 added --seamless tag logging to normalize_prompt commit d922b53 Author: prixt <paraxite@naver.com> Date: Sat Sep 3 15:13:31 2022 +0900 added seamless tiling mode and commands

…on#117

tzayuan · 2024-04-01T12:08:02Z

Hi @MrLavender,

I would like to ask: SD loads a pretrained model, so why does the pretrained model still work correctly after the implementation of the attention mechanism has been changed? Are there any techniques or pitfalls to pay attention to in this process? Thanks.
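A possible way to see why the pretrained weights keep working: the attention change discussed in this thread only reorders the computation, while the learned projections (`to_q`, `to_k`, `to_v`, `to_out`) are left untouched. Softmax acts on each row independently, and every (batch × head) slice is computed on its own, so processing a few slices at a time should give the same output as processing all of them at once. The sketch below illustrates this with plain Python lists instead of torch tensors so that it runs without dependencies; the function names (`attend`, `full_attention`, `sliced_attention`) and the toy shapes are hypothetical and are not taken from the PR.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over a single row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v, scale):
    # Attention for ONE (batch*head) slice: softmax(q k^T * scale) v.
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def full_attention(Q, K, V, scale):
    # Baseline: all slices handled in a single pass.
    return [attend(q, k, v, scale) for q, k, v in zip(Q, K, V)]


def sliced_attention(Q, K, V, scale, step):
    # Same math, but only `step` slices are processed per iteration,
    # which is what bounds the size of the intermediate score matrix.
    out = [None] * len(Q)
    for i in range(0, len(Q), step):
        end = i + step
        out[i:end] = full_attention(Q[i:end], K[i:end], V[i:end], scale)
    return out


def rand_tensor(b, n, d):
    # Hypothetical helper: nested lists of shape (b, n, d).
    return [[[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
            for _ in range(b)]


random.seed(0)
Q, K, V = rand_tensor(8, 4, 3), rand_tensor(8, 5, 3), rand_tensor(8, 5, 3)
scale = 3 ** -0.5

# No parameters are involved, so the two code paths agree.
assert sliced_attention(Q, K, V, scale, step=4) == full_attention(Q, K, V, scale)
```

In this toy version the two results are identical because each slice runs through exactly the same operations. With torch on a GPU, small floating-point differences between a batched and a sliced computation are possible, so a tolerance-based comparison would be the safer check there.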

Update attention.py

47f8784

neonsecret mentioned this pull request Sep 4, 2022

Memory-efficient attention (also code cleaned up and a colab added) #103

Closed

inpaint gradio mask mode fixed

f1bb248

(cherry picked from commit ddde264)

neonsecret changed the title ~~Memory-efficient attention (single file changed)~~ Memory-efficient attention and gradio mask fixed Sep 4, 2022

AUTOMATIC1111 added a commit to AUTOMATIC1111/stable-diffusion-webui that referenced this pull request Sep 4, 2022

add split attention layer optimization from basujindal/stable-diffusi…

5bb126b

…on#117

JohnAlcatraz mentioned this pull request Sep 4, 2022

Stable Diffusion PR optimizes VRAM, generate 576x1280 images with 6 GB VRAM invoke-ai/InvokeAI#364

Closed

ryudrigo mentioned this pull request Sep 5, 2022

Adapts memory-efficient attention to large unet_bs #122

Merged

Verszy mentioned this pull request Sep 5, 2022

[Bug]: CUDA out of memory Sygil-Dev/sygil-webui#673

Closed

zaidorx added a commit to zaidorx/stable-diffusion-webui-1 that referenced this pull request Sep 12, 2022

modifying attention.py with optimization from:

988a608

basujindal/stable-diffusion#117

This was referenced Sep 16, 2022

SD upscale limited to 1024x1024? AbdBarho/stable-diffusion-webui-docker#71

Closed

SD upscale limited to 1024x1024 on auto-cpu on an Epyc CPU? AUTOMATIC1111/stable-diffusion-webui#570

Closed

Cabbagec added a commit to Cabbagec/stable-diffusion that referenced this pull request Sep 19, 2022

adapt attention optimization from basujindal#117

a816a4a

harskish mentioned this pull request Sep 20, 2022

Generating 1408x1408 images on my 3080 harskish/sdui#4

Closed

madebyollin mentioned this pull request Oct 13, 2022

Memory Requirements? madebyollin/maple-diffusion#1

Closed

techeng322 pushed a commit to techeng322/stable-diffusion-automatic that referenced this pull request Nov 12, 2023

add split attention layer optimization from basujindal/stable-diffusi…

3ccc942

…on#117
