
[Bug]: VRAM usage is way higher #6307

Closed
1 task done
shimizu-izumi opened this issue Jan 4, 2023 · 35 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@shimizu-izumi

shimizu-izumi commented Jan 4, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

I updated the WebUI a few minutes ago, and now the VRAM usage when generating an image is way higher. I have 3 monitors (2x 1920x1080 & 1x 2560x1440) and use Wallpaper Engine on all of them, but I have Discord open on one of them nearly 24/7, so Wallpaper Engine is only active on two monitors. 1.5 GB of VRAM is used when I am on the desktop without the WebUI running.
Web Browser: Microsoft Edge (Chromium)
OS: Windows 11 (Build number: 22621.963)
GPU: NVIDIA GeForce RTX 3070 Ti (KFA2)
CPU: Intel Core i7-11700K
RAM: Corsair VENGEANCE LPX 32 GB (2 x 16 GB) DDR4 DRAM 3200 MHz C16

Steps to reproduce the problem

  1. Start the WebUI
  2. Use the following settings to generate an image

Positive prompt:
masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
Negative prompt:
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,
Steps: 50, Sampler: Euler a, CFG scale: 12, Seed: 3607441108, Size: 512x768, Model hash: 8d9aaa54, Model: Anything V3 (non pruned with vae), Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo

What should have happened?

The generation should complete without any errors

Commit where the problem happens

1cfd8ae

What platforms do you use to access UI ?

Windows

What browsers do you use to access the UI ?

Microsoft Edge

Command Line Arguments

--xformers

Additional information, context and logs

I have the config for animefull from the NovelAI leak in the configs folder under the name Anything V3.0.yaml, but I get this error even when I remove it from the configs folder and completely restart the WebUI. This is the error I get:

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 4.70 GiB already allocated; 0 bytes free; 5.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
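
For reference, the max_split_size_mb knob mentioned in that message is read from the PYTORCH_CUDA_ALLOC_CONF environment variable, which has to be set before PyTorch makes its first CUDA allocation (for the WebUI it is usually exported in the launcher's environment rather than in Python). A minimal sketch of the mechanism, where the 512 value is only an illustrative assumption and not a recommendation from this thread:

```python
# Illustrative sketch only: configure the CUDA caching allocator before torch touches the GPU.
# The 512 value is an assumed example, not a setting suggested anywhere in this issue.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # the allocator reads the variable when CUDA is initialized

x = torch.zeros(1, device="cuda")  # the first allocation now uses the new split-size limit
print(f"{torch.cuda.memory_reserved() / 2**20:.0f} MiB reserved")
```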

@shimizu-izumi added the bug-report (Report of a bug, yet to be confirmed) label on Jan 4, 2023
@ClashSAN
Collaborator

ClashSAN commented Jan 4, 2023

When did you last update the WebUI? This may be from a Windows update. You may want to disable browser hardware acceleration; I've found the openoutpaint extension automatically uses some VRAM when browser hardware acceleration is enabled.

@walkerakiz

Same issue here; for a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. That happened with the new update today. :/

@shimizu-izumi
Author

shimizu-izumi commented Jan 4, 2023

When did you last update the WebUI? This may be from a Windows update. You may want to disable browser hardware acceleration; I've found the openoutpaint extension automatically uses some VRAM when browser hardware acceleration is enabled.

I updated the WebUI around 2 PM UTC+1. The last major Windows update was a few weeks ago. When I used the WebUI a few days ago, everything still worked without any errors, and I don't have the openoutpaint extension.

@mxzgithub

mxzgithub commented Jan 4, 2023

I made a fresh install just now with an RTX 4090. I'm running out of VRAM constantly; that never happened before.

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

@Alphyn-gunner

Alphyn-gunner commented Jan 4, 2023

Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo

I might be mistaken, but I think the culprit is the new Highres fix. It upscales the images before processing them a second time, and they may be too big to fit into your VRAM. I see a lot of people complaining about how confusing it is to use and how it gives inferior results. In my experience as well, it is of questionable usability right now.

If you really need to use the Highres fix now, try setting the upscaling factor to 1. It somehow makes it behave, even though it's counter-intuitive, since the default setting is 2.
Here are some examples I got:
Default settings (upscale by 2): [image]
Upscale by 1: [image]

On the other hand, I just noticed that you have a lot of RAM, which makes me think I'm completely wrong in my assumption and something else entirely is going on. I'm going to try your settings with the same model and see what I get on 8 GB.

@Alphyn-gunner

Here's the result I got:
`masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,
Steps: 50, Sampler: Euler a, CFG scale: 7, Seed: 3607441108, Size: 512x768, Model: Anything-V3.0-pruned-fp32, Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN 4x+ Anime6B

Time taken: 4m 49.25sTorch active/reserved: 4777/6598 MiB, Sys VRAM: 8192/8192 MiB (100.0%)`
[image]
It used all the available memory but didn't run out. It also made the image twice the size I ordered, and it took almost 5 minutes on a 1070 Ti.

Commit hash: 24d4a08
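
For context, the "Torch active/reserved" figures quoted above appear to correspond to PyTorch's own CUDA memory counters, which can be checked from a Python shell inside the WebUI's venv. A minimal sketch using standard torch.cuda calls (not WebUI-specific code):

```python
import torch

dev = torch.device("cuda:0")
props = torch.cuda.get_device_properties(dev)

allocated = torch.cuda.memory_allocated(dev)  # memory held by live tensors ("active")
reserved = torch.cuda.memory_reserved(dev)    # memory cached by the allocator ("reserved")

print(f"active:   {allocated / 2**20:.0f} MiB")
print(f"reserved: {reserved / 2**20:.0f} MiB")
print(f"card:     {props.total_memory / 2**20:.0f} MiB total on {props.name}")
```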

@shimizu-izumi
Author

shimizu-izumi commented Jan 4, 2023

@Alphyn-gunner It's twice the size because of the hires upscale value.

@shimizu-izumi
Author

I also noticed that I now get completely different results with the exact same settings.
[image]

@ClashSAN
Collaborator

ClashSAN commented Jan 4, 2023

I made a fresh install just now with an RTX 4090. I'm running out of VRAM constantly; that never happened before.

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Could you post the before and after image size limit?

I also noticed that I now get completely different results with the exact same settings.

Were you using xformers?

@lolxdmainkaisemaanlu

I have the same problem, and I don't even use the hi-res fix! I just do normal generations, but the VRAM usage is WAY higher now! I can't use the same batch size that I could previously. Everything else is the same; I changed nothing, I only did a git pull.

@Campfirecrucifix

Campfirecrucifix commented Jan 5, 2023

Same issue here; for a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. That happened with the new update today. :/

I honestly thought I was the only one. Generating images is SO much slower now (and I have a 4090). I really wish there was a way to revert back to the previous update.

I also noticed that I now get completely different results with the exact same settings.

Also getting the same problem. I was wondering why hires was taking so long now, so I decided to recreate one of my previous images; with all the same settings I got nothing like it, and it took forever.

@mykeehu
Contributor

mykeehu commented Jan 5, 2023

In the latest versions, the hires fix has been modified. Does the 5f4fa94 version also have this bug?

@GarbageHaus

GarbageHaus commented Jan 5, 2023

For what it's worth, I've also noticed this when training an embedding after updating today via a fresh install.
I have an old version which doesn't have any issues, which is how the repository was as of 11/5. I have a lower-end card (RTX 2060 6 GB), so embeddings are all I can do for the moment.

Previously I could train a 512x512 embedding and use the "Read parameters" option on the SD 1.4 checkpoint. The message I get now states that 512 MB of additional VRAM is needed. For experimentation, I lowered the 512 values and the embedding began to train. However, when it tried to generate an image mid-training, the CUDA memory issue occurred again.

It is worth noting that I'm still able to use regular prompts, as well as the embedding that was terminated early after running out of memory, so this might be helpful in determining the cause.

@nonetrix
Contributor

nonetrix commented Jan 5, 2023

Same here; as suggested, using a less extreme upscale option worked. However, it is still considerably slower. Having different hires fix backends is nice and might yield better results, but why is this the only option? Why not add both?

What is the last known commit that doesn't have this change? I think I'll switch back to that for the time being.

@Nilok7

Nilok7 commented Jan 5, 2023

The current Hires. fix seems to be tuned much more for higher-end cards.
It would be very helpful if there were a way to tune the Hires. fix back to the previous behavior for 8 GB and lower cards, either as a direct option or as an update to the wiki.

@DrGunnarMallon

DrGunnarMallon commented Jan 5, 2023

For now you could always check out a previous version:

git checkout fd4461d

This is the one I'm using for the time being as I find the system pretty much unusable as it is now.

@shimizu-izumi
Author

Yes, I use xformers. What do you mean by image size limit?

@nanafy

nanafy commented Jan 5, 2023

I have the same issue. I found it while using the hi-res fix. I completely understand how to use it; that's not the issue. I now run out of VRAM at the same batch sizes/dimensions as before. @lolxdmainkaisemaanlu also pointed out the same, except they are not even using hi-res; I just happened to notice it with hi-res. It seems to be an issue independent of the hi-res fix. I'm reverting to fd4461d as well, courtesy of @DrGunnarMallon.

@DoughyInTheMiddle

For now you could always check out a previous version:

git checkout fd4461d

This is the one I'm using for the time being as I find the system pretty much unusable as it is now.

I'm running A1111 on a 2060 Super, so 8GB of VRAM.

I had a bit of a workflow doing a couple of 512x512 low-level passes, then bumping it up to 768 to start adding detail, and finally finishing off by upscaling to 1024. I've been doing passes of this process for almost a week (I've been making daily "Twelve Days of Christmas" images).

Even on my older card, it works. Now, even going from 512 to 768 with just 50 steps wrecks it; I currently cannot render anything at 768x768.

I tried resetting to the hash recommended above, but I'm still going OOM. Is there an earlier hash I should revert to instead?

Error completing request
Arguments: (0, 'a photograph of  a single red apple, on a yellow plate, on a blue checkered tablecloth.', '', 'None', 'None', <PIL.Image.Image image mode=RGBA size=512x512 at 0x1EFB7F20DF0>, None, None, None, None, 0, 50, 0, 4, 0, 1, False, False, 1, 4, 7, 0.2, 1254105237.0, -1.0, 0, 0, 0, False, 768, 768, 0, False, 32, 0, '', '', 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, None, None, '', '', '', '', 'Auto rename', {'label': 'Upload avatars config'}, 'Open outputs directory', 'Export to WebUI style', True, {'label': 'Presets'}, {'label': 'QC preview'}, '', [], 'Select', 'QC scan', 'Show pics', None, False, False, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 'Positive', 0, ', ', True, 32, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "G:\GitHub\SDWebUI\modules\call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "G:\GitHub\SDWebUI\modules\call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "G:\GitHub\SDWebUI\modules\img2img.py", line 152, in img2img
    processed = process_images(p)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 471, in process_images
    res = process_images_inner(p)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 541, in process_images_inner
    p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 887, in init
    self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 138, in forward
    h = self.norm2(h)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\normalization.py", line 272, in forward
    return F.group_norm(
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\functional.py", line 2516, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 8.00 GiB total capacity; 5.29 GiB already allocated; 0 bytes free; 6.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@nanafy

nanafy commented Jan 5, 2023

Try 4af3ca5. The other commit was throwing errors for me as well. I'm currently back up and running like I was before I tried to get the latest build.

@DoughyInTheMiddle

Try 4af3ca5. The other commit was throwing errors for me as well. I'm currently back up and running like I was before I tried to get the latest build.

That one isn't working for me either. Still going OOM.

After running git checkout xxxxxx, is there anything else I need to do other than closing the console and restarting?

@nanafy

nanafy commented Jan 6, 2023

When you open your Auto1111 command window, it tells you the commit hash as soon as you run webui.bat.
Does it say:
Commit hash: 4af3ca5
Installing requirements for Web UI...

@DoughyInTheMiddle

I restored back to the master branch, and NVIDIA just put out a driver update.

One of those two things must have affected it, because at least I'm getting things to work better now. Memory usage seems better; I'm still watching it for a bit, though.

@nonetrix
Contributor

nonetrix commented Jan 6, 2023

Did you add git pull to your webui script? I've seen a few people do that. For me, at least, reverting back to an old version fixed it. Funny, because this change made me think xformers was the issue; I guess I'll have to give it another chance, I was harsh on it.

@DoctorPavel

I'm not sure how related this is, but I haven't seen anybody else mention it.
Loading a model in the webui, including at launch, has a coin flip's chance of instantly maxing out my 8 GB of VRAM and freezing my PC entirely. Has anybody else experienced this issue? This has been a thing for a few pulls now, even before the suspension.
I have been running the webui inside a Docker image on Ubuntu 20.04 with ROCm and an RX 5700 XT AMD card.

@ChinatsuHS

Having the same issue: just loading the WebUI immediately uses, and keeps using, 5 out of the 8 GB of VRAM,
all since the new hires fix was implemented. The most common error it OOMs on has to do with resolution scaling (even with the hires fix disabled). I am not using SD 2.x models at all, so those should not be the issue.

With each generation the amount of VRAM in use seems to increase by a few MB, which stacks up fast over time. img2img is a complete no-go, as it immediately OOMs.

@ImBadAtNames2019

Same issue here.

RuntimeError: CUDA out of memory. Tried to allocate 76.38 GiB (GPU 0; 12.00 GiB total capacity; 2.57 GiB already allocated; 7.19 GiB free; 2.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Time taken: 16.44sTorch active/reserved: 2757/2774 MiB, Sys VRAM: 5051/12288 MiB (41.11%)

@Centurion-Rome

See possible source in "new hires": #6725

@mykeehu
Contributor

mykeehu commented Jan 23, 2023

I do not use Hires Fix, but I can no longer change models on Colab because it causes memory overflow:

[image]

The --lowram, --lowvram and --medvram options did not help. This is the default RAM reservation at startup:

[image]

Update: I found a solution:

  • set VAE to None
  • under Settings -> Stable Diffusion, set Checkpoints and VAE cache to zero
  • save the settings and shut down SD (GUI restart is not enough!)
  • start again.

[image]

Regardless, I saw that every time I change the model it occupies 1 GB more memory, so after a while it causes a memory overflow again.
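
As a general note on that "one more GB per model switch" pattern: in plain PyTorch, VRAM from an unloaded checkpoint is only returned once every reference to its tensors is gone and the allocator's cache is released. A minimal, generic sketch of that cleanup pattern (the model_holder object is hypothetical and not part of the WebUI's code):

```python
import gc
import torch

def release_old_checkpoint(model_holder):
    """Drop a previously loaded model and ask PyTorch to give cached VRAM back."""
    model_holder.model = None   # hypothetical attribute keeping the old weights alive
    gc.collect()                # collect Python-side garbage first
    torch.cuda.empty_cache()    # then release unused cached blocks back to the driver
    print(f"reserved after cleanup: {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
```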

@Mistborn-First-Era

I have this problem as well. It consists of:

  1. When I open the webui, my VRAM is at 5000ish MB instead of the normal 500ish. This is idle usage.
  2. When I switch models, or generate multiple pictures where the model switches via X/Y/Z, my memory usage grows steadily until it maxes out.

@LuluViBritannia

Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed.
It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the line that updates torch), the VRAM usage became normal again.

So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully this helps!

@Nilok7

Nilok7 commented Apr 10, 2023

Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed. It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the line that updates torch), the VRAM usage became normal again.

So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully this helps!

Which file did you edit?
I don't have any command lines like that in webui-user.bat, and there isn't any git pull or torch update in webui.bat.

@LuluViBritannia

Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed. It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the line that updates torch), the VRAM usage became normal again.
So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully this helps!

Which file did you edit? I don't have any command lines like that in webui-user.bat, and there isn't any git pull or torch update in webui.bat.

The launcher (the webui-user.bat file). I had added two command lines for the updates, thinking they would only affect the launch, but they were actually taking 3 GB of VRAM for no reason.

In your case that doesn't seem to be the issue. Sorry I can't help ^^'.

@catboxanon
Collaborator

catboxanon commented Aug 13, 2023

I've made two PRs that I think will finally address this. voldy (auto) has also made recent improvements to the dev branch in 0af4127 and ccb9233 that should help as well. Basically, if you miss the performance of the hires fix from the early days before ef27a18 changed it, I think this now fixes it. Note that you should be using --medvram (or --lowvram), not using --no-half-vae, and using a high-performance optimizer like xformers to take the most advantage of these.

#12514
#12515

I also closed #6725 and #7002, since this issue is the most relevant. The former was just asking for the old hires fix to be added back (where width/height is specified manually, which is supported), and the latter is technically a duplicate of this issue.

@catboxanon
Collaborator

Closing this, as I've done a few tests and VRAM usage is significantly lower as of the latest dev branch commit. In the scenario given in the OP, VRAM peaks just under 6 GB, which fits well within their stated constraints. Open a new issue with more specifics if problems still occur.
