
[Bug]: VRAM usage is way higher #6307

Closed
1 task done
shimizu-izumi opened this issue Jan 4, 2023 · 35 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@shimizu-izumi

shimizu-izumi commented Jan 4, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

I updated the WebUI a few minutes ago, and now the VRAM usage when generating an image is way higher. I have 3 monitors (2x 1920x1080 & 1x 2560x1440) and use Wallpaper Engine on all of them, but I have Discord open on one of them nearly 24/7, so Wallpaper Engine is only active on two monitors. 1.5 GB of VRAM is used when I am on the desktop without the WebUI running.
Web Browser: Microsoft Edge (Chromium)
OS: Windows 11 (Build number: 22621.963)
GPU: NVIDIA GeForce RTX 3070 Ti (KFA2)
CPU: Intel Core i7-11700K
RAM: Corsair VENGEANCE LPX 32 GB (2 x 16 GB) DDR4 DRAM 3200 MHz C16

Steps to reproduce the problem

  1. Start the WebUI
  2. Use the following settings to generate an image

Positive prompt:
masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
Negative prompt:
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,
Steps: 50, Sampler: Euler a, CFG scale: 12, Seed: 3607441108, Size: 512x768, Model hash: 8d9aaa54, Model: Anything V3 (non pruned with vae), Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo

What should have happened?

The generation should complete without any errors

Commit where the problem happens

1cfd8ae

What platforms do you use to access UI ?

Windows

What browsers do you use to access the UI ?

Microsoft Edge

Command Line Arguments

--xformers

Additional information, context and logs

I have the config for animefull from the NovelAI leak in the configs folder under the name Anything V3.0.yaml, but I get this error even when I remove it from the configs folder and completely restart the WebUI. This is the error I get:

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 4.70 GiB already allocated; 0 bytes free; 5.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
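
For reference, the max_split_size_mb knob mentioned in that message is read from the PYTORCH_CUDA_ALLOC_CONF environment variable, which has to be set before PyTorch makes its first CUDA allocation (for the WebUI it is usually exported in the launcher's environment rather than in Python). A minimal sketch of the mechanism, where the 512 value is only an illustrative assumption and not a recommendation from this thread:

```python
# Illustrative sketch only: configure the CUDA caching allocator before torch touches the GPU.
# The 512 value is an assumed example, not a setting suggested anywhere in this issue.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # the allocator reads the variable when CUDA is initialized

x = torch.zeros(1, device="cuda")  # the first allocation now uses the new split-size limit
print(f"{torch.cuda.memory_reserved() / 2**20:.0f} MiB reserved")
```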

@shimizu-izumi added the bug-report (Report of a bug, yet to be confirmed) label on Jan 4, 2023
@ClashSAN
Collaborator

ClashSAN commented Jan 4, 2023

When did you last update the WebUI? This may be from a Windows update. You may want to disable browser hardware acceleration; I've found the openoutpaint extension automatically uses some VRAM when browser hardware acceleration is enabled.

@walkerakiz

Same issue here; for a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. That happened with the new update today. :/

@shimizu-izumi
Author

shimizu-izumi commented Jan 4, 2023

When did you last update the WebUI? This may be from a Windows update. You may want to disable browser hardware acceleration; I've found the openoutpaint extension automatically uses some VRAM when browser hardware acceleration is enabled.

I updated the WebUI around 2 PM UTC+1. The last major Windows update was a few weeks ago. When I used the WebUI a few days ago, everything still worked without any errors, and I don't have the openoutpaint extension.

@mxzgithub

mxzgithub commented Jan 4, 2023

I made a fresh install just now with an RTX 4090. I'm running out of VRAM constantly; that never happened before.

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

@Alphyn-gunner

Alphyn-gunner commented Jan 4, 2023

Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo

I might be mistaken, but I think the culprit is the new Highres fix. It upscales the images before processing them a second time, and they may be too big to fit into your VRAM. I see a lot of people complaining about how confusing it is to use and how it gives inferior results. In my experience as well, it is of questionable usability right now.

If you really need to use the Highres fix now, try setting the upscaling factor to 1. It somehow makes it behave, even though it's counter-intuitive, since the default setting is 2.
Here are some examples I got:
Default settings (upscale by 2): [image]
Upscale by 1: [image]

On the other hand, I just noticed that you have a lot of RAM, which makes me think I'm completely wrong in my assumption and something else entirely is going on. I'm going to try your settings with the same model and see what I get on 8 GB.

@Alphyn-gunner

Here's the result I got:
`masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,
Steps: 50, Sampler: Euler a, CFG scale: 7, Seed: 3607441108, Size: 512x768, Model: Anything-V3.0-pruned-fp32, Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN 4x+ Anime6B

Time taken: 4m 49.25sTorch active/reserved: 4777/6598 MiB, Sys VRAM: 8192/8192 MiB (100.0%)`
[image]
It used all the available memory but didn't run out. It also made the image twice the size I ordered, and it took almost 5 minutes on a 1070 Ti.

Commit hash: 24d4a08
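
For context, the "Torch active/reserved" figures quoted above appear to correspond to PyTorch's own CUDA memory counters, which can be checked from a Python shell inside the WebUI's venv. A minimal sketch using standard torch.cuda calls (not WebUI-specific code):

```python
import torch

dev = torch.device("cuda:0")
props = torch.cuda.get_device_properties(dev)

allocated = torch.cuda.memory_allocated(dev)  # memory held by live tensors ("active")
reserved = torch.cuda.memory_reserved(dev)    # memory cached by the allocator ("reserved")

print(f"active:   {allocated / 2**20:.0f} MiB")
print(f"reserved: {reserved / 2**20:.0f} MiB")
print(f"card:     {props.total_memory / 2**20:.0f} MiB total on {props.name}")
```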

@shimizu-izumi
Author

shimizu-izumi commented Jan 4, 2023

@Alphyn-gunner It's twice the size because of the hires upscale value.

@shimizu-izumi
Author

I also noticed that I now get completely different results with the exact same settings.
[image]

@ClashSAN
Collaborator

ClashSAN commented Jan 4, 2023

I made a fresh install just now with an RTX 4090. I'm running out of VRAM constantly; that never happened before.

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Could you post the before and after image size limit?

I also noticed that I now get completely different results with the exact same settings.

Were you using xformers?

@lolxdmainkaisemaanlu

I have the same problem, and I don't even use the hi-res fix! I just do normal generations, but the VRAM usage is WAY higher now! I can't use the same batch size that I could previously. Everything else is the same; I changed nothing, I only did a git pull.

@Campfirecrucifix

Campfirecrucifix commented Jan 5, 2023

Same issue here; for a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. That happened with the new update today. :/

I honestly thought I was the only one. Generating images is SO much slower now (and I have a 4090). I really wish there was a way to revert back to the previous update.

I also noticed that I now get completely different results with the exact same settings.

Also getting the same problem. I was wondering why hires was taking so long now, so I decided to recreate one of my previous images; with all the same settings I got nothing like it, and it took forever.

@mykeehu
Contributor

mykeehu commented Jan 5, 2023

In the latest versions, the hires fix has been modified. Does the 5f4fa94 version also have this bug?

@GarbageHaus

GarbageHaus commented Jan 5, 2023

For what it's worth, I've also noticed this when training an embedding after updating today via a fresh install.
I have an old version which doesn't have any issues, which is how the repository was as of 11/5. I have a lower-end card (RTX 2060 6 GB), so embeddings are all I can do for the moment.

Previously I could train a 512x512 embedding and use the "Read parameters" option on the SD 1.4 checkpoint. The message I get now states that 512 MB of additional VRAM is needed. For experimentation, I lowered the 512 values and the embedding began to train. However, when it tried to generate an image mid-training, the CUDA memory issue occurred again.

It is worth noting that I'm still able to use regular prompts, as well as the embedding that was terminated early after running out of memory, so this might be helpful in determining the cause.

@nonetrix
Contributor

nonetrix commented Jan 5, 2023

Same here; as suggested, using a less extreme upscale option worked. However, it is still considerably slower. Having different hires fix backends is nice and might yield better results, but why is this the only option? Why not add both?

What is the last known commit that doesn't have this change? I think I'll switch back to that for the time being.

@Nilok7

Nilok7 commented Jan 5, 2023

The current Hires. fix seems to be tuned much more for higher-end cards.
It would be very helpful if there were a way to tune the Hires. fix back to the previous behavior for 8 GB and lower cards, either as a direct option or as an update to the wiki.

@DrGunnarMallon

DrGunnarMallon commented Jan 5, 2023

For now you could always check out a previous version:

git checkout fd4461d

This is the one I'm using for the time being as I find the system pretty much unusable as it is now.

@shimizu-izumi
Author

Yes, I use xformers. What do you mean by image size limit?

@nanafy

nanafy commented Jan 5, 2023

I have the same issue. I found it while using the hi-res fix. I completely understand how to use it; that's not the issue. I now run out of VRAM at the same batch sizes/dimensions as before. @lolxdmainkaisemaanlu also pointed out the same, except they are not even using hi-res; I just happened to notice it with hi-res. It seems to be an issue independent of the hi-res fix. I'm reverting to fd4461d as well, courtesy of @DrGunnarMallon.

@DoughyInTheMiddle

For now you could always check out a previous version:

git checkout fd4461d

This is the one I'm using for the time being as I find the system pretty much unusable as it is now.

I'm running A1111 on a 2060 Super, so 8GB of VRAM.

I had a bit of a workflow doing a couple of 512x512 low-level passes, then bumping it up to 768 to start adding detail, and finally finishing off by upscaling to 1024. I've been doing passes of this process for almost a week (I've been making daily "Twelve Days of Christmas" images).

Even on my older card, it works. Now, even going from 512 to 768 with just 50 steps wrecks it; I currently cannot render anything at 768x768.

I tried resetting to the hash recommended above, but I'm still going OOM. Is there an earlier hash I should revert to instead?

Error completing request
Arguments: (0, 'a photograph of  a single red apple, on a yellow plate, on a blue checkered tablecloth.', '', 'None', 'None', <PIL.Image.Image image mode=RGBA size=512x512 at 0x1EFB7F20DF0>, None, None, None, None, 0, 50, 0, 4, 0, 1, False, False, 1, 4, 7, 0.2, 1254105237.0, -1.0, 0, 0, 0, False, 768, 768, 0, False, 32, 0, '', '', 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, None, None, '', '', '', '', 'Auto rename', {'label': 'Upload avatars config'}, 'Open outputs directory', 'Export to WebUI style', True, {'label': 'Presets'}, {'label': 'QC preview'}, '', [], 'Select', 'QC scan', 'Show pics', None, False, False, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 'Positive', 0, ', ', True, 32, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "G:\GitHub\SDWebUI\modules\call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "G:\GitHub\SDWebUI\modules\call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "G:\GitHub\SDWebUI\modules\img2img.py", line 152, in img2img
    processed = process_images(p)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 471, in process_images
    res = process_images_inner(p)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 541, in process_images_inner
    p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 887, in init
    self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 138, in forward
    h = self.norm2(h)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\normalization.py", line 272, in forward
    return F.group_norm(
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\functional.py", line 2516, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 8.00 GiB total capacity; 5.29 GiB already allocated; 0 bytes free; 6.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@nanafy

nanafy commented Jan 5, 2023

Try 4af3ca5. The other commit was throwing errors for me as well. I'm currently back up and running like I was before I tried to get the latest build.

@DoughyInTheMiddle

Try 4af3ca5. The other commit was throwing errors for me as well. I'm currently back up and running like I was before I tried to get the latest build.

That one isn't working for me either. Still going OOM.

After running git checkout xxxxxx, is there anything else I need to do other than closing the console and restarting?

@nanafy

nanafy commented Jan 6, 2023

When you open your Auto1111 command window, it tells you the commit hash as soon as you run webui.bat.
Does it say:
Commit hash: 4af3ca5
Installing requirements for Web UI...

@DoughyInTheMiddle

I restored back to the master branch, and NVIDIA just put out a driver update.

One of those two things must have affected it, because at least I'm getting things to work better now. Memory usage seems better; I'm still watching it for a bit, though.

@nonetrix
Contributor

nonetrix commented Jan 6, 2023

Did you add git pull to your webui script? I've seen a few people do that. For me, at least, reverting back to an old version fixed it. Funny, because this change made me think xformers was the issue; I guess I'll have to give it another chance, I was harsh on it.

@DoctorPavel

I'm not sure how related this is, but I haven't seen anybody else mention it.
Loading a model in the webui, including at launch, has a coin flip's chance of instantly maxing out my 8 GB of VRAM and freezing my PC entirely. Has anybody else experienced this issue? This has been a thing for a few pulls now, even before the suspension.
I have been running the webui inside a Docker image on Ubuntu 20.04 with ROCm and an RX 5700 XT AMD card.

@ChinatsuHS

Having the same issue: just loading the WebUI immediately uses, and keeps using, 5 out of the 8 GB of VRAM,
all since the new hires fix was implemented. The most common error it OOMs on has to do with resolution scaling (even with the hires fix disabled). I am not using SD 2.x models at all, so those should not be the issue.

With each generation the amount of VRAM in use seems to increase by a few MB, which stacks up fast over time. img2img is a complete no-go, as it immediately OOMs.

@ImBadAtNames2019

Same issue here.

RuntimeError: CUDA out of memory. Tried to allocate 76.38 GiB (GPU 0; 12.00 GiB total capacity; 2.57 GiB already allocated; 7.19 GiB free; 2.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Time taken: 16.44sTorch active/reserved: 2757/2774 MiB, Sys VRAM: 5051/12288 MiB (41.11%)

@Centurion-Rome

See possible source in "new hires": #6725

@mykeehu
Contributor

mykeehu commented Jan 23, 2023

I do not use Hires Fix, but I can no longer change models on Colab because it causes memory overflow:

[image]

The --lowram, --lowvram and --medvram options did not help. This is the default RAM reservation at startup:

[image]

Update: I found a solution:

  • set VAE to None
  • under Settings -> Stable Diffusion, set Checkpoints and VAE cache to zero
  • save the settings and shut down SD (GUI restart is not enough!)
  • start again.

[image]

Regardless, I saw that every time I change the model it occupies 1 GB more memory, so after a while it causes a memory overflow again.
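
As a general note on that "one more GB per model switch" pattern: in plain PyTorch, VRAM from an unloaded checkpoint is only returned once every reference to its tensors is gone and the allocator's cache is released. A minimal, generic sketch of that cleanup pattern (the model_holder object is hypothetical and not part of the WebUI's code):

```python
import gc
import torch

def release_old_checkpoint(model_holder):
    """Drop a previously loaded model and ask PyTorch to give cached VRAM back."""
    model_holder.model = None   # hypothetical attribute keeping the old weights alive
    gc.collect()                # collect Python-side garbage first
    torch.cuda.empty_cache()    # then release unused cached blocks back to the driver
    print(f"reserved after cleanup: {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
```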

@Mistborn-First-Era

I have this problem as well. It consists of:

  1. When I open the webui, my VRAM is at 5000ish MB instead of the normal 500ish. This is idle usage.
  2. When I switch models, or generate multiple pictures where the model switches via X/Y/Z, my memory usage grows steadily until it maxes out.

@LuluViBritannia

Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed.
It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the line that updates torch), the VRAM usage became normal again.

So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully this helps!

@Nilok7

Nilok7 commented Apr 10, 2023

Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed. It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the line that updates torch), the VRAM usage became normal again.

So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully this helps!

Which file did you edit?
I don't have any command lines like that in webui-user.bat, and there isn't any git pull or torch update in webui.bat.

@LuluViBritannia

Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed. It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the line that updates torch), the VRAM usage became normal again.
So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully this helps!

Which file did you edit? I don't have any command lines like that in webui-user.bat, and there isn't any git pull or torch update in webui.bat.

The launcher (the webui-user.bat file). I had added two command lines for the updates, thinking they would only affect the launch, but they were actually taking 3 GB of VRAM for no reason.

In your case that doesn't seem to be the issue. Sorry I can't help ^^'.

@catboxanon
Collaborator

catboxanon commented Aug 13, 2023

I've made two PRs that I think will finally address this. voldy (auto) has also made recent improvements to the dev branch in 0af4127 and ccb9233 that should help as well. Basically, if you miss the performance of the hires fix from the early days before ef27a18 changed it, I think this now fixes it. Note that you should be using --medvram (or --lowvram), not using --no-half-vae, and using a high-performance optimizer like xformers to take the most advantage of these.

#12514
#12515

I also closed #6725 and #7002, since this issue is the most relevant. The former was just asking for the old hires fix to be added back (where width/height is specified manually, which is supported), and the latter is technically a duplicate of this issue.

@catboxanon
Collaborator

Closing this, as I've done a few tests and VRAM usage is significantly lower as of the latest dev branch commit. In the scenario given in the OP, VRAM peaks just under 6 GB, which fits well within their stated constraints. Open a new issue with more specifics if problems still occur.
