Add multi-gpu support #5997
Conversation
Lincoln, please stop tempting me to buy another RTX 4090.
Should be waiting on the resume event instead of checking it in a loop
Prefer an early return/continue to reduce the indentation of the processor loop. Easier to read. There are other ways to improve its structure but at first glance, they seem to involve changing the logic in scarier ways.
@lstein You're my hero! Can you hide it behind a checkbox, a setting, or an env variable? Just to merge this feature and prevent @psychedelicious from worrying too much.
@makemefeelgr8 Sorry, but it's not that simple. This change needs to wait until we can allocate resources to do thorough testing.
Wheel of commit 589a795: InvokeAI-4.2.3-py3-none-any.whl.zip
This is awesome. I've been trying to find an AI interface that has multi-GPU support. I have two 3070s, and I can only use one at a time. Would like to see this implemented in Invoke in the future.
I built invoke from this branch and have observed lots of these warnings:
The following is the final error:
I have two P40s with 24GB VRAM each. My server has 250GiB of RAM with lots of free space. All models and switching work properly on the main branch. EDIT after further testing: To test the issue I switched the model between every invoke. I did not encounter the above error when I only queued up single invokes. Also, the GPU devices are not released properly if the above error occurs.
I have also just encountered the following error:
It happened when I queued up many single invokes at 1024x1024 on different sd-1 models. The above error only happens rarely; most of the time it works. Something about the concurrent access to the upscaling models might not be 100% thread-safe.
@raldone01 Thank you so much for giving the PR a try and for your valuable feedback. I think I know where the meta tensor bug is occurring and should have a fix soon.
@raldone01 I've fixed what I believe to be the bug with changing models. Unfortunately I don't have access to a multi-GPU system at the moment, and have only tested it in a single-GPU environment. Give it a whirl and let me know how it goes.
I tested your branch again. New errors also appeared:
Full Logs
Thanks very much. I've got access to a multi-GPU system now and will be able to test more thoroughly. I'll let you know when there's a new version to look at. |
- temporarily disable vram cache
@raldone01 I just committed a series of changes that should make the multi-GPU support more stable. If you have a chance to check it out, let me know how it works in your hands. |
Awesome, I just built the image. I will try to get some proper testing in next week.
@lstein Do you know what the default unload timer is? After queuing up a few images the GPUs are not fully released, which causes them to consume 50W instead of 10W per GPU. I will try adding just one GPU to the container to see if it still occurs. No python errors so far. 👍
This PR is no longer being maintained. For InvokeAI Multi-GPU support, please see https://github.com/lstein/InvokeAI-MGPU |
Super interested in this. Dual 4060 Tis ready to test.
Summary
This adds support for systems that have multiple GPUs. On CUDA systems, it will automatically detect when a system has more than one GPU and configure the model cache and the session processor to take advantage of them, keeping track of which GPUs are busy and which are available, and rendering batches of images in parallel. It works at the session processor level by placing each session into a thread-safe queue that is monitored by multiple threads. Each thread reserves a GPU at entry, processes the entire invocation, and then releases the GPU to be used by other pending requests.
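To make the flow concrete, here is a minimal sketch of the reserve/process/release pattern described above. It is not the actual InvokeAI code: run_session() is a hypothetical stand-in for the session-processor invocation, and the queue and pool names are illustrative only.

```python
# Minimal sketch of the reserve/process/release pattern (not the actual InvokeAI code).
# run_session() is a hypothetical stand-in for executing one queued session.
import queue
import threading

import torch

session_queue: queue.Queue = queue.Queue()  # thread-safe queue of pending sessions
free_gpus: queue.Queue = queue.Queue()      # pool of currently available devices

devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())] \
          or [torch.device("cpu")]          # fall back to CPU on GPU-less machines
for d in devices:
    free_gpus.put(d)


def run_session(session, device: torch.device) -> None:
    """Placeholder: execute the entire invocation on the reserved device."""


def worker() -> None:
    while True:
        session = session_queue.get()   # block until a session is queued
        if session is None:             # sentinel value shuts the worker down
            break
        device = free_gpus.get()        # reserve a device at entry
        try:
            run_session(session, device)
        finally:
            free_gpus.put(device)       # release it for other pending requests
            session_queue.task_done()


workers = [threading.Thread(target=worker, daemon=True) for _ in devices]
for t in workers:
    t.start()
```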
This PR is no longer being maintained. Multi-GPU support can be found in a forked repository at: https://github.com/lstein/InvokeAI-MGPU
Demo
cinnamon-2024-04-16T152651-0400.webm
How it works
In addition to changes in the session processor, this PR adds a few calls to the model manager's RAM cache to reserve and release GPUs in a thread-safe way, and extends the TorchDevice class to support dynamic device selection without changing its API. The PR also improves how models are moved from RAM to VRAM, which modestly increases load speed. During debugging, I discovered that uuid.uuid4() does not appear to be thread-safe on Windows (https://stackoverflow.com/questions/2759644/python-multiprocessing-doesnt-play-nicely-with-uuid-uuid4), and this was borking the latent caching system. I worked around this by adding the current thread ID to the cache object's name.
There are two new options for the config file:
max_threads -- the maximum number of session-processing threads that can run at the same time. If not defined, it is set equal to the number of GPU devices.
devices -- a list of devices to use for acceleration. If not defined, it is calculated dynamically to use all CUDA GPUs found.
Example:
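The original example config was not captured in this page, so the following is only an illustrative sketch of how these options might appear in a YAML config file such as invokeai.yaml; the exact key placement and values are assumptions, and only the two option names come from this PR.

```yaml
# Illustrative only -- the real config schema may place these keys differently.
max_threads: 2        # number of concurrent session-processing threads
devices:              # acceleration devices; omit to auto-detect all CUDA GPUs
  - cuda:0
  - cuda:1
```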
Note that there is no problem if max_threads does not match the number of GPU devices (even on single-GPU systems), but there is no benefit to defining more threads than GPUs. The code is currently tested and working using multiple threads on a 6-GPU Windows machine.
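For reference, here is a minimal sketch of the thread-ID workaround for the uuid.uuid4() issue mentioned above; the cache-name format is hypothetical, and only the idea of suffixing the current thread ID comes from this PR.

```python
# Sketch of the latent-cache naming workaround (name format is hypothetical).
# Appending the current thread ID guards against uuid4() collisions
# observed between threads on Windows.
import threading
import uuid


def make_cache_name() -> str:
    return f"{uuid.uuid4()}-{threading.get_ident()}"
```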
To test
First, buy yourself two RTX 4090s :-).
Seriously, though, the best thing to do is to ensure that this doesn't crash single-GPU systems. Exercise the linear and graph workflows. Try different models, LoRAs, IP adapters, upscalers, etc. Run a couple of large batches and make sure that they can be paused, resumed, and cancelled as usual.
If you have access to a system that has an integrated GPU as well as a discrete one, you can test out the multi-GPU processing simply by queueing up a series of 2 or more generation jobs.
QA Instructions
Merge Plan
Squash merge when approved.
Checklist