
Llava-llama is huge. OOM. #114
Open
darkon12 opened this issue Dec 11, 2024 · 8 comments

Comments

@darkon12

Is there a chance to use something smaller?

@kijai
Owner

kijai commented Dec 11, 2024

If you can install bitsandbytes, you can use the bf4 quantization option, which makes the model about 4 times smaller.
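
For what it's worth, this is the standard transformers + bitsandbytes 4-bit path under the hood. A minimal sketch of how that loading looks (not the wrapper's exact code, and the model path below is just a placeholder):

# Rough sketch of 4-bit (nf4) loading via transformers + bitsandbytes.
# Not the wrapper's exact code; the model path below is a placeholder.
import torch
from transformers import AutoModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

text_encoder = AutoModel.from_pretrained(
    "path/to/llava-llama-text-encoder",     # placeholder path
    quantization_config=quant_config,
    device_map="auto",                      # let accelerate place the weights
)

The weights are stored in 4 bits instead of 16, which is where the roughly 4x VRAM saving comes from.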

@darkon12
Author

Installed bitsandbytes with:
pip install bitsandbytes
Still OOM.
I guess this thing is beyond Google Colab's 12 GB RAM / 15 GB VRAM.
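
For reference, a quick way to see how much VRAM is actually free before the text encoder loads (plain PyTorch, nothing Colab-specific):

# Print free/total memory on the current CUDA device, in GiB.
import torch

free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.1f} GiB / total: {total / 1024**3:.1f} GiB")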

@Ratinod

Ratinod commented Dec 11, 2024

Installed bitsandbytes with
python_embeded\python.exe -m pip install bitsandbytes
and got:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00,  2.22s/it]
!!! Exception during processing !!! `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
Traceback (most recent call last):
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 477, in loadmodel
    text_encoder = TextEncoder(
                   ^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\custom_nodes\ComfyUI-HunyuanVideoWrapper\hyvideo\text_encoder\__init__.py", line 156, in __init__
    self.model, self.model_path = load_text_encoder(
                                  ^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\custom_nodes\ComfyUI-HunyuanVideoWrapper\hyvideo\text_encoder\__init__.py", line 38, in load_text_encoder
    text_encoder = AutoModel.from_pretrained(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\transformers\models\auto\auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\transformers\modeling_utils.py", line 4034, in from_pretrained
    dispatch_model(model, **device_map_kwargs)
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\accelerate\big_modeling.py", line 498, in dispatch_model
    model.to(device)
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\transformers\modeling_utils.py", line 2883, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

bitsandbytes-0.45.0-py3-none-win_amd64.whl

M:\_SD\ComfyUI_windows_portable_nvidia>python_embeded\python.exe -m pip show bitsandbytes
Name: bitsandbytes
Version: 0.45.0
Summary: k-bit optimizers and matrix multiplication routines.
Home-page: https://github.com/bitsandbytes-foundation/bitsandbytes
Author: Tim Dettmers
Author-email: dettmers@cs.washington.edu
License: MIT
Location: M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages
Requires: numpy, torch, typing_extensions
Required-by:

Any advice?

@Ratinod

Ratinod commented Dec 11, 2024

M:\_SD\ComfyUI_windows_portable_nvidia>python_embeded\python.exe -m bitsandbytes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(8, 9), cuda_version_string='124', cuda_version_tuple=(12, 4))
PyTorch settings found: CUDA_VERSION=124, Highest Compute Capability: (8, 9).
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!

"CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path." The problem is this? How to fix it?

@Ratinod

Ratinod commented Dec 11, 2024

I found how to make bf4 quantization work! (windows)

python_embeded\python.exe -m pip install bitsandbytes
python_embeded\python.exe -m pip uninstall accelerate
python_embeded\python.exe -m pip install accelerate==1.1.1

accelerate 1.2.0 gives the error above, so downgrading it to 1.1.1 is only a workaround.
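
From the traceback it looks like that combination ends up calling model.to(device) (via accelerate's dispatch_model) on the already-quantized model, which bitsandbytes refuses. A loader has to leave quantized models where bitsandbytes placed them, roughly like this sketch (the attribute names are my assumption, not taken from the wrapper's code):

# Sketch of the guard a loader needs: never call .to() on a bnb-quantized model.
# is_loaded_in_4bit / is_loaded_in_8bit are the flags transformers sets on
# quantized models (an assumption here, not lifted from the wrapper).
import torch

def place_model(model, device="cuda"):
    if getattr(model, "is_loaded_in_4bit", False) or getattr(model, "is_loaded_in_8bit", False):
        return model  # bitsandbytes already put the weights on the right device
    return model.to(torch.device(device))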

The correct fix (keep accelerate 1.2.0 and update transformers instead):

python_embeded\python.exe -m pip install accelerate==1.2.0
python_embeded\python.exe -m pip install transformers==4.47.0
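
To double-check that the embedded Python actually picked up the pinned versions, a quick sanity check:

# Print the versions the embedded interpreter actually sees.
from importlib.metadata import version

for pkg in ("bitsandbytes", "accelerate", "transformers"):
    print(pkg, version(pkg))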

Now all that's left is to wait for HunyuanVideo support in https://github.com/KONAKONA666/q8_kernels for speed, and it would be great.

@4lt3r3go

4lt3r3go commented Dec 11, 2024

Now all that's left is to wait for HunyuanVideo support in q8_kernels for speed

I saw there's an LTX Q8 version, do you have any idea if it's already usable in Comfy? It's so confusing.
And yeah, all video users are waiting for some accelerated Hunyuan quants to pop out one day or the other.
Hopefully, like, tomorrow 🤞

@Ratinod

Ratinod commented Dec 11, 2024

I saw there's an LTX Q8 version, do you have any idea if it's already usable in Comfy?

I managed to install it (https://github.com/KONAKONA666/q8_kernels) together with (https://github.com/KONAKONA666/LTX-Video) on Windows in a separate venv (not ComfyUI). The speed really did increase more than 2x, but the lack of STG support and the need to constantly load/unload the necessary models into memory (which takes a long time) negate the speed advantage.
At the moment I haven't found a ComfyUI node that supports Q8 LTX-Video. Let's finish here; this topic was not about LTX after all...

@4lt3r3go

At the moment I haven't found a ComfyUI node that supports Q8 LTX-Video. Let's finish here; this topic was not about LTX after all...

thanks! yeah back to main topic
