
Llava-llama is huge. OOM. #114
Open
darkon12 opened this issue Dec 11, 2024 · 8 comments

Comments

@darkon12

Is there a chance to use something smaller?

@kijai
Owner

kijai commented Dec 11, 2024

If you can install bitsandbytes, you can use the bf4 quantization option, which makes the model about 4 times smaller.
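
For what it's worth, this is the standard transformers + bitsandbytes 4-bit path under the hood. A minimal sketch of how that loading looks (not the wrapper's exact code, and the model path below is just a placeholder):

# Rough sketch of 4-bit (nf4) loading via transformers + bitsandbytes.
# Not the wrapper's exact code; the model path below is a placeholder.
import torch
from transformers import AutoModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

text_encoder = AutoModel.from_pretrained(
    "path/to/llava-llama-text-encoder",     # placeholder path
    quantization_config=quant_config,
    device_map="auto",                      # let accelerate place the weights
)

The weights are stored in 4 bits instead of 16, which is where the roughly 4x VRAM saving comes from.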

@darkon12
Author

Installed bitsandbytes with:
pip install bitsandbytes
Still OOM.
I guess this thing is beyond Google Colab's 12 GB RAM / 15 GB VRAM.
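
For reference, a quick way to see how much VRAM is actually free before the text encoder loads (plain PyTorch, nothing Colab-specific):

# Print free/total memory on the current CUDA device, in GiB.
import torch

free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.1f} GiB / total: {total / 1024**3:.1f} GiB")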

@Ratinod

Ratinod commented Dec 11, 2024

Installed bitsandbytes with
python_embeded\python.exe -m pip install bitsandbytes
and got:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00,  2.22s/it]
!!! Exception during processing !!! `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
Traceback (most recent call last):
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\custom_nodes\ComfyUI-HunyuanVideoWrapper\nodes.py", line 477, in loadmodel
    text_encoder = TextEncoder(
                   ^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\custom_nodes\ComfyUI-HunyuanVideoWrapper\hyvideo\text_encoder\__init__.py", line 156, in __init__
    self.model, self.model_path = load_text_encoder(
                                  ^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\ComfyUI\custom_nodes\ComfyUI-HunyuanVideoWrapper\hyvideo\text_encoder\__init__.py", line 38, in load_text_encoder
    text_encoder = AutoModel.from_pretrained(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\transformers\models\auto\auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\transformers\modeling_utils.py", line 4034, in from_pretrained
    dispatch_model(model, **device_map_kwargs)
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\accelerate\big_modeling.py", line 498, in dispatch_model
    model.to(device)
  File "M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages\transformers\modeling_utils.py", line 2883, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

bitsandbytes-0.45.0-py3-none-win_amd64.whl

M:\_SD\ComfyUI_windows_portable_nvidia>python_embeded\python.exe -m pip show bitsandbytes
Name: bitsandbytes
Version: 0.45.0
Summary: k-bit optimizers and matrix multiplication routines.
Home-page: https://github.com/bitsandbytes-foundation/bitsandbytes
Author: Tim Dettmers
Author-email: dettmers@cs.washington.edu
License: MIT
Location: M:\_SD\ComfyUI_windows_portable_nvidia\python_embeded\Lib\site-packages
Requires: numpy, torch, typing_extensions
Required-by:

Any advice?

@Ratinod

Ratinod commented Dec 11, 2024

M:\_SD\ComfyUI_windows_portable_nvidia>python_embeded\python.exe -m bitsandbytes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(8, 9), cuda_version_string='124', cuda_version_tuple=(12, 4))
PyTorch settings found: CUDA_VERSION=124, Highest Compute Capability: (8, 9).
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!

"CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path." The problem is this? How to fix it?

@Ratinod

Ratinod commented Dec 11, 2024

I found how to make bf4 quantization work! (windows)

python_embeded\python.exe -m pip install bitsandbytes
python_embeded\python.exe -m pip uninstall accelerate
python_embeded\python.exe -m pip install accelerate==1.1.1

accelerate 1.2.0 gives the error above, so downgrading it to 1.1.1 is only a workaround.
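
From the traceback it looks like that combination ends up calling model.to(device) (via accelerate's dispatch_model) on the already-quantized model, which bitsandbytes refuses. A loader has to leave quantized models where bitsandbytes placed them, roughly like this sketch (the attribute names are my assumption, not taken from the wrapper's code):

# Sketch of the guard a loader needs: never call .to() on a bnb-quantized model.
# is_loaded_in_4bit / is_loaded_in_8bit are the flags transformers sets on
# quantized models (an assumption here, not lifted from the wrapper).
import torch

def place_model(model, device="cuda"):
    if getattr(model, "is_loaded_in_4bit", False) or getattr(model, "is_loaded_in_8bit", False):
        return model  # bitsandbytes already put the weights on the right device
    return model.to(torch.device(device))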

The correct fix (keep accelerate 1.2.0 and update transformers instead):

python_embeded\python.exe -m pip install accelerate==1.2.0
python_embeded\python.exe -m pip install transformers==4.47.0
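
To double-check that the embedded Python actually picked up the pinned versions, a quick sanity check:

# Print the versions the embedded interpreter actually sees.
from importlib.metadata import version

for pkg in ("bitsandbytes", "accelerate", "transformers"):
    print(pkg, version(pkg))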

Now all that's left is to wait for HunyuanVideo support in https://github.com/KONAKONA666/q8_kernels for speed, and it would be great.

@4lt3r3go

4lt3r3go commented Dec 11, 2024

Now all that's left is to wait for HunyuanVideo support in q8_kernels for speed

I saw there's an LTX Q8 version, do you have any idea if it's already usable in Comfy? It's so confusing.
And yeah, all video users are waiting for some accelerated Hunyuan quants to pop out one day or the other.
Hopefully, like, tomorrow 🤞

@Ratinod

Ratinod commented Dec 11, 2024

I saw there's an LTX Q8 version, do you have any idea if it's already usable in Comfy?

I managed to install it (https://github.com/KONAKONA666/q8_kernels) together with (https://github.com/KONAKONA666/LTX-Video) on Windows in a separate venv (not ComfyUI). The speed really did increase more than 2x, but the lack of STG support and the need to constantly load/unload the necessary models into memory (which takes a long time) negate the speed advantage.
At the moment I haven't found a ComfyUI node that supports Q8 LTX-Video. Let's finish here; this topic was not about LTX after all...

@4lt3r3go

At the moment I haven't found a ComfyUI node that supports Q8 LTX-Video. Let's finish here; this topic was not about LTX after all...

thanks! yeah back to main topic
