Could not run xformers::efficient_attention_forward_cutlass #2073

Closed
wankio opened this issue Oct 9, 2022 · 11 comments
Labels
asking-for-help-with-local-system-issues (This issue is asking for help related to local system; please offer assistance)

Comments

@wankio

wankio commented Oct 9, 2022

venv "C:\Users\GEN32UC\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
Commit hash: cbf6dad02d04d98e5a2d5e870777ab99b5796b2d
Installing requirements for Web UI
Launching Web UI with arguments: --listen --always-batch-cond-uncond --precision full --no-half --opt-split-attention --force-enable-xformers
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading weights [7460a6fa] from C:\Users\GEN32UC\stable-diffusion-webui\models\Stable-diffusion\model.ckpt
Global Step: 470000
Applying xformers cross attention optimization.
Model loaded.
Loading hypernetwork None
Loaded a total of 6 textual inversion embeddings.
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
  0%|                                                                                           | 0/20 [00:01<?, ?it/s]
Error completing request
Arguments: ('cat', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, False, 0.7, 0, False, False, None, '', 1, '', 4, '', True, False) {}
Traceback (most recent call last):
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\ui.py", line 176, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\GEN32UC\stable-diffusion-webui\webui.py", line 68, in f
    res = func(*args, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\txt2img.py", line 43, in txt2img
    processed = process_images(p)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\processing.py", line 391, in process_images
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\processing.py", line 518, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_samplers.py", line 399, in sample
    samples = self.func(self.model_wrap_cfg, x, extra_args={'cond': conditioning, 'uncond': unconditional_conditioning, 'cond_scale': p.cfg_scale}, disable=False, callback=self.callback_state, **extra_params_kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 80, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_samplers.py", line 239, in forward
    x_out = self.inner_model(x_in, sigma_in, cond=cond_in)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 987, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 1410, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 732, in forward
    h = module(h, emb, context)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 258, in forward
    x = block(x, context=context)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 209, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 127, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 212, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 145, in xformers_attention_forward
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
  File "c:\users\gen32uc\stable-diffusion-webui\repositories\xformers\xformers\ops.py", line 862, in memory_efficient_attention
    return op.forward_no_grad(
  File "c:\users\gen32uc\stable-diffusion-webui\repositories\xformers\xformers\ops.py", line 305, in forward_no_grad
    return cls.FORWARD_OPERATOR(
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

BackendSelect: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\PythonFallbackKernel.cpp:133 [backend fallback]
Named: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at C:\Users\circleci\project\functorch\csrc\DynamicLayer.cpp:487 [backend fallback]
ADInplaceOrView: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\autograd\TraceTypeManual.cpp:295 [backend fallback]
AutocastCPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\autocast_mode.cpp:324 [backend fallback]
FuncTorchBatched: registered at C:\Users\circleci\project\functorch\csrc\LegacyBatchingRegistrations.cpp:661 [backend fallback]
FuncTorchVmapMode: fallthrough registered at C:\Users\circleci\project\functorch\csrc\VmapModeRegistrations.cpp:24 [backend fallback]
Batched: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at C:\Users\circleci\project\functorch\csrc\TensorWrapper.cpp:187 [backend fallback]
Functionalize: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\PythonFallbackKernel.cpp:137 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at C:\Users\circleci\project\functorch\csrc\DynamicLayer.cpp:483 [backend fallback]

Today I decided to try xformers. After many failed installs it finally installed successfully, but when I press Generate I just get the error above.
CUDA is the latest version; before xformers was installed, running with these same arguments worked normally.
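
The NotImplementedError at the bottom of the trace usually means the installed xformers was built without its CUDA kernels, so the cutlass operator was never registered for the CUDA backend. A quick sanity check from inside the venv, independent of the webui (a sketch; these one-liners only print version info):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import xformers; print(xformers.__version__)"

Newer xformers releases also provide python -m xformers.info, which lists which attention operators are actually available.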

@wankio added the bug-report (Report of a bug, yet to be confirmed) label Oct 9, 2022
@Thomas-MMJ

Thomas-MMJ commented Oct 9, 2022

Do you have CUTLASS installed?

conda install cutlass

or

pip install cutlass

Either try installing CUTLASS, or uninstall xformers:

pip uninstall xformers
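
If you take the uninstall route, note that the launch arguments in the log above include --force-enable-xformers; presumably that flag has to be removed as well, e.g. launching with:

--listen --always-batch-cond-uncond --precision full --no-half --opt-split-attention

so the webui stops trying to apply the xformers optimization.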

@wankio
Author

wankio commented Oct 10, 2022

Well, I just deleted the xformers folder and recompiled with TORCH_CUDA_ARCH_LIST set, and it works now.
I think installing into the existing folder (even though it had nothing inside) caused the problem.
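
For anyone following along, the clean rebuild looks roughly like this (a sketch assuming a Unix-like shell and the repositories/ layout the webui uses; on Windows, delete the folder by hand; set TORCH_CUDA_ARCH_LIST to your GPU's compute capability):

rm -rf repositories/xformers
git clone https://github.com/facebookresearch/xformers repositories/xformers
cd repositories/xformers
git submodule update --init --recursive
export FORCE_CUDA="1"
export TORCH_CUDA_ARCH_LIST=8.6   # 8.6 = Ampere (RTX 30xx); change for your card
pip install --verbose --no-deps -e .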

@luckyycode

luckyycode commented Oct 11, 2022

If anyone is having trouble with this in Docker, this helped me (change TORCH_CUDA_ARCH_LIST to your value; 8.6 is for the RTX 3060):

RUN git clone https://github.com/facebookresearch/xformers/ repositories/xformers && cd repositories/xformers && git submodule update --init --recursive

RUN apt install -y g++
RUN cd repositories/xformers && \
    export FORCE_CUDA="1" && \
    export TORCH_CUDA_ARCH_LIST=8.6 && \
    CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .

@kkimmm

kkimmm commented Dec 13, 2022

If anyone is having trouble with this in Docker, this helped me (change TORCH_CUDA_ARCH_LIST to your value; 8.6 is for the RTX 3060):

RUN git clone https://github.com/facebookresearch/xformers/ repositories/xformers && cd repositories/xformers && git submodule update --init --recursive

RUN apt install -y g++
RUN cd repositories/xformers && \
    export FORCE_CUDA="1" && \
    export TORCH_CUDA_ARCH_LIST=8.6 && \
    CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .

I have a 3090 Ti on Ubuntu 20.04.1 and ran:
export FORCE_CUDA="1" && export TORCH_CUDA_ARCH_LIST=11.6 && CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .
but I got an error:

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///home/kai/my_download/stable-diffusion-webui/repositories/xformers
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [10 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 304, in <module>
          ext_modules=get_extensions(),
        File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 251, in get_extensions
          ext_modules += get_flash_attention_extensions(
        File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 117, in get_flash_attention_extensions
          num = 10 * int(arch[0]) + int(arch[2])
      ValueError: invalid literal for int() with base 10: '.'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@jpollard-cs

If anyone is having trouble with this in Docker, this helped me (change TORCH_CUDA_ARCH_LIST to your value; 8.6 is for the RTX 3060):

RUN git clone https://github.com/facebookresearch/xformers/ repositories/xformers && cd repositories/xformers && git submodule update --init --recursive

RUN apt install -y g++
RUN cd repositories/xformers && \
    export FORCE_CUDA="1" && \
    export TORCH_CUDA_ARCH_LIST=8.6 && \
    CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .

If you're on a local Ubuntu or Ubuntu Desktop instance please see this issue instead first: #4942. I will add details there of some cleanup I had to do after attempting the fix from this PR. cc @kkimmm

@jpollard-cs

Also I'm unsure why you'd want to set CUDA_VISIBLE_DEVICES to 0 unless you don't have any NVIDIA GPUs (which you indicated you had an RTX 3060). If I understand correctly this would result in a build that does not leverage your GPU.

@Thomas-MMJ
Copy link

Also I'm unsure why you'd want to set CUDA_VISIBLE_DEVICES to 0 unless you don't have any NVIDIA GPUs (which you indicated you had an RTX 3060). If I understand correctly this would result in a build that does not leverage your GPU.

CUDA_VISIBLE_DEVICES is a list of CUDA device IDs; the devices are numbered 0, 1, 2, etc. See:

https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
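
For example (numbering per the post linked above; script.py is just a placeholder):

CUDA_VISIBLE_DEVICES=0 python script.py    # expose only the first GPU to the process
CUDA_VISIBLE_DEVICES=1,2 python script.py  # expose the 2nd and 3rd GPUs; they appear as 0 and 1 inside the process
CUDA_VISIBLE_DEVICES= python script.py     # hide all GPUs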

@jpollard-cs

Also I'm unsure why you'd want to set CUDA_VISIBLE_DEVICES to 0 unless you don't have any NVIDIA GPUs (which you indicated you had an RTX 3060). If I understand correctly this would result in a build that does not leverage your GPU.

CUDA_VISIBLE_DEVICES is a list of CUDA device IDs; the devices are numbered 0, 1, 2, etc. See:

https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/

Ah okay got it. Looks like I read some misguided information on this. Thanks for the clarification @Thomas-MMJ

@chris-aeviator

chris-aeviator commented Jan 6, 2023

@kkimmm

If anyone is having trouble with this in Docker, this helped me (change TORCH_CUDA_ARCH_LIST to your value; 8.6 is for the RTX 3060):

RUN git clone https://github.com/facebookresearch/xformers/ repositories/xformers && cd repositories/xformers && git submodule update --init --recursive

RUN apt install -y g++
RUN cd repositories/xformers && \
    export FORCE_CUDA="1" && \
    export TORCH_CUDA_ARCH_LIST=8.6 && \
    CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .

I have a 3090 Ti on Ubuntu 20.04.1 and ran export FORCE_CUDA="1" && export TORCH_CUDA_ARCH_LIST=11.6 && CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e ., but I got the ValueError shown above.

I believe the arch list is not meant to be your CUDA version; it should be the GPU's compute capability. Refer to https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list
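
Since the RTX 3090 Ti is compute capability 8.6, the corrected command would presumably be the same as above with the arch list fixed:

export FORCE_CUDA="1" && export TORCH_CUDA_ARCH_LIST=8.6 && CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .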

@mezotaken added the asking-for-help-with-local-system-issues (This issue is asking for help related to local system; please offer assistance) label and removed the bug-report (Report of a bug, yet to be confirmed) label Jan 16, 2023
@kopyl
Contributor

kopyl commented Apr 1, 2023

So how do you fix it?

@catboxanon
Collaborator

Closing as stale.

nne998 pushed a commit to fjteam/stable-diffusion-webui that referenced this issue Sep 26, 2023
* fix tiled vae
