Support for NVIDIA Blackwell Architecture (RTX 50-series, sm_120) - Capability (12, 0) too new #10

@Pr3zLy

Description

I am trying to run inference using diffusers with enable_xformers_memory_efficient_attention() on a Windows machine equipped with the new NVIDIA RTX 5070 Ti (Blackwell architecture).

The execution crashes with a NotImplementedError because xformers does not yet recognize or support Compute Capability 12.0 (sm_120).
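To confirm which compute capability a machine reports, a quick check like the one below may help. It is only a sketch: `MAX_SUPPORTED_CAPABILITY`, `is_too_new`, and `describe_device` are hypothetical names introduced here, and the `(9, 0)` bound is copied from the "capability <= (9, 0)" line in the error log rather than from the xformers source. The only real library calls are `torch.cuda.is_available`, `torch.cuda.get_device_capability`, and `torch.cuda.get_device_name`.

```python
from typing import Tuple

# Assumption: upper bound taken from the "capability <= (9, 0)" line in the log,
# not read from xformers itself.
MAX_SUPPORTED_CAPABILITY: Tuple[int, int] = (9, 0)


def is_too_new(capability: Tuple[int, int],
               max_supported: Tuple[int, int] = MAX_SUPPORTED_CAPABILITY) -> bool:
    """Return True if (major, minor) is above the assumed supported maximum."""
    # Tuples compare lexicographically, so (12, 0) > (9, 0) and (8, 6) < (9, 0).
    return tuple(capability) > tuple(max_supported)


def describe_device(index: int = 0) -> str:
    """Report the name and compute capability of a CUDA device (hypothetical helper)."""
    import torch  # imported lazily so is_too_new() works without PyTorch installed

    if not torch.cuda.is_available():
        return "no CUDA device visible"
    capability = torch.cuda.get_device_capability(index)
    name = torch.cuda.get_device_name(index)
    status = "too new" if is_too_new(capability) else "within the assumed range"
    return f"{name}: capability {capability} ({status})"
```

On the RTX 5070 Ti described here, `describe_device()` would be expected to report capability `(12, 0)`, which the check above flags as too new.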

Environment

OS: Windows 10/11
GPU: NVIDIA GeForce RTX 5070 Ti
Python: 3.10

Error Log
NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
    query     : shape=(1, 2, 1, 40) (torch.float32)
    key       : shape=(1, 2, 1, 40) (torch.float32)
    value     : shape=(1, 2, 1, 40) (torch.float32)
    attn_bias : <class 'NoneType'>
    p         : 0.0
fa3F@2.8.3-133-gde1584b is not supported because:
    requires device with capability <= (9, 0) but your GPU has capability (12, 0) (too new)
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    requires device with capability == (8, 0) but your GPU has capability (12, 0) (too new)
cutlassF-pt is not supported because:
    requires device with capability <= (9, 0) but your GPU has capability (12, 0) (too new)

Additional Context

Since PyTorch Stable (2.1.x - 2.5.x) does not support the RTX 50-series yet, users with this hardware must use PyTorch Nightly. It seems the xformers kernels need to be updated to allow execution on sm_120 devices, or xformers should at least fall back gracefully instead of raising.
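Until that happens, a caller-side fallback might look like the sketch below. `enable_attention_with_fallback` is a hypothetical helper, not part of diffusers or xformers; the only library method it relies on is the pipeline's `enable_xformers_memory_efficient_attention()`. It assumes that when the call raises, the pipeline keeps whatever attention processor it had before, which I have not verified for every diffusers version.

```python
def enable_attention_with_fallback(pipe) -> str:
    """Try to enable xformers attention; keep the pipeline's default on failure.

    Returns "xformers" if the call succeeded, otherwise "default".
    Hypothetical helper for illustration only.
    """
    try:
        pipe.enable_xformers_memory_efficient_attention()
        return "xformers"
    except (NotImplementedError, ImportError, ValueError) as exc:
        # NotImplementedError: no xformers operator matches this GPU or dtype
        #   (the sm_120 case reported in this issue).
        # ImportError / ValueError: assumed to cover a missing xformers install
        #   or an unavailable CUDA device.
        print(f"xformers attention unavailable, using default attention: {exc}")
        return "default"
```

Usage would be `backend = enable_attention_with_fallback(pipe)` right after loading the pipeline, so inference still runs on a Blackwell GPU, just without the xformers kernels.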
