Description
GLiNER inference fails with CUBLAS_STATUS_NOT_INITIALIZED on NVIDIA Blackwell architecture GPUs (compute capability 12.0) when using PyTorch 2.10.0+cu130.
The error occurs in DeBERTa v2's F.linear call during the encoder forward pass — specifically in cublasLtMatmulAlgoGetHeuristic. Model loading succeeds; only inference triggers the error.
Environment
- GPU: NVIDIA RTX PRO 6000 Blackwell Max-Q (sm_120, compute capability 12.0)
- Driver: 595.45.04
- PyTorch: 2.10.0+cu130 (from https://download.pytorch.org/whl/cu130)
- GLiNER: 0.2.26
- transformers: 4.57.6 (also reproduced with 5.1.0)
- CUDA container: nvidia/cuda:13.2.0-cudnn-runtime-ubuntu24.04
- OS: Ubuntu 24.04
Reproduction
import torch
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
model = model.to("cuda").eval()

# This fails:
with torch.no_grad():
    entities = model.predict_entities(
        "Apple CEO Tim Cook announced new products in Cupertino.",
        ["person", "organization", "location"],
        threshold=0.5,
    )
Error
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling
`cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(),
Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Cdesc.descriptor(),
preference.descriptor(), 1, &heuristicResult, &returnedResult)`
Full traceback points to:
transformers/models/deberta_v2/modeling_deberta_v2.py → DisentangledSelfAttention.forward
→ self.query_proj(query_states)
→ F.linear(input, self.weight, self.bias)
Key findings
- Basic CUDA matmul works — torch.randn(10,10,device='cuda') @ torch.randn(10,10,device='cuda') succeeds, including in FP16
- Fails in both FP32 and FP16 — the error is not related to .half() precision
- Fails with transformers 4.57.6 and 5.1.0 — not a transformers regression
- Specific to PyTorch cu130 — PyTorch 2.8.0+cu128 on the same GPU works perfectly
- Upgrading the system cuBLAS from 13.0.2.14 to 13.3.0.5 (via the CUDA 13.2 container) did not help, since the PyTorch cu130 wheels bundle and link their own cuBLAS rather than the system library
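The findings above can be condensed into a preflight check run before loading GLiNER. This is only a sketch: in practice you would pass in torch.__version__ and torch.cuda.get_device_capability(), and the failing combination encoded here is just what this report establishes — other cu130/GPU pairings are untested.

```python
# Preflight check encoding this report's findings: PyTorch cu130 builds on
# compute capability 12.0 (Blackwell) hit the cuBLAS failure, while cu128
# builds on the same GPU work. Values are taken from this report only.

def is_known_bad_combo(torch_version: str, capability: tuple) -> bool:
    """Return True if this torch build + GPU matches the failing setup."""
    is_cu130_build = "+cu130" in torch_version
    is_blackwell_12_0 = capability == (12, 0)
    return is_cu130_build and is_blackwell_12_0

# Combinations from this report:
print(is_known_bad_combo("2.10.0+cu130", (12, 0)))  # failing combo
print(is_known_bad_combo("2.8.0+cu128", (12, 0)))   # known-good combo
```

A check like this lets a service fail fast with an actionable message (install the cu128 wheels) instead of surfacing CUBLAS_STATUS_NOT_INITIALIZED deep inside the encoder forward pass.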
Workaround
Use PyTorch cu128 wheels instead of cu130:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
This is likely an upstream PyTorch bug with cuBLAS on Blackwell for certain tensor shapes used by DeBERTa, but documenting here since GLiNER users with Blackwell GPUs will hit this.
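For pinned deployments, the workaround can be captured in a requirements file. The exact version pins below mirror this report's known-good environment and are an assumption — adjust them to whatever the cu128 index currently serves:

```
--index-url https://download.pytorch.org/whl/cu128
torch==2.8.0+cu128
gliner==0.2.26
```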