Closed
Description
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [64,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [65,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
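
The assertion in the log comes from IndexKernel.cu, not from flash_attn_cuda.fwd itself, so the failing op is probably an indexing operation that ran earlier and was reported late. Below is a minimal sketch of how one might narrow that down. It is not a confirmed diagnosis of this issue. It sets CUDA_LAUNCH_BLOCKING=1 as the error message suggests, and applies the same bounds condition as the assertion on the host. The helper find_out_of_bounds and the sample token_ids are hypothetical names introduced for illustration.

```python
import os

# Must be set before CUDA is initialized (simplest: before importing torch),
# so kernels run synchronously and the traceback points at the failing op.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def find_out_of_bounds(indices, size):
    """Return (position, value) pairs that violate -size <= index < size.

    This mirrors the condition in the device-side assertion from the log.
    `indices` is any iterable of ints, e.g. index_tensor.flatten().tolist().
    """
    return [(pos, idx) for pos, idx in enumerate(indices)
            if not (-size <= idx < size)]


# Hypothetical data: indices checked against a dimension of size 8.
token_ids = [0, 3, 7, 8, -8, -9]
print(find_out_of_bounds(token_ids, size=8))  # [(3, 8), (5, -9)]
```

Running a check like this on the index tensors that feed the model (for example, token ids against the embedding table size) before the forward pass may show which input triggers the assertion.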