Description of errors
The LU decomposition presented in the documentation is only valid for blocks of size 1x1.
For larger blocks, it is in general impossible to have L_1 I = B_1.
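
To make the argument concrete, here is a sketch of the first block equations for a matrix with two block rows. It rests on my reading of the documented factorization, which is an assumption: the names A_1 and C_1 for the sub- and super-diagonal blocks and the exact layout of the factors are not quoted from the documentation, and the L_i are taken to be lower triangular.

```latex
% Assumed form of the documented factorization, two block rows.
% B_i: diagonal blocks, A_1: sub-diagonal block, C_1: super-diagonal block.
\begin{aligned}
\begin{pmatrix} B_1 & C_1 \\ A_1 & B_2 \end{pmatrix}
&=
\begin{pmatrix} L_1 & 0 \\ A_1 & L_2 \end{pmatrix}
\begin{pmatrix} I & U_1 \\ 0 & I \end{pmatrix}
=
\begin{pmatrix} L_1 & L_1 U_1 \\ A_1 & A_1 U_1 + L_2 \end{pmatrix}
\\[4pt]
&\Longrightarrow\quad
L_1 = B_1, \qquad L_1 U_1 = C_1, \qquad L_2 = B_2 - A_1 U_1 .
\end{aligned}

% Counterexample for a 2x2 block: B_1 is dense, so no lower-triangular
% L_1 can satisfy L_1 I = B_1.
B_1 = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\quad\Longrightarrow\quad
L_1 = B_1 \ \text{is not lower triangular.}
```

If this reading is right, the identity only works when L_1 is allowed to be a general dense block equal to B_1, or when the blocks are 1x1, where every block is trivially triangular.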
I tried to wrap the routine in AMDGPU.jl, the Julia interface, and was unable to recover the correct results with a block size > 2.
JuliaGPU/AMDGPU.jl#746