Description
Hello,
I would like to use the triu! and transpose! functions on a non-contiguous view (e.g. view(a', 1:2:6, 4:2:8)). Is there a way to make this possible (ideally for all functions in src/host/linalg.jl, and for copyto! in src/host/abstractarray) without severely increasing run times or compile times due to multiple-dispatch overhead?
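For reference, here is a minimal CPU-only sketch of the kind of view I mean. A plain Matrix stands in for the GPU array (that substitution is my assumption, so the snippet runs without a GPU and does not exercise GPUArrays.jl at all). It shows the nesting SubArray -> Adjoint -> dense array, and that the generic LinearAlgebra method handles it on the CPU:

```julia
using LinearAlgebra

a = reshape(collect(1.0:64.0), 8, 8)   # a[i, j] == i + 8 * (j - 1)
v = view(a', 1:2:6, 4:2:8)             # rows 1, 3, 5 and columns 4, 6, 8 of a'

# The view wraps an Adjoint, which in turn wraps the dense array.
@assert v isa SubArray
@assert parent(v) isa Adjoint
@assert parent(parent(v)) === a
@assert size(v) == (3, 3)

# On the CPU the generic AbstractMatrix method applies and writes through to `a`.
triu!(v)
@assert v[2, 1] == 0      # below the diagonal, zeroed
@assert a[4, 3] == 0      # v[2, 1] aliases a'[3, 4], i.e. a[4, 3]
```

The goal would be for the same call to work when `a` is a GPU array.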
Earlier discussions on this topic:
#452
#458
JuliaGPU/CUDA.jl#1778
JuliaGPU/CUDA.jl#2078
Perhaps some kind of union of subarrays, transposes, and abstractarrays (to avoid switching to AnyGPUArrays; also, AnyGPUArrays does not include transposes)?
Edit: I just saw that IndexGPUArray might be an option, if it were expanded with SubArray{T, <:Any, <:LinearAlgebra.Adjoint{T, <:AbstractGPUArray}}
Let me know your thoughts; I'm happy to draft a PR.
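To make the idea concrete, here is a rough sketch of such a union alias and how dispatch would pick it up. The names `WrappedView` and `describe` are hypothetical and exist only for illustration, and `Matrix` again stands in for `AbstractGPUArray` so the snippet runs on the CPU. This is not the existing definition of any alias in GPUArrays.jl:

```julia
using LinearAlgebra

# Hypothetical alias: 2-d views whose parent is an adjoint or a transpose.
# `Matrix` is a CPU stand-in for `AbstractGPUArray`.
const WrappedView{T} =
    SubArray{T, 2, <:Union{Adjoint{T, <:Matrix}, Transpose{T, <:Matrix}}}

# Hypothetical helper that only reports which method dispatch selects.
describe(::Matrix) = :dense
describe(::WrappedView) = :wrapped_view
describe(::AbstractMatrix) = :other

a = rand(8, 8)

@assert describe(a) == :dense
@assert describe(view(a', 1:2:6, 4:2:8)) == :wrapped_view
@assert describe(view(transpose(a), 1:3, 1:3)) == :wrapped_view
@assert describe(view(a, 1:2:6, 4:2:8)) == :other   # plain view, not covered by the alias
```

A single alias like this adds one method per function rather than one per wrapper combination, which is what I would hope keeps the compile-time cost small.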
@maleadt @vchuravy