Closed
Description
We should probably implement the CUDA array interface for interoperability across deep learning frameworks:
- https://developer.nvidia.com/blog/machine-learning-frameworks-interoperability-part-1-memory-layouts-and-memory-pools/
- https://numba.pydata.org/numba-doc/dev/cuda/cuda_array_interface.html
As part of that, we'll need to add a strides field to CuArray. This should make it possible to get rid of some more SubArray uses, as well as enable more advanced optimizations like under-Peter/OMEinsum.jl#133.
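For reference, here is a rough sketch of what the exported metadata could look like on the Python side, and why strides matter for it. The `__cuda_array_interface__` spec measures `strides` in bytes and treats a missing or `None` value as C-contiguous, so a column-major array would have to state its strides explicitly. The helper names `column_major_strides` and `make_cuda_array_interface` and the pointer value are made up for illustration; this is not CUDA.jl or Numba API.

```python
def column_major_strides(shape, itemsize):
    """Byte strides of a dense column-major (Julia-style) array."""
    strides = []
    step = itemsize
    for extent in shape:
        strides.append(step)   # the first dimension varies fastest
        step *= extent
    return tuple(strides)


def make_cuda_array_interface(ptr, shape, typestr, itemsize, strides=None):
    """Build a version-3 interface dict (hypothetical helper, not a library call)."""
    if strides is None:
        # Assumption: the buffer is dense and column-major, as a plain CuArray is.
        strides = column_major_strides(shape, itemsize)
    return {
        "shape": tuple(shape),
        "typestr": typestr,      # e.g. "<f4" for little-endian float32
        "data": (ptr, False),    # (device pointer as int, read_only flag)
        "version": 3,
        "strides": tuple(strides),
    }


PTR = 0x7F0000000000  # placeholder device address, illustration only

# A dense 4x3 Float32 array.
dense = make_cuda_array_interface(PTR, (4, 3), "<f4", 4)

# Every other row of the same buffer: same pointer, a different shape and
# stride, and no wrapper type needed to describe it.
every_other_row = make_cuda_array_interface(PTR, (2, 3), "<f4", 4, strides=(8, 16))
```

The second dict is the kind of case that currently needs a SubArray: with a strides field, a regularly strided view can be described by the array type itself.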