Description
Problem
Currently, when passing arrays between Python and Julia, the code always uses a wrapper object (`PyArray`, `ArrayValue`, etc.), even if the data happens to be contiguous in memory. This is a problem because not all code works well (or at all) with these wrappers. In an ideal world with proper interfaces this wouldn't be a problem, but both Python and Julia are somewhat lax about strict interfaces for arrays, so in practice things sometimes break (or just run slowly).
As just one trivial example, this will crash:
```julia
# Julia
function give_vector()::Vector
    return vec([1.0 2.0 3.0])
end
```
```python
# Python
vector = in_julia.give_vector()
assert vector.__class__.__name__ == "VectorValue"
assert str(vector) == "[1.0, 2.0, 3.0]"
vector += 1
```
It crashes with `TypeError: unsupported operand type(s) for +=: 'VectorValue' and 'int'`.
The point isn't this specific missing operation (though fixing it would be nice); the point is that, try as we may, we'll never make `VectorValue` a 100% drop-in replacement for `ndarray`.
Solution
When converting arrays between Julia and Python, if the data is contiguous, use `frombuffer` to wrap the memory in an `ndarray` for Python, or use `jl_ptr_to_array` to wrap the memory in an `Array` for Julia. If, however, the data is not contiguous in memory, keep the current behavior of returning a wrapper type.
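As a rough sketch of what that could look like on the Julia side: the names `is_contiguous` and `maybe_as_array` below are illustrative, not an existing API, and this assumes `PyArray` supports `strides` and `pointer` (the latter is used in the proof of concept further down):

```julia
using PythonCall  # assumed package; provides PyArray

# Dense column-major memory has strides (1, size(a,1), size(a,1)*size(a,2), ...),
# in element units, as Julia's AbstractArray interface reports them.
function is_contiguous(a::AbstractArray)
    expected = 1
    for (s, d) in zip(strides(a), size(a))
        s == expected || return false
        expected *= d
    end
    return true
end

# Hypothetical conversion: wrap the memory zero-copy when the layout
# allows it, otherwise keep today's behavior of returning the wrapper.
function maybe_as_array(pyarr::PyArray)
    is_contiguous(pyarr) || return pyarr
    return unsafe_wrap(Array, pointer(pyarr), size(pyarr))
end
```

The same stride test would apply symmetrically in the other direction, before handing a Julia `Array` to `frombuffer`.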
Alternatives
This can be done manually, of course, which is probably what I will do in my code for now. That said, even if this isn't made the default behavior, it would be nice to provide some convenience functions that make it easier to achieve.
Additional context
Here is some example code which worked for me to demonstrate the feasibility of the solution:
```julia
# Julia
using PythonCall  # assumed here; provides pyimport, PyArray, PyVector

function as_ndarray(array::Array)::PyArray
    # Wrap the Julia array's memory in a NumPy array; frombuffer
    # shares the buffer instead of copying it.
    np = pyimport("numpy")
    return PyArray(np.frombuffer(array))
end

function from_ndarray(ndarray::PyArray)::Array
    # Wrap the NumPy array's memory in a Julia Array (zero-copy).
    return unsafe_wrap(Array, pointer(ndarray), size(ndarray))
end

function give_numpy_vector()::PyVector
    return as_ndarray(give_vector())
end

function modify_vector(vector::AbstractVector)::Nothing
    vector[1] += 10
    return nothing
end

function modify_vector(vector::PyVector)::Nothing
    # Re-dispatch on a zero-copy Array view so the generic method
    # above mutates the shared memory.
    modify_vector(from_ndarray(vector))
    return nothing
end
```
```python
# Python
vector = in_julia.give_vector()
assert vector.__class__.__name__ == "VectorValue"
assert str(vector) == "[1.0, 2.0, 3.0]"
in_julia.modify_vector(vector)
assert str(vector) == "[11.0, 2.0, 3.0]"

vector = in_julia.give_numpy_vector()
assert vector.__class__.__name__ == "ndarray"
assert str(vector) == "[1. 2. 3.]"
in_julia.modify_vector(vector)
assert str(vector) == "[11. 2. 3.]"
```
Naturally this is just a proof of concept and doesn't deal with issues such as proper mapping of `dtype`, testing whether the data is actually contiguous, etc.
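Another caveat in the same vein: `unsafe_wrap` does not root the object that owns the memory, so the zero-copy view is only valid while the original array is kept alive. A minimal sketch of guarding against that with `GC.@preserve` (assuming, as above, that the `PyVector` holds a reference to the underlying Python buffer):

```julia
function modify_vector_preserving(vector::PyVector)::Nothing
    # Keep `vector` (and therefore the Python buffer it references)
    # rooted while the unsafe zero-copy view is in use.
    GC.@preserve vector begin
        raw = unsafe_wrap(Array, pointer(vector), size(vector))
        raw[1] += 10
    end
    return nothing
end
```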
One point worth noting: the fact that `Array` "likes" to be column-major and `ndarray` "likes" to be row-major is not a showstopper here. I have plenty of Python code that explicitly works with column-major arrays (because that's what's needed for efficiency), and Julia has `PermutedDimsArray`, which can likewise deal with row-major order. It is always the developer's responsibility to deal with this issue (summing the rows of column-major data will be much slower than summing the rows of the same data in row-major order).
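To make the Julia side of that concrete, here is a small standalone sketch of `PermutedDimsArray` presenting row-major memory without copying (plain `Base`, no Python involved):

```julia
# Memory laid out row by row for the logical 2x3 matrix [1 2 3; 4 5 6].
raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Reshape into the transposed (column-major) shape, then swap the dims:
# rowmajor[i, j] indexes the logical row-major matrix, with no copy made.
rowmajor = PermutedDimsArray(reshape(raw, 3, 2), (2, 1))

@assert size(rowmajor) == (2, 3)
@assert rowmajor[1, 3] == 3.0
@assert rowmajor[2, 1] == 4.0
```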
As a side note, the built-in implementations in both `numpy` and Julia for converting data between these layouts are slow as molasses for no good reason, so I had to provide my own implementation to get reasonable performance. But that's mostly irrelevant to the issue I raised here.