Use "native" types when possible #319

Closed
@orenbenkiki

Description

Problem

Currently, when passing arrays between Python and Julia, the code always uses a wrapper object (PyArray, ArrayValue, etc.), even if the data happens to be contiguous in memory. This is a problem because not all code works well (or at all) with these wrappers. In an ideal world with proper interfaces this wouldn't be a problem, but both Python and Julia are somewhat lax about strict interfaces for arrays, so in practice things sometimes break (or just run slowly).

As just one trivial example, this will crash:

# Julia

function give_vector()::Vector
    return vec([1.0 2.0 3.0])
end
# Python

vector = in_julia.give_vector()
assert vector.__class__.__name__ == "VectorValue"
assert str(vector) == "[1.0, 2.0, 3.0]"
vector += 1

The final += fails with TypeError: unsupported operand type(s) for +=: 'VectorValue' and 'int'.

The point isn't about this specific missing operation (though fixing it would be nice); the point is that, try as we may, we'll never make VectorValue a 100% drop-in replacement for ndarray.

Solution

When converting arrays between Julia and Python, if the data is contiguous in memory, share it directly: use np.frombuffer to wrap the memory in an ndarray on the Python side, or jl_ptr_to_array (or unsafe_wrap) to wrap the memory in an Array on the Julia side. If, however, the data is not contiguous in memory, keep the current behavior of returning a wrapper type.
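
For illustration, here is a minimal sketch of the Julia-to-Python direction (assuming PythonCall's pyimport and Py; the helper name share_or_wrap is hypothetical, and dtype mapping is ignored, so it assumes Float64 data):

# Julia

using PythonCall

function share_or_wrap(vector::AbstractVector)
    np = pyimport("numpy")
    if vector isa Vector
        # A plain Vector is contiguous, so np.frombuffer can reinterpret
        # the same buffer as an ndarray without copying.
        # (frombuffer defaults to float64, so this assumes Float64 data.)
        return np.frombuffer(vector)
    else
        # Not contiguous (e.g. a strided view): keep the wrapper behavior.
        return Py(vector)
    end
end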

Alternatives

This can be done manually, of course, which is probably what I will do in my code for now. That said, even if this weren't the default behavior, it would be nice to provide some convenience functions to make it easier to achieve.

Additional context

Here is some example code that worked for me and demonstrates the feasibility of the solution:

# Julia

function as_ndarray(array::Array)::PyArray
    # Reinterpret the Julia array's buffer as an ndarray (no copy).
    np = pyimport("numpy")
    return PyArray(np.frombuffer(array))
end

function from_ndarray(ndarray::PyArray)::Array
    # Wrap the ndarray's buffer in a Julia Array (no copy).
    return unsafe_wrap(Array, pointer(ndarray), size(ndarray))
end

function give_numpy_vector()::PyVector
    return as_ndarray(give_vector())
end

function modify_vector(vector::AbstractVector)::Nothing
    vector[1] += 10
    return nothing
end

function modify_vector(vector::PyVector)::Nothing
    # Unwrap to a plain Array so the mutation is visible on both sides.
    modify_vector(from_ndarray(vector))
    return nothing
end
# Python

vector = in_julia.give_vector()
assert vector.__class__.__name__ == "VectorValue"
assert str(vector) == "[1.0, 2.0, 3.0]"
in_julia.modify_vector(vector)
assert str(vector) == "[11.0, 2.0, 3.0]"

vector = in_julia.give_numpy_vector()
assert vector.__class__.__name__ == "ndarray"
assert str(vector) == "[1. 2. 3.]"
in_julia.modify_vector(vector)
assert str(vector) == "[11.  2.   3.]"

Naturally this is just a proof of concept and doesn't deal with issues such as proper mapping of dtype, testing whether the data is actually contiguous, etc.
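
For instance, the dtype could be derived from the Julia element type instead of relying on frombuffer's float64 default (a sketch; numpy_dtype and as_ndarray_typed are hypothetical names and the mapping shown is deliberately partial):

# Julia

using PythonCall

# Hypothetical, partial mapping from Julia element types to numpy dtype names.
numpy_dtype(::Type{Float64}) = "float64"
numpy_dtype(::Type{Float32}) = "float32"
numpy_dtype(::Type{Int64}) = "int64"

function as_ndarray_typed(array::Vector{T})::PyArray where {T}
    np = pyimport("numpy")
    # Pass the matching dtype explicitly so non-Float64 buffers round-trip.
    return PyArray(np.frombuffer(array; dtype = numpy_dtype(T)))
end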

One point worth noting is that the fact that Array "likes" to be column-major and ndarray "likes" to be row-major is not a showstopper here. I have plenty of Python code which explicitly works with column-major arrays (because that's what's needed for efficiency), and Julia has PermutedDimsArray, which can likewise deal with row-major order. It is always the developer's responsibility to deal with this issue (summing rows of column-major data will be much slower than summing rows of the same data in row-major order).
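
For example, a row-major (C-order) buffer can be viewed in Julia in its natural (rows, cols) orientation without copying (a sketch; row_major_view is a hypothetical helper name):

# Julia

# A row-major buffer of shape (rows, cols) is bit-for-bit a column-major
# buffer of shape (cols, rows); PermutedDimsArray then restores the
# original indexing order without copying.
function row_major_view(flat::Vector{T}, rows::Int, cols::Int) where {T}
    col_major = reshape(flat, cols, rows)
    return PermutedDimsArray(col_major, (2, 1))
end

# row_major_view([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3)[1, 3] == 3.0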

As a side note, the built-in implementation in both numpy and Julia for converting data between these layouts is slow as molasses for no good reason, so I had to provide my own implementation to get reasonable performance. But that's mostly irrelevant to the issue I raised here.
