Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interface for making a PyArray non-writeable #456

Open
jakelishman opened this issue Oct 9, 2024 · 9 comments
Open

Interface for making a PyArray non-writeable #456

jakelishman opened this issue Oct 9, 2024 · 9 comments

Comments

@jakelishman
Copy link

jakelishman commented Oct 9, 2024

I'm using PyArray::from_borrowed_array_bound to expose an immutable data buffer from a Rust struct to Python space via Numpy. The type needs to uphold some invariants about its internal data, which means I want to prevent Numpy from allowing writes, so I'm trying to unset the WRITEABLE flag before returning the object from Rust space.

Currently, I'm doing something like this:

use pyo3::prelude::*;
use numpy::{prelude::*, PyArray1, npyffi};
use ndarray::aview1;

#[pyclass]
struct NonwriteableData(Box<[u8]>);

#[pymethods]
impl NonwriteableData {
    fn data(slf_: Bound<Self>) -> Bound<PyArray1<u8>> {
        let arr = aview1(&self.0);
        // This feels comfortably unsafe, and I'm upholding the requirements
        // to not re-allocate, drop, etc the underlying pointer from Rust.
        let out = unsafe {
            PyArray1::borrow_from_array_bound(&arr, slf_.into_any())
        };
        // This _feels_ like there's a safe API that can be made?
        unsafe {
            (*out.as_array_ptr()).flags &= !npyffi::flags::NPY_ARRAY_WRITEABLE;
        }
        out
    }
}

I'm not certain that I'm entirely correct in doing this in the first place. If I am, then my second unsafe block feels like it could have a safe abstraction wrapping it? If I understand correctly, if we can assert/verify that there are no existing PyReadwriteArrays taken out on the underlying memory in Rust space (which I haven't done explicitly here, since I only just created the array object), then removing the WRITEABLE flag should be a safe operation.

Please correct me if I've made soundness errors here. If I haven't, could we potentially add methods like nonwriteable / try_nonwriteable that unset this flag in a safe manner?

@jakelishman
Copy link
Author

jakelishman commented Oct 9, 2024

A minor follow-up: I think a method on PyReadwriteArray that consumes the mutable view object would be sufficient to satisfy all the safety concerns? So

impl<T, D> PyReadwriteArray<T, D> {
    pub fn make_nonwriteable(self) {
        // SAFETY: the array pointer is known to be non-null.  We are consuming the only
        // possible extant mutable view on the data, so cannot invalidate any accessible
        // mutable view.
        unsafe {
            (*self.as_array_ptr()).flags &= !npyffi::flags::NPY_ARRAY_WRITEABLE;
        }
    }
}

might work? Happy to PR and add a bit of documentation around it, if so.

@davidhewitt
Copy link
Member

We have a .readonly() method on PyArray, does that fit your need?

@jakelishman
Copy link
Author

I'm trying to return the ndarray as view object to Python space, and PyReadonlyArray doesn't (can't, I think?) implement ToPyObject since it can't be certain the lifetime would be bound correctly. It's Python space I want to forbid from modifying the resulting ndarray - in Numpy land that means ensuring that two flags are unset: the OWNSDATA flag, which borrow_from_array_bound handles, and WRITEABLE, which I couldn't spot a way to do in the rust-numpy API without making this pointer dereference.

I could copy the data into a Numpy array on access, or write a custom type that implements much of the array interface I want to provide users easy access to (it's all the magic operator methods in particular, since the ufunc protocols are easy enough), I'm just trying to reduce runtime / boilerplate code.

@davidhewitt
Copy link
Member

It looks to me like PyReadonlyArray implements Deref to Bound<'py, PyArray>, which does implement ToPyObject, so that seems like a straight-up missing API?

@davidhewitt
Copy link
Member

You can presumably also do something like (*readonly_array).to_object() to avoid the need to wait for a release.

@jakelishman
Copy link
Author

Sorry if I'm missing some of your point here - I'm not trying to enforce immutability within Rust, I'm trying to enforce it from Python. I can't allow Python space to write into the data buffer.

I can pass underlying PyArray that the PyReadonlyArray is a borrow of back to Python space, but unless I unset the WRITEABLE flag in the struct (in Numpy C API, that's PyArray_CLEARFLAGS(my_array, NPY_ARRAY_WRITEABLE)), Numpy will allow writes to the array from Python. Derefing PyReadonlyArray back to PyArray doesn't set the flag (nor should it - that'd prevent any further PyReadwriteArrays from being taken out). It's possible that PyReadonlyArray could be given a ToPyObject impl, but that could be misleading; returning a Rust-space immutable borrow to Python probably oughtn't to have the side-effect of flipping the underlying Python object to immutability. If a user ever called PyO3 functions passing a PyReadonlyArray as one of their arguments, they might suddenly get runtime borrow errors when attempting to use the same array mutably later.

Give or take, in pure Python, I'm doing something like this:

import numpy

class A:
    def __init__(self):
        # This is my data.  In Rust, it's some other slice type.
        self._buffer = b"\x00" * 256

    def view(self):
        # This wraps the data in an `ndarray` that forbids mutations.
        return numpy.frombuffer(self._buffer, dtype=numpy.uint8)

a = A()
assert not a.view().flags.writeable

# This will raise an ValueError because it's read-only.
a.view()[0] = 1

So the underlying data buffer isn't an ndarray, but I want to use ndarray to provide Numpy semantics on it, but only permit reading, not writing. Numpy's C API does have all the tools to do this - the Rust code I wrote up top is (if I got it right) effectively the correct way to do it, though the C-API has a helper macro. rust-numpy is already somewhat aware of the flag, because you can't take out a PyReadwriteArray on an ndarray with it set. I'm looking for a nice safe abstraction to set that flag on a PyArray.

@davidhewitt
Copy link
Member

Ah sorry, this is entirely my mistake causing the confusion. We used to set the flag, but after the borrow strategy was changed in #258 we stopped doing that.

In which case I think the API you propose above in make_nonwriteable seems like the right approach. I guess it should return a PyReadonlyArray?

Do we think there are any other alternatives for the name? I briefly thought about into_readonly but it seems to me like because this does a bit more than just the Rust borrow checking it makes sense to give it a more unusual name.

@jakelishman
Copy link
Author

Oh yeah, returning a PyReadonlyArray is certainly fine too. I wasn't thinking about it because I stop using the array from Rust space as soon as I set this flag.

For naming, I was somewhat avoiding "readonly" because of the clash with PyReadonlyArray (which is a different concept of readonly), and the Rust-space as_, to_ and into_ conventions because this a side-effect-y function; it modifies the properties of what you can do with the PyArray that the PyReadwriteArray is borrowed from, so its effects extend beyond the lifetime of the object it consumes.

I couldn't think if there's any other conventions in rust-numpy or PyO3 for that kind of "modification beyond the borrowed lifetime" implication. I can write up a PR, and we can play with the name in that, if anything comes to mind?

@davidhewitt
Copy link
Member

Sounds good to me, thanks 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants