Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memoryview_scoped_release to manage short-lived memoryviews #2307

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

virtuald
Copy link
Contributor

memoryviews seemed useful except for the potential to access an invalid buffer. Turns out that there's builtin support to avoid that sort of thing -- though oddly enough there doesn't seem to be an explicit CAPI function for it?

Example of an error encountered after the buffer is released.. better than a segfault?

Traceback (most recent call last):
  File "t.py", line 19, in <module>
    print(vv[0])
ValueError: operation forbidden on released memoryview object

Copy link
Collaborator

@bstaletic bstaletic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't approve this, because I don't think I know enough about memory views at this point.

Do we want to make this release thing a part of py::memoryview's destructor?

class memoryview_scoped_release {
public:
explicit memoryview_scoped_release(memoryview view) : m_view(view) {}
~memoryview_scoped_release() { m_view.attr("release")(); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, CPython implementation is in 256 different files. To save everyone the trouble, attr("release") is defined here: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Objects/memoryobject.c#L1068

@virtuald
Copy link
Contributor Author

virtuald commented Feb 14, 2021

I don't think you want to make release part of the destructor. Release indicates that the buffer is no longer valid -- so for example, a memoryview around a static buffer wouldn't need release to be called. Also would break existing users of py::memoryview.

This mechanism is intended to be used when you know when the memoryview is no longer pointing to a valid piece of memory and shouldn't be used by python anymore (and in particular -- python could still have references to the memoryview lying around).

@bstaletic
Copy link
Collaborator

Okay, that makes a lot of sense. I did say I'm not completely familiar with memory views.

@virtuald
Copy link
Contributor Author

I'm sad this didn't make it into 2.7.0, but still interested in getting this merged!

@virtuald virtuald force-pushed the memoryview-release branch 3 times, most recently from 300e623 to 3c1946f Compare July 26, 2021 14:13
Copy link
Collaborator

@Skylion007 Skylion007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me and it's be nice to see this feature merged.

@rwgk
Copy link
Collaborator

rwgk commented Aug 5, 2021

We could simply add a member function like below (tested).
It just changes where the function call is placed. One way or the other it's not automatic.
How common is it to have complex logic where multiple view.release() are needed? Common enough that we need to add a dedicated type + documentation for that specific purpose?
I'm thinking the simpler approach is a more measured way to cater to real word use cases. Adding a method doesn't buy all that much, too, but is a nice inexpensive way to raise visibility.

diff --git a/include/pybind11/pytypes.h b/include/pybind11/pytypes.h
index c7b2501f..567c9d3d 100644
--- a/include/pybind11/pytypes.h
+++ b/include/pybind11/pytypes.h
@@ -1601,6 +1601,8 @@ public:
         return memoryview::from_memory(const_cast<void*>(mem), size, true);
     }
 #endif
+
+    void release() { attr("release")(); }
 };
 
 #ifndef DOXYGEN_SHOULD_SKIP_THIS
diff --git a/tests/test_pytypes.cpp b/tests/test_pytypes.cpp
index d70536d3..1df0d264 100644
--- a/tests/test_pytypes.cpp
+++ b/tests/test_pytypes.cpp
@@ -410,6 +410,15 @@ TEST_SUBMODULE(pytypes, m) {
     });
 #endif
 
+#if PY_VERSION_HEX >= 0x03020000
+    m.def("test_memoryview_release", [](const py::function &f) {
+        const char* buf = "\x42";
+        auto view = py::memoryview::from_memory(buf, 1);
+        f(view);
+        view.release();
+    });
+#endif
+
     // test_builtin_functions
     m.def("get_len", [](py::handle h) { return py::len(h); });
 
diff --git a/tests/test_pytypes.py b/tests/test_pytypes.py
index 66d6d30a..63c5ec92 100644
--- a/tests/test_pytypes.py
+++ b/tests/test_pytypes.py
@@ -460,6 +460,20 @@ def test_memoryview(method, args, fmt, expected_view):
     assert view_as_list == list(expected_view)
 
 
+@pytest.mark.skipif(sys.version_info < (3, 2), reason="API not available")
+def test_memoryview_release():
+    class C:
+        def fn(self, view):
+            self.view = view
+            assert bytes(view) == b"\x42"
+
+    c = C()
+    m.test_memoryview_release(c.fn)
+    assert hasattr(c, "view")
+    with pytest.raises(ValueError):
+        bytes(c.view)
+
+
 @pytest.mark.xfail("env.PYPY", reason="getrefcount is not available")
 @pytest.mark.parametrize(
     "method",

@virtuald
Copy link
Contributor Author

virtuald commented Aug 9, 2021

I think it would make sense to add the release method in addition to (but not instead of) the RAII wrapper.

The primary reason I created this was for instances where I had a short-lived buffer passed to a python callback that I didn't want users in python code to accidentally retain. If they did accidentally retain the buffer, diagnosing a python exception is generally much simpler than diagnosing a segfault. Having an RAII object is useful for all the reasons that RAII is generally useful: ensures that buffer has correct behavior in case of exceptions, multiple returns, etc etc.

Any usage of py::memoryview wrapping a short lived buffer that doesn't use the RAII object is a nasty bug waiting to happen, in my opinion.

@rwgk
Copy link
Collaborator

rwgk commented Aug 9, 2021

RAII wrapper.

There is no Resource Acquisition in memoryview_scoped_release, only release. This is only half the RAII pattern. The extra class would make a difference only if you have complex logic similar to what you often see in Python C API code, with multiple DECREF for the same PyObject, or a goto to avoid that, etc. Clearly your solution is much better than that, but how often do you think this will actually be beneficial for buffer releases? I'm thinking it'll be pretty rare. I.e. in my judgement the case for "pro" is weak.

But there is the extra class in the pybind11 code. It's not easy to see what it's for, it takes a small effort to understand and to realize it's only rarely useful. In my judgement that's a "con": more often distracting than useful.

Therefore I feel very reluctant adding the class. If other maintainers feel differently, I'll be fine with it.

@virtuald
Copy link
Contributor Author

virtuald commented Aug 9, 2021

Fine, there's no acquisition here, so it's not RAII. But it's a scope management object, which is still a really useful thing. This is the easiest correct way to release the memoryview in situations where that is required.

Given that memoryviews are fairly esoteric, I don't see "this isn't used that often" as a particularly convincing argument. There's not enough data to say.

If this was a golang library, I could see "don't include in the library something trivially implementable" as an acceptable argument. This is python, and python libraries tend to include all of the batteries. ;)


The memoryview documentation says:

The context management protocol can be used for a similar effect, using the with statement

>>> with memoryview(b'abc') as m:
...     m[0]

Perhaps the better way to do this would be some general context management object that emulates the with statement using an RAII object? Then you could do something like....

auto [scoped, as] = py::with(py::memoryview(buffer, size));

Or this, which seems worse.

auto scope = py::with(py::memoryview(buffer, size));
auto m = scope.as<memoryview>();

There are probably other places where calling __enter__/__exit__ automatically would be useful, particularly with regards to ensuring cleanup occurs while handling exceptions?

If a py::with existed, I would still put a note in the memoryview documentation about using that for short-lived buffers.

@Skylion007
Copy link
Collaborator

Skylion007 commented Aug 9, 2021

Fine, there's no acquisition here, so it's not RAII. But it's a scope management object, which is still a really useful thing. This is the easiest correct way to release the memoryview in situations where that is required.

Given that memoryviews are fairly esoteric, I don't see "this isn't used that often" as a particularly convincing argument. There's not enough data to say.

If this was a golang library, I could see "don't include in the library something trivially implementable" as an acceptable argument. This is python, and python libraries tend to include all of the batteries. ;)

The memoryview documentation says:

The context management protocol can be used for a similar effect, using the with statement

>>> with memoryview(b'abc') as m:
...     m[0]

Perhaps the better way to do this would be some general context management object that emulates the with statement using an RAII object? Then you could do something like....

auto [scoped, as] = py::with(py::memoryview(buffer, size));

Or this, which seems worse.

auto scope = py::with(py::memoryview(buffer, size));
auto m = scope.as<memoryview>();

There are probably other places where calling __enter__/__exit__ automatically would be useful, particularly with regards to ensuring cleanup occurs while handling exceptions?

If a py::with existed, I would still put a note in the memoryview documentation about using that for short-lived buffers.

py::with just sounds like contextlib.closing? Why not just use that, but within C++.

@rwgk
Copy link
Collaborator

rwgk commented Aug 10, 2021

Fine, there's no acquisition here, so it's not RAII.

If you can find an elegant approach to full RAII, that doesn't duplicate code too much, I'd find this more convincing.

Caveat: I'm still having strong "of all the problems in the world ..." doubts, but I'll look again if you succeed with a full RAII implementation.

@virtuald
Copy link
Contributor Author

py::with just sounds like contextlib.closing? Why not just use that, but within C++.

I don't think pybind11 supports creating python generators? Maybe with an eval or something. You would need to implement a C++ version of contextlib.closing that does the same thing -- I guess it could be called py::closing?


If you can find an elegant approach to full RAII, that doesn't duplicate code too much, I'd find this more convincing.

What about:

py::scoped_memoryview m(buffer, sz);

It could probably inherit from py::memoryview, but disable copies/moves to ensure the scope behavior is retained? Haven't tried it yet.

@eirrgang
Copy link
Contributor

This is not "scoped". This just forcibly completes the Python Buffer Protocol for the session associated with the memoryview instance. PyBuffer_Release would normally be called when the reference count to the memoryview goes to zero, but memoryview.release() is convenient to support semantics like the context manager, in which you would expect the buffer to be closed even though the Python interpreter retains a reference to the memoryview object, itself.

It is within the scope of standard memoryview usage to force the buffer to be released before finalization, but that can be accomplished with a single line calling the Python object's release method.

The usage that is implied here, though, is unsafe. It doesn't prevent a seg fault in the C++ code if the C++ code is not also managing the object that provided the buffer in the first place. Once release is called, the Py_buffer releases its reference to the exporting object, and it could be destroyed.

Only the developer (for a specific use case of this proposed patch) can know whether it is safe to use a pointer obtained from a memoryview after release has been called. There isn't a general way to make it safe.

Perhaps the better way to do this would be some general context management object that emulates the with statement using an RAII object? Then you could do something like....

Pybind already has this. If you write a function that accepts a pybind11::buffer argument and pass it an object supporting the Python Buffer Protocol, the Python Buffer Protocol is automatically managed.

If the Py_buffer already exists (such as for a memoryview object), then the scope of the buffer validity is already determined by the code that acquires and releases the Py_buffer, and the scope of the memory allocation is determined by the object that exported the Py_buffer.

@eirrgang
Copy link
Contributor

eirrgang commented Feb 28, 2022

What about: py::scoped_memoryview m(buffer, sz);

py::buffer is already scoped by expressing the get/release calls in RAII semantics. When instantiated from a Python object, memoryview essentially converts the scope of a new Py_buffer into the object lifetime of a Python memoryview object. memoryview.release() allows that to be short-circuited, but it would clearly have different utility and usage for memoryview objects created from existing Py_buffer s or from raw memory.

Maybe what you are asking for is some sort of update to the move semantics of py::buffer?

@virtuald
Copy link
Contributor Author

@eirrgang

I think you're looking at this from the wrong angle. I don't care a whole lot about whether C++ is doing something to cause a segfault, but rather the goal is to prevent python from causing a segfault unexpectedly. python users are typically not going to be very good at debugging segfaults, but they know how to deal with exceptions.

memoryview.release() is convenient to support semantics like the context manager, in which you would expect the buffer to be closed even though the Python interpreter retains a reference to the memoryview object, itself.

Yes. That is exactly what I want.

The specific use case that this is useful for is a callback thing like this:

{
  char buffer[size];
  // do something with buffer to fill it
       
  auto view = py::memoryview::from_memory(buffer, size);
  py::memoryview_scoped_release release(view);

  some_python_function(view);
}

In that case, I created the python buffer object in the scope. The buffer is not valid after that scope exists. If the python code retains a reference to the memoryview (which it could for some reason), then it will segfault when python tries to access the buffer. The goal of this object is to ensure that release is called, so that when python tries to access the memoryview it gets an exception instead of a segfault.

PyBuffer_Release is not going to help if python retains a reference to the buffer.

Yes, one could just call memoryview release at the end of that scope. However, that wouldn't be exception safe, so you'll need to either create some kind of object whose destructor calls the release function (like this PR does), or do something more complicated.

@eirrgang
Copy link
Contributor

eirrgang commented Mar 1, 2022

@eirrgang

I think you're looking at this from the wrong angle. I don't care a whole lot about whether C++ is doing something to cause a segfault, but rather the goal is to prevent python from causing a segfault unexpectedly. python users are typically not going to be very good at debugging segfaults, but they know how to deal with exceptions.

Ah, I see, now.

My inclination is to expose the buffer protocol and provide stern warnings not to keep buffers open longer than needed (and make sure that the C++ code also keeps the owning object alive per the buffer protocol), but this certainly isn't always practical.

In similar cases, I've created new types so that I could clearly document the somewhat unusual behavior of a Python object that becomes unusable when the scope changes, and used a module-custom "ScopeError" exception to draw attention to the constraints on correct usage of the exported object. But the semantics are the same as using memoryview in this way.

If some_python_function is some sort of hook that you want to allow for your clients, then memoryview and memoryview.release() definitely represent an existing implementation of the pattern you want.

It might be clearer if written to clearly indicate that the new object is definitely tied to the lifetime (or period of validity) of the exposed object. The snippet above looks like a lock or guard, which allows a state to be toggled, but doesn't permanently transform another object. What about the following?

{
  char buffer[size];
  // do something with buffer to fill it
       
  auto handle = scoped_memoryview::from_memory(buffer, size);

  some_python_function(handle.view);
}

Maybe that's too big a change. I'm just thinking about how one commonly uses a name like lock, not unlock, for this sort of state manager. Calling this release really throws me. I guess I understand the discussion of py::with now.

It seems like the description and implementation of this might be better as documentation additions than as additions to the pybind utilities themselves, though. The discussion here includes some good points that might help readers of the section on py::buffer.

@eirrgang
Copy link
Contributor

eirrgang commented Mar 1, 2022

More generically, a C++ Finalizer class to wrap an object and bind a function to call when the Finalizer is destroyed could be a good general utility. For that, it would definitely be helpful to have a py::memoryview::release() member function at the C++ level so that the callback could rely on typing to ensure the method's availability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Docs or GitHub info
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants