Add memoryview_scoped_release to manage short-lived memoryviews #2307

virtuald · 2020-07-18T04:22:50Z

memoryviews seemed useful except for the potential to access an invalid buffer. Turns out that there's builtin support to avoid that sort of thing -- though oddly enough there doesn't seem to be an explicit CAPI function for it?

Example of an error encountered after the buffer is released.. better than a segfault?

Traceback (most recent call last):
  File "t.py", line 19, in <module>
    print(vv[0])
ValueError: operation forbidden on released memoryview object

bstaletic

I won't approve this, because I don't think I know enough about memory views at this point.

Do we want to make this release thing a part of py::memoryview's destructor?

bstaletic · 2021-02-14T08:45:11Z

include/pybind11/pybind11.h

+class memoryview_scoped_release {
+public:
+    explicit memoryview_scoped_release(memoryview view) : m_view(view) {}
+    ~memoryview_scoped_release() { m_view.attr("release")(); }


Okay, CPython implementation is in 256 different files. To save everyone the trouble, attr("release") is defined here: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Objects/memoryobject.c#L1068

virtuald · 2021-02-14T18:03:26Z

I don't think you want to make release part of the destructor. Release indicates that the buffer is no longer valid -- so for example, a memoryview around a static buffer wouldn't need release to be called. Also would break existing users of py::memoryview.

This mechanism is intended to be used when you know when the memoryview is no longer pointing to a valid piece of memory and shouldn't be used by python anymore (and in particular -- python could still have references to the memoryview lying around).

bstaletic · 2021-02-14T18:55:22Z

Okay, that makes a lot of sense. I did say I'm not completely familiar with memory views.

virtuald · 2021-07-26T13:50:32Z

I'm sad this didn't make it into 2.7.0, but still interested in getting this merged!

Skylion007

This looks good to me and it's be nice to see this feature merged.

rwgk · 2021-08-05T08:57:08Z

We could simply add a member function like below (tested).
It just changes where the function call is placed. One way or the other it's not automatic.
How common is it to have complex logic where multiple view.release() are needed? Common enough that we need to add a dedicated type + documentation for that specific purpose?
I'm thinking the simpler approach is a more measured way to cater to real word use cases. Adding a method doesn't buy all that much, too, but is a nice inexpensive way to raise visibility.

diff --git a/include/pybind11/pytypes.h b/include/pybind11/pytypes.h
index c7b2501f..567c9d3d 100644
--- a/include/pybind11/pytypes.h
+++ b/include/pybind11/pytypes.h
@@ -1601,6 +1601,8 @@ public:
         return memoryview::from_memory(const_cast<void*>(mem), size, true);
     }
 #endif
+
+    void release() { attr("release")(); }
 };
 
 #ifndef DOXYGEN_SHOULD_SKIP_THIS
diff --git a/tests/test_pytypes.cpp b/tests/test_pytypes.cpp
index d70536d3..1df0d264 100644
--- a/tests/test_pytypes.cpp
+++ b/tests/test_pytypes.cpp
@@ -410,6 +410,15 @@ TEST_SUBMODULE(pytypes, m) {
     });
 #endif
 
+#if PY_VERSION_HEX >= 0x03020000
+    m.def("test_memoryview_release", [](const py::function &f) {
+        const char* buf = "\x42";
+        auto view = py::memoryview::from_memory(buf, 1);
+        f(view);
+        view.release();
+    });
+#endif
+
     // test_builtin_functions
     m.def("get_len", [](py::handle h) { return py::len(h); });
 
diff --git a/tests/test_pytypes.py b/tests/test_pytypes.py
index 66d6d30a..63c5ec92 100644
--- a/tests/test_pytypes.py
+++ b/tests/test_pytypes.py
@@ -460,6 +460,20 @@ def test_memoryview(method, args, fmt, expected_view):
     assert view_as_list == list(expected_view)
 
 
+@pytest.mark.skipif(sys.version_info < (3, 2), reason="API not available")
+def test_memoryview_release():
+    class C:
+        def fn(self, view):
+            self.view = view
+            assert bytes(view) == b"\x42"
+
+    c = C()
+    m.test_memoryview_release(c.fn)
+    assert hasattr(c, "view")
+    with pytest.raises(ValueError):
+        bytes(c.view)
+
+
 @pytest.mark.xfail("env.PYPY", reason="getrefcount is not available")
 @pytest.mark.parametrize(
     "method",

virtuald · 2021-08-09T18:17:09Z

I think it would make sense to add the release method in addition to (but not instead of) the RAII wrapper.

The primary reason I created this was for instances where I had a short-lived buffer passed to a python callback that I didn't want users in python code to accidentally retain. If they did accidentally retain the buffer, diagnosing a python exception is generally much simpler than diagnosing a segfault. Having an RAII object is useful for all the reasons that RAII is generally useful: ensures that buffer has correct behavior in case of exceptions, multiple returns, etc etc.

Any usage of py::memoryview wrapping a short lived buffer that doesn't use the RAII object is a nasty bug waiting to happen, in my opinion.

rwgk · 2021-08-09T18:33:38Z

RAII wrapper.

There is no Resource Acquisition in memoryview_scoped_release, only release. This is only half the RAII pattern. The extra class would make a difference only if you have complex logic similar to what you often see in Python C API code, with multiple DECREF for the same PyObject, or a goto to avoid that, etc. Clearly your solution is much better than that, but how often do you think this will actually be beneficial for buffer releases? I'm thinking it'll be pretty rare. I.e. in my judgement the case for "pro" is weak.

But there is the extra class in the pybind11 code. It's not easy to see what it's for, it takes a small effort to understand and to realize it's only rarely useful. In my judgement that's a "con": more often distracting than useful.

Therefore I feel very reluctant adding the class. If other maintainers feel differently, I'll be fine with it.

virtuald · 2021-08-09T22:40:55Z

Fine, there's no acquisition here, so it's not RAII. But it's a scope management object, which is still a really useful thing. This is the easiest correct way to release the memoryview in situations where that is required.

Given that memoryviews are fairly esoteric, I don't see "this isn't used that often" as a particularly convincing argument. There's not enough data to say.

If this was a golang library, I could see "don't include in the library something trivially implementable" as an acceptable argument. This is python, and python libraries tend to include all of the batteries. ;)

The memoryview documentation says:

The context management protocol can be used for a similar effect, using the with statement
>>> with memoryview(b'abc') as m:
...     m[0]

Perhaps the better way to do this would be some general context management object that emulates the with statement using an RAII object? Then you could do something like....

auto [scoped, as] = py::with(py::memoryview(buffer, size));

Or this, which seems worse.

auto scope = py::with(py::memoryview(buffer, size));
auto m = scope.as<memoryview>();

There are probably other places where calling __enter__/__exit__ automatically would be useful, particularly with regards to ensuring cleanup occurs while handling exceptions?

If a py::with existed, I would still put a note in the memoryview documentation about using that for short-lived buffers.

Skylion007 · 2021-08-09T23:06:01Z

Fine, there's no acquisition here, so it's not RAII. But it's a scope management object, which is still a really useful thing. This is the easiest correct way to release the memoryview in situations where that is required.

Given that memoryviews are fairly esoteric, I don't see "this isn't used that often" as a particularly convincing argument. There's not enough data to say.

If this was a golang library, I could see "don't include in the library something trivially implementable" as an acceptable argument. This is python, and python libraries tend to include all of the batteries. ;)

The memoryview documentation says:
The context management protocol can be used for a similar effect, using the with statement
>>> with memoryview(b'abc') as m:
...     m[0]
Perhaps the better way to do this would be some general context management object that emulates the with statement using an RAII object? Then you could do something like....
auto [scoped, as] = py::with(py::memoryview(buffer, size));
Or this, which seems worse.
auto scope = py::with(py::memoryview(buffer, size));
auto m = scope.as<memoryview>();
There are probably other places where calling __enter__/__exit__ automatically would be useful, particularly with regards to ensuring cleanup occurs while handling exceptions?

If a py::with existed, I would still put a note in the memoryview documentation about using that for short-lived buffers.

py::with just sounds like contextlib.closing? Why not just use that, but within C++.

rwgk · 2021-08-10T12:21:32Z

Fine, there's no acquisition here, so it's not RAII.

If you can find an elegant approach to full RAII, that doesn't duplicate code too much, I'd find this more convincing.

Caveat: I'm still having strong "of all the problems in the world ..." doubts, but I'll look again if you succeed with a full RAII implementation.

virtuald · 2021-08-10T13:51:52Z

py::with just sounds like contextlib.closing? Why not just use that, but within C++.

I don't think pybind11 supports creating python generators? Maybe with an eval or something. You would need to implement a C++ version of contextlib.closing that does the same thing -- I guess it could be called py::closing?

If you can find an elegant approach to full RAII, that doesn't duplicate code too much, I'd find this more convincing.

What about:

py::scoped_memoryview m(buffer, sz);

It could probably inherit from py::memoryview, but disable copies/moves to ensure the scope behavior is retained? Haven't tried it yet.

eirrgang · 2022-02-28T19:24:52Z

This is not "scoped". This just forcibly completes the Python Buffer Protocol for the session associated with the memoryview instance. PyBuffer_Release would normally be called when the reference count to the memoryview goes to zero, but memoryview.release() is convenient to support semantics like the context manager, in which you would expect the buffer to be closed even though the Python interpreter retains a reference to the memoryview object, itself.

It is within the scope of standard memoryview usage to force the buffer to be released before finalization, but that can be accomplished with a single line calling the Python object's release method.

The usage that is implied here, though, is unsafe. It doesn't prevent a seg fault in the C++ code if the C++ code is not also managing the object that provided the buffer in the first place. Once release is called, the Py_buffer releases its reference to the exporting object, and it could be destroyed.

Only the developer (for a specific use case of this proposed patch) can know whether it is safe to use a pointer obtained from a memoryview after release has been called. There isn't a general way to make it safe.

Perhaps the better way to do this would be some general context management object that emulates the with statement using an RAII object? Then you could do something like....

Pybind already has this. If you write a function that accepts a pybind11::buffer argument and pass it an object supporting the Python Buffer Protocol, the Python Buffer Protocol is automatically managed.

If the Py_buffer already exists (such as for a memoryview object), then the scope of the buffer validity is already determined by the code that acquires and releases the Py_buffer, and the scope of the memory allocation is determined by the object that exported the Py_buffer.

eirrgang · 2022-02-28T19:41:13Z

What about: py::scoped_memoryview m(buffer, sz);

py::buffer is already scoped by expressing the get/release calls in RAII semantics. When instantiated from a Python object, memoryview essentially converts the scope of a new Py_buffer into the object lifetime of a Python memoryview object. memoryview.release() allows that to be short-circuited, but it would clearly have different utility and usage for memoryview objects created from existing Py_buffer s or from raw memory.

Maybe what you are asking for is some sort of update to the move semantics of py::buffer?

virtuald · 2022-02-28T22:08:09Z

@eirrgang

I think you're looking at this from the wrong angle. I don't care a whole lot about whether C++ is doing something to cause a segfault, but rather the goal is to prevent python from causing a segfault unexpectedly. python users are typically not going to be very good at debugging segfaults, but they know how to deal with exceptions.

memoryview.release() is convenient to support semantics like the context manager, in which you would expect the buffer to be closed even though the Python interpreter retains a reference to the memoryview object, itself.

Yes. That is exactly what I want.

The specific use case that this is useful for is a callback thing like this:

{
  char buffer[size];
  // do something with buffer to fill it
       
  auto view = py::memoryview::from_memory(buffer, size);
  py::memoryview_scoped_release release(view);

  some_python_function(view);
}

In that case, I created the python buffer object in the scope. The buffer is not valid after that scope exists. If the python code retains a reference to the memoryview (which it could for some reason), then it will segfault when python tries to access the buffer. The goal of this object is to ensure that release is called, so that when python tries to access the memoryview it gets an exception instead of a segfault.

PyBuffer_Release is not going to help if python retains a reference to the buffer.

Yes, one could just call memoryview release at the end of that scope. However, that wouldn't be exception safe, so you'll need to either create some kind of object whose destructor calls the release function (like this PR does), or do something more complicated.

eirrgang · 2022-03-01T11:32:20Z

@eirrgang

I think you're looking at this from the wrong angle. I don't care a whole lot about whether C++ is doing something to cause a segfault, but rather the goal is to prevent python from causing a segfault unexpectedly. python users are typically not going to be very good at debugging segfaults, but they know how to deal with exceptions.

Ah, I see, now.

My inclination is to expose the buffer protocol and provide stern warnings not to keep buffers open longer than needed (and make sure that the C++ code also keeps the owning object alive per the buffer protocol), but this certainly isn't always practical.

In similar cases, I've created new types so that I could clearly document the somewhat unusual behavior of a Python object that becomes unusable when the scope changes, and used a module-custom "ScopeError" exception to draw attention to the constraints on correct usage of the exported object. But the semantics are the same as using memoryview in this way.

If some_python_function is some sort of hook that you want to allow for your clients, then memoryview and memoryview.release() definitely represent an existing implementation of the pattern you want.

It might be clearer if written to clearly indicate that the new object is definitely tied to the lifetime (or period of validity) of the exposed object. The snippet above looks like a lock or guard, which allows a state to be toggled, but doesn't permanently transform another object. What about the following?

{
  char buffer[size];
  // do something with buffer to fill it
       
  auto handle = scoped_memoryview::from_memory(buffer, size);

  some_python_function(handle.view);
}

Maybe that's too big a change. I'm just thinking about how one commonly uses a name like lock, not unlock, for this sort of state manager. Calling this release really throws me. I guess I understand the discussion of py::with now.

It seems like the description and implementation of this might be better as documentation additions than as additions to the pybind utilities themselves, though. The discussion here includes some good points that might help readers of the section on py::buffer.

eirrgang · 2022-03-01T12:02:56Z

More generically, a C++ Finalizer class to wrap an object and bind a function to call when the Finalizer is destroyed could be a good general utility. For that, it would definitely be helpful to have a py::memoryview::release() member function at the C++ level so that the callback could rely on typing to ensure the method's availability.

virtuald force-pushed the memoryview-release branch from ef6b1a6 to 4e92ffd Compare July 18, 2020 04:50

virtuald mentioned this pull request Jul 23, 2020

Not clear how to expose existing C++ vector as numpy array #1042

Open

virtuald force-pushed the memoryview-release branch from 4e92ffd to 5671fa6 Compare October 17, 2020 06:07

github-actions bot added the docs Docs or GitHub info label Oct 17, 2020

virtuald force-pushed the memoryview-release branch from 5671fa6 to 74ab0d8 Compare November 15, 2020 06:31

virtuald force-pushed the memoryview-release branch from 74ab0d8 to 5814f9a Compare January 1, 2021 21:00

virtuald force-pushed the memoryview-release branch from 5814f9a to 3cf0b08 Compare February 13, 2021 23:44

bstaletic reviewed Feb 14, 2021

View reviewed changes

virtuald force-pushed the memoryview-release branch from 3cf0b08 to 1dfb360 Compare July 26, 2021 13:50

virtuald force-pushed the memoryview-release branch 3 times, most recently from 300e623 to 3c1946f Compare July 26, 2021 14:13

Add memoryview_scoped_release to manage short-lived memoryviews

8276c0e

virtuald force-pushed the memoryview-release branch from 3c1946f to 8276c0e Compare July 26, 2021 14:21

Skylion007 requested review from henryiii and rwgk July 26, 2021 17:48

Skylion007 approved these changes Jul 26, 2021

View reviewed changes

rwgk mentioned this pull request Feb 10, 2023

FWD pybind11 google/pybind11clif#2307

Closed

Add memoryview_scoped_release to manage short-lived memoryviews #2307

Are you sure you want to change the base?

Add memoryview_scoped_release to manage short-lived memoryviews #2307

Uh oh!

Conversation

virtuald commented Jul 18, 2020

Uh oh!

bstaletic left a comment

Choose a reason for hiding this comment

Uh oh!

bstaletic Feb 14, 2021

Choose a reason for hiding this comment

Uh oh!

virtuald commented Feb 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bstaletic commented Feb 14, 2021

Uh oh!

virtuald commented Jul 26, 2021

Uh oh!

Skylion007 left a comment

Choose a reason for hiding this comment

Uh oh!

rwgk commented Aug 5, 2021

Uh oh!

virtuald commented Aug 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rwgk commented Aug 9, 2021

Uh oh!

virtuald commented Aug 9, 2021

Uh oh!

Skylion007 commented Aug 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rwgk commented Aug 10, 2021

Uh oh!

virtuald commented Aug 10, 2021

Uh oh!

eirrgang commented Feb 28, 2022

Uh oh!

eirrgang commented Feb 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

virtuald commented Feb 28, 2022

Uh oh!

eirrgang commented Mar 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eirrgang commented Mar 1, 2022

Uh oh!

Uh oh!

virtuald commented Feb 14, 2021 •

edited

Loading

virtuald commented Aug 9, 2021 •

edited

Loading

Skylion007 commented Aug 9, 2021 •

edited

Loading

eirrgang commented Feb 28, 2022 •

edited

Loading

eirrgang commented Mar 1, 2022 •

edited

Loading