Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make stl.h list|set|map_caster more user friendly. #4686

Merged
merged 55 commits into from
Aug 13, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
d2a36d9
Add `test_pass_std_vector_int()`, `test_pass_std_set_int()` in test_stl
May 29, 2023
7d30378
Change `list_caster` to also accept generator objects (`PyGen_Check(s…
May 30, 2023
039d1b7
Drop in (currently unpublished) PyCLIF code, use in `list_caster`, ad…
Jun 5, 2023
bdba857
Use `PyObjectTypeIsConvertibleToStdSet()` in `set_caster`, adjust tests.
Jun 5, 2023
ec50ca0
Use `PyObjectTypeIsConvertibleToStdMap()` in `map_caster`, add tests.
Jun 5, 2023
eeb59af
Simplify `list_caster` `load()` implementation, push str/bytes check …
Jun 5, 2023
3631886
clang-tidy cleanup with a few extra `(... != 0)` to be more consistent.
Jun 5, 2023
b4f767b
Also use `PyObjectTypeIsConvertibleToStdVector()` in `array_caster`.
Jun 7, 2023
42e22d7
Merge branch 'master' into list_caster_pass_generator
Jun 7, 2023
9647a92
Merge branch 'master' into list_caster_pass_generator
Jun 27, 2023
1ed90cf
Merge branch 'master' into list_caster_pass_generator
Jul 14, 2023
3cc034b
Update comment pointing to clif/python/runtime.cc (code is unchanged).
Jul 14, 2023
18c4153
Merge branch 'master' into list_caster_pass_generator
Jul 14, 2023
e7330f2
Comprehensive test coverage, enhanced set_caster load implementation.
Jul 14, 2023
da29bf5
Resolve clang-tidy eror.
Jul 15, 2023
1fa338e
Add a long C++ comment explaining what led to the `PyObjectTypeIsConv…
Jul 15, 2023
6b5c780
Minor function name change in test.
Jul 15, 2023
9150a88
Merge branch 'master' into list_caster_pass_generator
Jul 15, 2023
62fda81
Merge branch 'master' into list_caster_pass_generator
Jul 17, 2023
4b25f74
strcmp -> std::strcmp (thanks @Skylion007 for catching this)
Jul 17, 2023
3cfc95b
Add `PyCallable_Check(items)` in `PyObjectTypeIsConvertibleToStdMap()`
Jul 17, 2023
59509f2
Resolve clang-tidy error
Jul 17, 2023
903faac
Merge branch 'master' into list_caster_pass_generator
Jul 17, 2023
4ab40d7
Use `PyMapping_Items()` instead of `src.attr("items")()`, to be inter…
Jul 17, 2023
5f7c9d4
Update link to PyCLIF sources.
Jul 17, 2023
ccfeb02
Fix typo (thanks @wangxf123456 for catching this)
Jul 17, 2023
6c1ec6a
Merge branch 'master' into list_caster_pass_generator
Jul 23, 2023
cc4d69c
Merge branch 'master' into list_caster_pass_generator
Aug 4, 2023
dacb827
Add `test_pass_std_vector_int()`, `test_pass_std_set_int()` in test_stl
May 29, 2023
3fc2d2a
Change `list_caster` to also accept generator objects (`PyGen_Check(s…
May 30, 2023
ff0724f
Drop in (currently unpublished) PyCLIF code, use in `list_caster`, ad…
Jun 5, 2023
be02291
Use `PyObjectTypeIsConvertibleToStdSet()` in `set_caster`, adjust tests.
Jun 5, 2023
0a79f55
Use `PyObjectTypeIsConvertibleToStdMap()` in `map_caster`, add tests.
Jun 5, 2023
ae95424
Simplify `list_caster` `load()` implementation, push str/bytes check …
Jun 5, 2023
8aa81d7
clang-tidy cleanup with a few extra `(... != 0)` to be more consistent.
Jun 5, 2023
4435ed0
Also use `PyObjectTypeIsConvertibleToStdVector()` in `array_caster`.
Jun 7, 2023
9aca5e8
Update comment pointing to clif/python/runtime.cc (code is unchanged).
Jul 14, 2023
9dc7d7c
Comprehensive test coverage, enhanced set_caster load implementation.
Jul 14, 2023
d9eeea8
Resolve clang-tidy eror.
Jul 15, 2023
9ac319b
Add a long C++ comment explaining what led to the `PyObjectTypeIsConv…
Jul 15, 2023
b834767
Minor function name change in test.
Jul 15, 2023
d19a0ff
strcmp -> std::strcmp (thanks @Skylion007 for catching this)
Jul 17, 2023
69a0da8
Add `PyCallable_Check(items)` in `PyObjectTypeIsConvertibleToStdMap()`
Jul 17, 2023
c86fa97
Resolve clang-tidy error
Jul 17, 2023
940e190
Use `PyMapping_Items()` instead of `src.attr("items")()`, to be inter…
Jul 17, 2023
f7725e4
Update link to PyCLIF sources.
Jul 17, 2023
fa72918
Fix typo (thanks @wangxf123456 for catching this)
Jul 17, 2023
a6b9a00
Merge branch 'list_caster_pass_generator' of https://github.com/rwgk/…
Aug 31, 2023
ac3ac4d
Merge branch 'master' into list_caster_pass_generator
Sep 12, 2023
87ff92a
Merge branch 'master' into list_caster_pass_generator
Nov 16, 2023
3ae34f3
Fix typo discovered by new version of codespell.
Nov 16, 2023
d1ffe4f
Merge branch 'master' into list_caster_pass_generator
Jul 31, 2024
2267e5c
Merge branch 'master' into list_caster_pass_generator
Aug 10, 2024
a38d0bd
Merge branch 'master' into list_caster_pass_generator
Aug 13, 2024
fbf41fd
Merge branch 'master' into list_caster_pass_generator
Aug 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
210 changes: 177 additions & 33 deletions include/pybind11/stl.h
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include "detail/common.h"

#include <deque>
#include <initializer_list>
#include <list>
#include <map>
#include <ostream>
Expand All @@ -35,6 +36,89 @@
PYBIND11_NAMESPACE_BEGIN(PYBIND11_NAMESPACE)
PYBIND11_NAMESPACE_BEGIN(detail)

//
// Begin: Equivalent of
// https://github.com/google/clif/blob/ae4eee1de07cdf115c0c9bf9fec9ff28efce6f6c/clif/python/runtime.cc#L388-L438
/*
The three `PyObjectTypeIsConvertibleTo*()` functions below are
the result of converging the behaviors of pybind11 and PyCLIF
(http://github.com/google/clif).

Originally PyCLIF was extremely far on the permissive side of the spectrum,
while pybind11 was very far on the strict side. Originally PyCLIF accepted any
Python iterable as input for a C++ `vector`/`set`/`map` argument, as long as
the elements were convertible. The obvious (in hindsight) problem was that
any empty Python iterable could be passed to any of these C++ types, e.g. `{}`
was accepted for C++ `vector`/`set` arguments, or `[]` for C++ `map` arguments.

The functions below strike a practical permissive-vs-strict compromise,
informed by tens of thousands of use cases in the wild. A main objective is
to prevent accidents and improve readability:

- Python literals must match the C++ types.

- For C++ `set`: The potentially reducing conversion from a Python sequence
(e.g. Python `list` or `tuple`) to a C++ `set` must be explicit, by going
through a Python `set`.

- However, a Python `set` can still be passed to a C++ `vector`. The rationale
is that this conversion is not reducing. Implicit conversions of this kind
are also fairly commonly used, therefore enforcing explicit conversions
would have an unfavorable cost : benefit ratio; more sloppily speaking,
such an enforcement would be more annoying than helpful.
*/

inline bool PyObjectIsInstanceWithOneOfTpNames(PyObject *obj,
std::initializer_list<const char *> tp_names) {
if (PyType_Check(obj)) {
return false;
}
const char *obj_tp_name = Py_TYPE(obj)->tp_name;
for (const auto *tp_name : tp_names) {
if (std::strcmp(obj_tp_name, tp_name) == 0) {
return true;
}
}
return false;
}

inline bool PyObjectTypeIsConvertibleToStdVector(PyObject *obj) {
if (PySequence_Check(obj) != 0) {
return !PyUnicode_Check(obj) && !PyBytes_Check(obj);
}
return (PyGen_Check(obj) != 0) || (PyAnySet_Check(obj) != 0)
|| PyObjectIsInstanceWithOneOfTpNames(
obj, {"dict_keys", "dict_values", "dict_items", "map", "zip"});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires explicit subclasses, rather than any object that conforms to collections.abc.Sequence? (Likewise for collections.abc.Mapping and collections.abc.Set below) The Pythonic way to do this would be to ask if the object has the methods required by these protocols, rather than checking explicit inheritance.

Copy link
Collaborator

@henryiii henryiii Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, from https://docs.python.org/3/library/collections.abc.html:

Sequence: __getitem__, __len__, __contains____iter____reversed__index, and count.

Set: __contains__, __iter__, __len__, __le__, __lt__, __eq____ne__, __gt____ge____and____or__, __sub____xor__, and isdisjoint.

Mapping: __getitem__, __iter__, __len__, __contains__keysitemsvalues, get__eq__, and __ne__.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to what I wrote yesterday, there was actually another concern in the back of my mind: the current implementation is super fast, direct strcmp calls for 5 (vector) of 1 (set) very short string(s).

In contrast, hasattr is far more complex and probably orders of magnitude slower, although I wouldn't really worry about it in the usual bigger context (it'll not even be a blip on the radar in most cases). The code extra code complexity is the bigger worry.

So there is a code complexity and runtime overhead cost.

What is the benefit?

That brings me back to my "large-scale field experiment" experience hinted in the PR description. In a nutshell:

  • Current pybind11 is super restrictive.

  • Original PyCLIF was super permissive.

I had this huge pile of data from the wild contrasting the two. Based on that I could see very clearly that there is very little to gain from being more permissive than what this PR implements. IOW I don't think paying the code complexity and runtime overhead cost will help in any significant way.

Could you please let me know if you have a different opinion? — It will probably cost me a whole day worth of effort to implement and fully test (global testing) attribute inspection. We will also have to agree on what subsets of the attributes we actually want to test against. I expect that there will be some devils in the details.

Copy link
Collaborator

@henryiii henryiii Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a short path (check inheritance first) could alleviate much of the speed problem - the slower "correct" check could be done if this is not a subclass of X for common X. Also, this list is somewhat arbitrary - you probably included things (like dict_keys? dict_items?) based on usage (in PyCLIF), rather than making a comprehensive list of everything iterable in Python? Also, I believe checking the attributes on a class (you don't need to check them on the instance) is pretty fast if you avoid the descriptor protocol (typing.runtime_checkable switched to this in 3.12 (since 3.2) which made it much faster).

From a usage standpoint, implementing the Protocol specified by these ABC's is supposed to make your class work wherever the standard library classes work, so it seems ideal if this could follow that. Not required (and we could start with this if you really want to), but it seems like the correct thing to do.

We will also have to agree on what subsets of the attributes we actually want to test against.

This is concretely specified in collections.abc. No arguments needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having a short path (check inheritance first) could alleviate much of the speed problem

I agree. Which basically means: this PR plus follow-on work to make the check more permissive.

Do you think it's reasonable then to merge this PR, and do the fallback attribute inspection in a separate PR? — To be completely honest, I wouldn't want to prioritize this, for the reasons explained before. It a significant amount of work, extra code complexity, and I don't think very many people will ever notice the difference in practice. But if someone else wants to invest the effort, I'd be on board.

This is concretely specified in collections.abc. No arguments needed.

I see, sounds good to me, but from a purely practical viewpoint, assuming that most duck-typing only implements a subset of the attributes, insisting on the full set of attributes here probably means nobody will ever see a difference in practice, unless they are specifically looking to pass the checks here.

}

inline bool PyObjectTypeIsConvertibleToStdSet(PyObject *obj) {
return (PyAnySet_Check(obj) != 0) || PyObjectIsInstanceWithOneOfTpNames(obj, {"dict_keys"});
}

inline bool PyObjectTypeIsConvertibleToStdMap(PyObject *obj) {
if (PyDict_Check(obj)) {
return true;
}
// Implicit requirement in the conditions below:
// A type with `.__getitem__()` & `.items()` methods must implement these
// to be compatible with https://docs.python.org/3/c-api/mapping.html
if (PyMapping_Check(obj) == 0) {
return false;
}
PyObject *items = PyObject_GetAttrString(obj, "items");
if (items == nullptr) {
PyErr_Clear();
return false;
}
bool is_convertible = (PyCallable_Check(items) != 0);
Py_DECREF(items);
return is_convertible;
}

//
// End: Equivalent of clif/python/runtime.cc
//

/// Extracts an const lvalue reference or rvalue reference for U based on the type of T (e.g. for
/// forwarding a container element). Typically used indirect via forwarded_type(), below.
template <typename T, typename U>
Expand Down Expand Up @@ -66,24 +150,40 @@ struct set_caster {
}
void reserve_maybe(const anyset &, void *) {}

public:
bool load(handle src, bool convert) {
if (!isinstance<anyset>(src)) {
return false;
}
auto s = reinterpret_borrow<anyset>(src);
value.clear();
reserve_maybe(s, &value);
for (auto entry : s) {
bool convert_iterable(const iterable &itbl, bool convert) {
for (const auto &it : itbl) {
key_conv conv;
if (!conv.load(entry, convert)) {
if (!conv.load(it, convert)) {
return false;
}
value.insert(cast_op<Key &&>(std::move(conv)));
}
return true;
}

bool convert_anyset(anyset s, bool convert) {
value.clear();
reserve_maybe(s, &value);
return convert_iterable(s, convert);
}

public:
bool load(handle src, bool convert) {
if (!PyObjectTypeIsConvertibleToStdSet(src.ptr())) {
return false;
}
if (isinstance<anyset>(src)) {
value.clear();
return convert_anyset(reinterpret_borrow<anyset>(src), convert);
}
if (!convert) {
return false;
}
assert(isinstance<iterable>(src));
value.clear();
return convert_iterable(reinterpret_borrow<iterable>(src), convert);
}

template <typename T>
static handle cast(T &&src, return_value_policy policy, handle parent) {
if (!std::is_lvalue_reference<T>::value) {
Expand Down Expand Up @@ -115,15 +215,10 @@ struct map_caster {
}
void reserve_maybe(const dict &, void *) {}

public:
bool load(handle src, bool convert) {
if (!isinstance<dict>(src)) {
return false;
}
auto d = reinterpret_borrow<dict>(src);
bool convert_elements(const dict &d, bool convert) {
value.clear();
reserve_maybe(d, &value);
for (auto it : d) {
for (const auto &it : d) {
key_conv kconv;
value_conv vconv;
if (!kconv.load(it.first.ptr(), convert) || !vconv.load(it.second.ptr(), convert)) {
Expand All @@ -134,6 +229,25 @@ struct map_caster {
return true;
}

public:
bool load(handle src, bool convert) {
if (!PyObjectTypeIsConvertibleToStdMap(src.ptr())) {
return false;
}
if (isinstance<dict>(src)) {
return convert_elements(reinterpret_borrow<dict>(src), convert);
}
if (!convert) {
return false;
}
auto items = reinterpret_steal<object>(PyMapping_Items(src.ptr()));
if (!items) {
throw error_already_set();
}
assert(isinstance<iterable>(items));
return convert_elements(dict(reinterpret_borrow<iterable>(items)), convert);
}

template <typename T>
static handle cast(T &&src, return_value_policy policy, handle parent) {
dict d;
Expand Down Expand Up @@ -166,13 +280,35 @@ struct list_caster {
using value_conv = make_caster<Value>;

bool load(handle src, bool convert) {
if (!isinstance<sequence>(src) || isinstance<bytes>(src) || isinstance<str>(src)) {
if (!PyObjectTypeIsConvertibleToStdVector(src.ptr())) {
return false;
}
if (isinstance<sequence>(src)) {
return convert_elements(src, convert);
}
if (!convert) {
return false;
}
auto s = reinterpret_borrow<sequence>(src);
// Designed to be behavior-equivalent to passing tuple(src) from Python:
// The conversion to a tuple will first exhaust the generator object, to ensure that
// the generator is not left in an unpredictable (to the caller) partially-consumed
// state.
assert(isinstance<iterable>(src));
return convert_elements(tuple(reinterpret_borrow<iterable>(src)), convert);
}

private:
template <typename T = Type, enable_if_t<has_reserve_method<T>::value, int> = 0>
void reserve_maybe(const sequence &s, Type *) {
value.reserve(s.size());
}
void reserve_maybe(const sequence &, void *) {}

bool convert_elements(handle seq, bool convert) {
auto s = reinterpret_borrow<sequence>(seq);
value.clear();
reserve_maybe(s, &value);
for (const auto &it : s) {
for (const auto &it : seq) {
value_conv conv;
if (!conv.load(it, convert)) {
return false;
Expand All @@ -182,13 +318,6 @@ struct list_caster {
return true;
}

private:
template <typename T = Type, enable_if_t<has_reserve_method<T>::value, int> = 0>
void reserve_maybe(const sequence &s, Type *) {
value.reserve(s.size());
}
void reserve_maybe(const sequence &, void *) {}

public:
template <typename T>
static handle cast(T &&src, return_value_policy policy, handle parent) {
Expand Down Expand Up @@ -237,12 +366,8 @@ struct array_caster {
return size == Size;
}

public:
bool load(handle src, bool convert) {
if (!isinstance<sequence>(src)) {
return false;
}
auto l = reinterpret_borrow<sequence>(src);
bool convert_elements(handle seq, bool convert) {
auto l = reinterpret_borrow<sequence>(seq);
if (!require_size(l.size())) {
return false;
}
Expand All @@ -257,6 +382,25 @@ struct array_caster {
return true;
}

public:
bool load(handle src, bool convert) {
if (!PyObjectTypeIsConvertibleToStdVector(src.ptr())) {
return false;
}
if (isinstance<sequence>(src)) {
return convert_elements(src, convert);
}
if (!convert) {
return false;
}
// Designed to be behavior-equivalent to passing tuple(src) from Python:
// The conversion to a tuple will first exhaust the generator object, to ensure that
// the generator is not left in an unpredictable (to the caller) partially-consumed
// state.
assert(isinstance<iterable>(src));
return convert_elements(tuple(reinterpret_borrow<iterable>(src)), convert);
}

template <typename T>
static handle cast(T &&src, return_value_policy policy, handle parent) {
list l(src.size());
Expand Down
34 changes: 34 additions & 0 deletions tests/test_stl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,14 @@ struct type_caster<ReferenceSensitiveOptional<T>>
} // namespace detail
} // namespace PYBIND11_NAMESPACE

int pass_std_vector_int(const std::vector<int> &v) {
int zum = 100;
for (const int i : v) {
zum += 2 * i;
}
return zum;
}

TEST_SUBMODULE(stl, m) {
// test_vector
m.def("cast_vector", []() { return std::vector<int>{1}; });
Expand Down Expand Up @@ -546,4 +554,30 @@ TEST_SUBMODULE(stl, m) {
[]() { return new std::vector<bool>(4513); },
// Without explicitly specifying `take_ownership`, this function leaks.
py::return_value_policy::take_ownership);

m.def("pass_std_vector_int", pass_std_vector_int);
m.def("pass_std_vector_pair_int", [](const std::vector<std::pair<int, int>> &v) {
int zum = 0;
for (const auto &ij : v) {
zum += ij.first * 100 + ij.second;
}
return zum;
});
m.def("pass_std_array_int_2", [](const std::array<int, 2> &a) {
return pass_std_vector_int(std::vector<int>(a.begin(), a.end())) + 1;
});
m.def("pass_std_set_int", [](const std::set<int> &s) {
int zum = 200;
for (const int i : s) {
zum += 3 * i;
}
return zum;
});
m.def("pass_std_map_int", [](const std::map<int, int> &m) {
int zum = 500;
for (const auto &p : m) {
zum += p.first * 1000 + p.second;
}
return zum;
});
}
Loading