Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions include/pybind11/detail/internals.h
Original file line number Diff line number Diff line change
Expand Up @@ -510,11 +510,19 @@ struct local_internals {
};

/// Works like `get_internals`, but for things which are locally registered.
/// There is one variable storing the local internals for each module, but every embedded module
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little unclear if I understand this correctly. I'm wondering is

but all embedded modules share the same locals.

correct? If not I'm confused, if yes that would be my suggested wording.

Copy link
Contributor Author

@StarQTius StarQTius Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry if it does not sound clear, English is not my native language. What I meant is that since the static local variable of get_local_internals does not get dynamically linked when a non-embedded module is loaded by the interpreter, you end up with one definition of the locals for the interpreter, and one definition for each non-embedded modules.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in that case my suggested wording is clearer. But I'm not a native speaker, too!
Let's see what @Skylion007 thinks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm having serious 2nd thoughts.

E.g. at Google, Python and all extension modules for a given binary (or test) are linked together statically. Each binary or test is hermetic.

Others may use RTLD_GLOBAL. Probably rare, but maybe in certain environment that's what's needed?

What does this PR mean in those situations? I'm not sure. Does someone else have a clear understanding of how the m_clear functionality plays out across all environments?

Guessing a lot: I think to be completely safe, we'd need some kind of lookup table, to keep track of which module_def pointers are associated with a given local_internals pointer, and if more than one, we cannot safely clear_local_internals until the last one disappears.

Or we could have static std::unordered_map<module_def *, local_internals> with all the required logic to register & clear types given a module_def pointer, but search for types over all keys of the map.

It seems pretty tricky to get the m_clear functionality correct in all environments.

Given those uncertainties, and that pybind11 existed to this point without using m_clear, maybe this should be an opt-in, e.g. via some #define, or runtime option? That would be a safer and less sudden way to find out what's possible and what not.

/// shares the same locals.
inline local_internals &get_local_internals() {
static local_internals locals;
return locals;
}

/// Clears the locally registered types and exception translators.
inline void clear_local_internals() {
detail::get_local_internals().registered_types_cpp.clear();
detail::get_local_internals().registered_exception_translators.clear();
}

/// Constructs a std::string with the given arguments, stores it in `internals`, and returns its
/// `c_str()`. Such strings objects have a long storage duration -- the internal strings are only
/// cleared when the program exits or after interpreter shutdown (when embedding), and so are
Expand Down
5 changes: 2 additions & 3 deletions include/pybind11/embed.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@
static ::pybind11::module_::module_def PYBIND11_CONCAT(pybind11_module_def_, name); \
static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &); \
static PyObject PYBIND11_CONCAT(*pybind11_init_wrapper_, name)() { \
auto m = ::pybind11::module_::create_extension_module( \
auto m = ::pybind11::module_::create_embedded_extension_module( \
PYBIND11_TOSTRING(name), nullptr, &PYBIND11_CONCAT(pybind11_module_def_, name)); \
try { \
PYBIND11_CONCAT(pybind11_init_, name)(m); \
Expand Down Expand Up @@ -200,8 +200,7 @@ inline void finalize_interpreter() {
}
// Local internals contains data managed by the current interpreter, so we must clear them to
// avoid undefined behaviors when initializing another interpreter
detail::get_local_internals().registered_types_cpp.clear();
detail::get_local_internals().registered_exception_translators.clear();
detail::clear_local_internals();

Py_Finalize();

Expand Down
34 changes: 33 additions & 1 deletion include/pybind11/pybind11.h
Original file line number Diff line number Diff line change
Expand Up @@ -1232,7 +1232,7 @@ class module_ : public object {
/* m_slots */ nullptr,
/* m_traverse */ nullptr,
/* m_clear */ nullptr,
/* m_free */ nullptr};
/* m_free */ [](void *) { detail::clear_local_internals(); }};
auto *m = PyModule_Create(def);
if (m == nullptr) {
if (PyErr_Occurred()) {
Expand All @@ -1245,6 +1245,38 @@ class module_ : public object {
// For Python 2, reinterpret_borrow was correct.
return reinterpret_borrow<module_>(m);
}

/** \rst
Create a new top-level embedded module.

``def`` should point to a statically allocated module_def.
\endrst */
static module_
create_embedded_extension_module(const char *name, const char *doc, module_def *def) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this a lot of code duplication for changing the default value of one argument and the generic pybind11 fail, isn't it? If you think from API/ABI design they should remain seperate, maybe they can both call a common private method. Thoughts @henryiii @rwgk ?

Copy link
Collaborator

@Skylion007 Skylion007 Mar 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this would actually be better to subclass or just add it as an optional argument.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Skylion007, thinking it will be easy to avoid the code duplication, maybe as simple as bool is_embedded = false? (same idea as Aaron's?)

My second concern is testing: what would it take to add a test that fails without the change here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just feels like bad code smell, the MACRO we are using is already a bit hacky. Is there a way we can improve create_embedded_extension_module to automatically detect when it's being called from that macro? It seems like a rather specific and limited use case that the end user should never call directly so this method should really be hidden somewhere as an implementation detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there is a reason to allow embedded and non-embedded modules in the same executable / library, maybe detecting whether embed.h has been included, or having the user indicate if they want the modules to be embedded with a #define PYBIND11_EMBED are options to consider ?

Copy link
Contributor Author

@StarQTius StarQTius Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwgk About the testing: I added a line in external_module.cpp which was making the "Restart an interpreter" test case fail in a way that was consistent with the original issue (ie. having the library complain about an already registered type). I could have added another test but I didn't see the point since it would have done exactly the same thing and I needed to add that line in the external module in any case.
In retrospect, maybe it would have been better to dedicate another external module to this test case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, here is my proposal @StarQTius:

  • Factor out the implementation of the static module function into to a function in a pybind11 detail namespace. You can have it use the is_embedded to denote the different behavior. Have the static function call the new detail one.
  • Have the macro call the new detail function with the proper bool from the macro.

That way we don't create a new static method API that users can accidentally use or that we have to mantain and the API is the same as before with minimal code duplication. (The detail namespace will indicate this is not a new part of the API).

Copy link
Collaborator

@rwgk rwgk Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bout the testing: I added a line in external_module.cpp which was making the "Restart an interpreter" test case fail in a way that was consistent with the original issue (ie. having the library complain about an already registered type).

I see, thanks! I saw that line but didn't make the connection. Could you please add a comment, maybe one or two lines to explain in a nutshell, plus reference to this PR? So that it will be quick for anyone looking at the test code in the future, with no specific context, to make the connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you asked, I moved the extension module creation implementation detail into a non-API function. I had to add forward declarations to make it works though. I also added comments to clarify how local internals are tested, and a reference to the PR.

// module_def is PyModuleDef
// Placement new (not an allocation).
def = new (def)
PyModuleDef{/* m_base */ PyModuleDef_HEAD_INIT,
/* m_name */ name,
/* m_doc */ options::show_user_defined_docstrings() ? doc : nullptr,
/* m_size */ -1,
/* m_methods */ nullptr,
/* m_slots */ nullptr,
/* m_traverse */ nullptr,
/* m_clear */ nullptr,
/* m_free */ nullptr};
auto *m = PyModule_Create(def);
if (m == nullptr) {
if (PyErr_Occurred()) {
throw error_already_set();
}
pybind11_fail("Internal error in module_::create_embedded_extension_module()");
}
// TODO: Should be reinterpret_steal for Python 3, but Python also steals it again when
// returned from PyInit_...
// For Python 2, reinterpret_borrow was correct.
return reinterpret_borrow<module_>(m);
}
};

// When inside a namespace (or anywhere as long as it's not the first item on a line),
Expand Down
3 changes: 3 additions & 0 deletions tests/test_embed/external_module.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,10 @@ PYBIND11_MODULE(external_module, m) {
int v;
};

class B {};

py::class_<A>(m, "A").def(py::init<int>()).def_readwrite("value", &A::v);
py::class_<B>(m, "B", py::module_local());

m.def("internals_at",
[]() { return reinterpret_cast<uintptr_t>(&py::detail::get_internals()); });
Expand Down