[BUG]: GCC/Clang exceptions are exported without any ABI change mitigation #5359

Open
3 tasks done
feltech opened this issue Sep 6, 2024 · 0 comments
Labels
triage New bug, unverified

feltech commented Sep 6, 2024

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

2.9, 2.10 (code seems similar on current master)

Problem description

pybind11 enforces hidden visibility for all symbols except exceptions (including error_already_set), which are given default visibility through the PYBIND11_EXPORT_EXCEPTION macro. This is presumably so that pybind11-specific exceptions can be caught across DSO boundaries, which is useful.

However, if the DSOs use different versions of pybind11 internally, then e.g. catching error_already_set, even as a std::exception, and calling .what() on it can give garbled text (at best).

We experienced this exact scenario with a library built against pybind11 2.10 and an application built against 2.9. Between these versions, the ABI of error_already_set changed dramatically.

Setting PYBIND11_EXPORT_EXCEPTION="" works around the issue, but then pybind11-specific exceptions might fail to be caught outside the Python extension module that threw them.
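To illustrate why pre-defining the macro has this effect, here is a minimal sketch. It assumes the header only defines the export macro when the build has not already defined it, which is consistent with the workaround above. `SKETCH_EXPORT_EXCEPTION` and the `sketch` namespace are hypothetical stand-ins, not pybind11 names.

```cpp
#include <stdexcept>
#include <string>

// Simulates passing -DSKETCH_EXPORT_EXCEPTION= on the compiler command line.
// SKETCH_EXPORT_EXCEPTION is a hypothetical stand-in for PYBIND11_EXPORT_EXCEPTION.
#define SKETCH_EXPORT_EXCEPTION

// What a library header might do (assumed pattern): only supply the
// default-visibility attribute when the build has not overridden the macro.
#if !defined(SKETCH_EXPORT_EXCEPTION)
#    define SKETCH_EXPORT_EXCEPTION __attribute__((visibility("default")))
#endif

namespace sketch {

// With the macro empty, this class follows the translation unit's default
// visibility, i.e. hidden when compiled with -fvisibility=hidden.
class SKETCH_EXPORT_EXCEPTION error_already_set : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

} // namespace sketch
```

With the class hidden, each DSO keeps its own private copy of the type information, so the versions can no longer be mixed up, but a `catch` in another DSO that names this exact type may no longer match.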

Another solution could be to add an inline namespace, using the PYBIND11_VERSION_MAJOR and PYBIND11_VERSION_MINOR macros to construct a suitable name. This namespace would wrap all exported types (i.e. exceptions), or perhaps all of pybind11, which has the advantage that there would then technically be no need to enforce hidden visibility for other symbols. E.g. the fully qualified namespace could be pybind11::v2_13, but code could still refer to it as pybind11 because the namespace is inline.
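A minimal sketch of the inline namespace idea, assuming GCC or Clang. All `DEMO_*` macros and the `demo` namespace are hypothetical stand-ins for the pybind11 equivalents; the version numbers are hard-coded here rather than taken from pybind11's headers.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for PYBIND11_VERSION_MAJOR / PYBIND11_VERSION_MINOR.
#define DEMO_VERSION_MAJOR 2
#define DEMO_VERSION_MINOR 13

// Two-level macros so the version macros are expanded before token pasting.
#define DEMO_CONCAT_IMPL(a, b, c, d) a##b##c##d
#define DEMO_CONCAT(a, b, c, d) DEMO_CONCAT_IMPL(a, b, c, d)
#define DEMO_ABI_NAMESPACE DEMO_CONCAT(v, DEMO_VERSION_MAJOR, _, DEMO_VERSION_MINOR)

#define DEMO_STRINGIFY_IMPL(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_IMPL(x)

#if defined(__GNUC__)
#    define DEMO_EXPORT_EXCEPTION __attribute__((visibility("default")))
#else
#    define DEMO_EXPORT_EXCEPTION
#endif

namespace demo {
inline namespace DEMO_ABI_NAMESPACE {

// Still exported, but the mangled name now contains "v2_13", so a build
// against a different minor version produces a distinct symbol.
class DEMO_EXPORT_EXCEPTION error_already_set : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

} // inline namespace DEMO_ABI_NAMESPACE
} // namespace demo

// User code keeps writing demo::error_already_set; both spellings name the same type.
static_assert(std::is_same<demo::error_already_set, demo::v2_13::error_already_set>::value,
              "the inline namespace is transparent to callers");

// Helper for inspecting the generated namespace name.
inline std::string abi_namespace_name() { return DEMO_STRINGIFY(DEMO_ABI_NAMESPACE); }
```

Because the versioned name is part of the symbol, two DSOs built against different versions would no longer share one exception type. A `catch` for the specific type would then only match exceptions from the same version, while catching as `std::exception` would still work across versions.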

Reproducible example code

Tricky to put together a code example. I will explain in words what happens in our product.

You need a full example project with two DSOs. In the order they are loaded, the first uses pybind11 2.9 and the second uses 2.10. 

The first DSO calls through to the second, which calls out to a Python function. 

The Python function then `raise`s. 

The first DSO catches the resulting C++ exception as a `std::exception` and calls `.what()` on it to verify the expected exception message. It will be garbled.

Stepping through a GCC 11 build in GDB shows that the 2.10 class is used to construct the exception, but the object ends up with the 2.9 vtable pointer.
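The call chain described above has roughly the following shape. This is a single-file sketch only: `first_dso_entry`, `second_dso_call` and the `std::function` callback are hypothetical stand-ins for the two DSOs and the Python function. It does not reproduce the bug, which needs two separately built shared libraries using different pybind11 versions.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Stand-in for the second DSO (built against pybind11 2.10 in the report).
// In the real project this calls a Python function, and the Python `raise`
// surfaces in C++ as error_already_set.
inline void second_dso_call(const std::function<void()> &python_callback) {
    python_callback();
}

// Stand-in for the first DSO (built against pybind11 2.9 in the report).
// It only knows the exception as a std::exception and reads the message.
inline std::string first_dso_entry(const std::function<void()> &python_callback) {
    try {
        second_dso_call(python_callback);
    } catch (const std::exception &e) {
        // In the mixed-version build this is where the garbled text appears.
        return e.what();
    }
    return "";
}
```

In a single translation unit the message comes through intact; the garbling only appears once the two functions live in different DSOs that each carry their own definition of the exported exception class.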

Is this a regression? Put the last known working version here if it is.

Not a regression
