Description
Discussed in #4287
Originally posted by TheShiftedBit October 26, 2022
By default, Python raises an error when encoding a str as UTF-8 if the str contains surrogate characters. This can be disabled by passing surrogatepass as the second (errors) argument to .encode(). The str -> std::string conversion in pybind11 behaves the same way. However, the bug is this: if an exception message contains a surrogate character, calling .what() on an error_already_set holding such an exception causes another exception to be thrown. Since .what() is noexcept, that exception cannot be caught, and the program calls std::terminate.
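A minimal Python sketch of the default behavior described above. The helper names encode_strict and encode_surrogatepass are hypothetical and exist only for illustration; the calls themselves are the standard str.encode and bytes.decode methods.

```python
def encode_strict(text: str) -> bytes:
    # Default errors="strict": lone surrogates raise UnicodeEncodeError.
    return text.encode("utf-8")


def encode_surrogatepass(text: str) -> bytes:
    # "surrogatepass" as the second (errors) argument encodes a lone
    # surrogate as its 3-byte form instead of raising.
    return text.encode("utf-8", "surrogatepass")


msg = "bad char: \ud800"  # U+D800 is a lone (unpaired) surrogate

try:
    encode_strict(msg)
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)

raw = encode_surrogatepass(msg)
print(raw)
# Decoding with the same handler round-trips the original str.
assert raw.decode("utf-8", "surrogatepass") == msg
```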
I'm not sure what the correct behavior regarding surrogate characters is. Perhaps pybind11 should always use surrogatepass, perhaps not. But even if that is not the right default, it should probably use surrogatepass during exception handling; otherwise, Python exceptions like this are extremely difficult to diagnose.
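A sketch of the fallback suggested above, written in Python for illustration rather than as pybind11's actual C++ code. The helper name safe_error_message is hypothetical. It tries strict UTF-8 first and only falls back to surrogatepass when that fails, so, like a noexcept what(), it never lets an encoding error escape:

```python
def safe_error_message(exc: BaseException) -> bytes:
    """Hypothetical helper: render an exception message as UTF-8 bytes
    without raising, mimicking the contract of a noexcept what()."""
    try:
        text = str(exc)
    except Exception:
        # A broken __str__ must not propagate either.
        return b"<unprintable exception>"
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        # Fall back only when strict encoding fails, so ordinary
        # messages are unaffected and lone surrogates still survive.
        return text.encode("utf-8", "surrogatepass")


err = ValueError("bad input: \ud800")
print(safe_error_message(err))
```

Note that bytes produced with surrogatepass are not valid UTF-8, so a consumer that strictly validates UTF-8 may still reject them; an error handler such as backslashreplace would instead produce valid output at the cost of altering the message text.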