-
Notifications
You must be signed in to change notification settings - Fork 2.2k
pybind11::str::raw_str simplification (for Python 2) #2367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Py_XDECREF(str_value); str_value = unicode; | ||
PyObject *str_value = PyObject_Unicode(op); | ||
#else | ||
PyObject *str_value = PyObject_Str(op); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a ton of compatibility macros in common.h
. I'd define a new one to avoid this macro branching here. Something like PYBIND11_OBJECT_TO_STRING
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a ton of compatibility macros in
common.h
. I'd define a new one to avoid this macro branching here. Something likePYBIND11_OBJECT_TO_STRING
.
Thanks Boris, I'm inclined to take your suggestions, although it makes it a (slightly) sprawling change.
I just looked: the needed compatibility macro doesn't exist already, I'll have to add it.
Checking with @YannickJadoul and @EricCousineau-TRI : what's your recommendation, local change as-is or adding the compatibility macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine with a compatibility macro living in common.h
. The main concern about macros is some sort of deprecation mechanism when it's no longer necessary, but I'm sure there are several ways to resolve that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe (haven't tested this explicitly with this code), you'll break the use case where a utf-8-encoded bytes object is passed. Currently, pybind11 decodes it to (Python 2's) unicode
type with "utf-8" encoding.
This PR won't do that anymore, since PyObject_Unicode
is supposedly the same as unicode(o)
(according to the docs):
>>> print(u"abcd\u0259")
abcdə
>>> unicode(u"abcd\u0259".encode("utf-8"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc9 in position 4: ordinal not in range(128)
>>> unicode(u"abcd\u0259".encode("utf-8"), "utf-8")
u'abcd\u0259'
Another thing to watch out for is custom types with __str__
/__unicode__
in Python 2. Currently, __str__
would be called, then that result converted to unicode
. This PR would immediately call __unicode__
(arguably better, I'd say, but it is a change).
@@ -930,11 +930,10 @@ class str : public object { | |||
private: | |||
/// Return string representation -- always returns a new reference, even if already a str | |||
static PyObject *raw_str(PyObject *op) { | |||
PyObject *str_value = PyObject_Str(op); | |||
if (!str_value) throw error_already_set(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one line went missing but is important! (Unless this can never fail?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Important to add a test for this as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm. Just noticed this will be done by the code generated by PYBIND11_OBJECT_CVT
;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one line went missing but is important! (Unless this can never fail?)
Oops, sorry, that was an accident (manual cherry-picking). I fixed it quick to get that part out of the way. More changes later.
This seems to be OK. It seems like >>> # This is Python 2
>>> class X:
... def __str__(self):
... return "Hello from __str__"
...
>>> str(X())
'Hello from __str__'
>>> unicode(X())
u'Hello from __str__'
>>> class Y:
... def __unicode__(self):
... return u"Hello from __unicode__"
...
>>> unicode(Y())
u'Hello from __unicode__'
>>> str(Y())
'<__main__.Y instance at 0x7f707209d1e0>' So this is better, I believe, because first The thing to watch out for is the same as above. "utf-8" is still not the default encoding: >>> # This is Python 2
>>> class Z:
... def __str__(self):
... return u"He\u0259llo from __str__".encode('utf-8')
...
>>> str(Z())
'He\xc9\x99llo from __str__'
>>> unicode(Z())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc9 in position 2: ordinal not in range(128) |
For Python 2, convert directly to unicode (instead of converting to str first, followed by encoding). Simpler code, and more obviously equivalent to `unicode(...)` in the interpreter.
5e76fc6
to
603367d
Compare
Oh wow, I didn't realize this, thanks! |
I'm ... yeah, it's always a bit arbitrary which encoding to pick? I'm not sure if it every occurs, so we might just as well try making the change (or at least proposing it?). |
|
How long is pybind11 intending to support Python 2? |
My understanding (from discussions with others) is that Python 2 will continue to be supported in |
For Python 2, convert directly to unicode (instead of converting to str first, followed by decoding).
Simpler code, and more obviously equivalent to
unicode(...)
in the interpreter.PR is #2366 is meant to be a preparation for this PR.
They could be merged in any order, but it's best to merge #2366 first, re-base and re-test this PR, then merge.