Description
Bug report
According to the documentation on object.__new__:
If __new__() is invoked during object construction and it returns an instance of cls, then the new instance’s __init__() method will be invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.
If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.
__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation.
And, indeed, packages exist that utilize this feature to provide str-compatible custom types. However, when trying to get a "string representation" of such a str-compatible instance with str(x), the following happens:
- Since str is a type, calling it essentially creates a str instance (__new__ and __init__). str implements __new__, so it is called.
- According to the documentation, the return value of x.__str__ is checked to be an instance of str (or its subclasses), which is the case here.
- The return value above is treated as the return value of str.__new__. type.__call__ checks if this object is an instance of str or its subclasses, which is, again, the case here.
- According to the documentation quoted above, __init__ is called and the object is returned.
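The steps above can be modelled with a rough pure-Python sketch of what type.__call__ does. This is a simplification, not the CPython implementation; `type_call_sketch`, `Base`, and `Child` are hypothetical names used only for illustration, and plain classes are used instead of str so the result does not depend on the interpreter version.

```python
def type_call_sketch(cls, *args, **kwargs):
    """Simplified model of type.__call__ (assumption: ignores special cases)."""
    obj = cls.__new__(cls, *args, **kwargs)
    # isinstance, not an exact type check: subclass instances pass as well.
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwargs)
    return obj


calls = []  # records every __init__ invocation


class Base:
    def __new__(cls, source=None):
        # Hypothetical: hand back an existing instance, much like
        # str.__new__ handing back the result of __str__.
        if isinstance(source, Base):
            return source
        return super().__new__(cls)

    def __init__(self, source=None):
        calls.append(("Base.__init__", source))


class Child(Base):
    def __init__(self, source=None):
        calls.append(("Child.__init__", source))


child = Child()                           # first, expected initialization
same = Base(child)                        # real call: Child.__init__ runs again
modelled = type_call_sketch(Base, child)  # the model re-initializes as well
```

In both the real call and the model, the already-initialized `child` is returned unchanged but its `__init__` runs again, this time with the constructor arguments of `Base`.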
The problem is that the return value of x.__str__ may already be initialized, or may even have an __init__ signature incompatible with str. This is not an issue for str itself, since str.__init__ does nothing and has a wildcard signature (according to tests I've done), but it is trivial to write a custom (and incompatible) __init__ that breaks things.
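A small check of that claim, as a sketch: str.__init__ accepts and ignores extra arguments, while a subclass with a stricter signature fails when initialized with the wrong argument count. `Strict` is a hypothetical class, and its __init__ is called directly here so the example does not rely on how str() itself behaves.

```python
# str inherits object.__init__, which tolerates extra arguments for str.
result = str.__init__("abc", 1, 2, key="value")


class Strict(str):
    # Hypothetical subclass whose __init__ needs more than one argument.
    def __init__(self, text, required):
        self.required = required


# Create the instance without running __init__.
s = Strict.__new__(Strict, "abc")

try:
    # Roughly what a re-initialization with a single argument amounts to.
    Strict.__init__(s, "abc")
    error = None
except TypeError as exc:
    error = str(exc)
```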
Proof of concept showing that __init__ is called a second time by str():
class mycls:
    def __init__(self, text):
        print('mycls.__init__', type(self), id(self), repr(text))
        self.text = text

    def __str__(self):
        print('mycls.__str__', type(self), id(self))
        return self.text


class mystr(str):
    def __new__(cls, obj):
        print('mystr.__new__', cls, repr(obj))
        return super().__new__(cls, obj)  # Python 2: return str.__new__(cls, obj)

    def __init__(self, obj):
        print('mystr.__init__', type(self), id(self), repr(obj))
        super().__init__()  # Python 2: super(str, self).__init__()


out = str(mycls(mystr('hello')))
print('out', type(out), id(out), repr(out))
Sample output on Python 3.9:
mystr.__new__ <class '__main__.mystr'> 'hello'
mystr.__init__ <class '__main__.mystr'> 2019206422080 'hello'
mycls.__init__ <class '__main__.mycls'> 2019211172304 'hello'
mycls.__str__ <class '__main__.mycls'> 2019211172304
mystr.__init__ <class '__main__.mystr'> 2019206422080 <__main__.mycls object at 0x000001D6225D59D0>
out <class '__main__.mystr'> 2019206422080 'hello'
A real-world example that breaks tomlkit:
import tomlkit  # pip install tomlkit==0.11.8


class mycls:
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


value = tomlkit.value('"hello"')
print(type(value))
instance = mycls(value)
print(instance)
print(str(instance))
Sample output:
<class 'tomlkit.items.String'>
hello
Traceback (most recent call last):
File "C:\bug\poc2.py", line 16, in <module>
print(str(instance))
TypeError: __init__() missing 3 required positional arguments: '_', 'original', and 'trivia'
This behavior was introduced by commit 8ace1ab 22 years ago, released in Python 2.3, and has been kept to this day.
A possible solution is to check for the exact class (rather than also accepting subclasses) in type.__call__; however, I'm not sure whether this behavior is compliant with the documentation. Changing str.__new__ to only allow str (and not its subclasses) to be returned by __str__ could also work around this issue, but may break even more things.
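To illustrate the first idea, here is a minimal sketch of an exact-class check, written as a pure-Python stand-in for type.__call__. `type_call_exact`, `MyStr`, and `Wrapper` are hypothetical names; this is not a patch to CPython and says nothing about which approach the linked PRs take.

```python
def type_call_exact(cls, *args, **kwargs):
    """Hypothetical variant: only initialize objects of exactly type cls."""
    obj = cls.__new__(cls, *args, **kwargs)
    if type(obj) is cls:  # subclass instances are returned untouched
        type(obj).__init__(obj, *args, **kwargs)
    return obj


init_calls = []  # arguments seen by MyStr.__init__


class MyStr(str):
    def __init__(self, text):
        init_calls.append(text)


class Wrapper:
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


wrapped = Wrapper(MyStr("hello"))    # MyStr.__init__ runs once here
out = type_call_exact(str, wrapped)  # no second MyStr.__init__ call
```

With this check, `MyStr.__init__` is recorded only once, for the original construction, whether str.__new__ returns the subclass instance or an exact str.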
Your environment
- CPython versions tested on: 2.7.18, 3.9.12, 3.11.3
- Operating system and architecture: Windows 10.0.19045 x64, Debian Linux 11 x64, Fedora IoT 37 x64
Linked PRs
- gh-104231: make str(x) a str, bytes(x) a bytes #104247
- gh-104231: emit warning on __bytes__ and __str__ when returning strict subclass #108814
- gh-104231: Add more tests for str(), repr(), ascii(), and bytes() #112551
- [3.12] gh-104231: Add more tests for str(), repr(), ascii(), and bytes() (GH-112551) #112555
- [3.11] gh-104231: Add more tests for str(), repr(), ascii(), and bytes() (GH-112551) #112556
- gh-104231: Make str() and repr() always returning str, and bytes() -- bytes #112583