Superfluous call to __init__ on str(x)/bytes(x) when __new__ returns an instance of str/bytes' subclass #104231

Open
@EZForever

Description

Bug report

According to the documentation on object.__new__:

If __new__() is invoked during object construction and it returns an instance of cls, then the new instance’s __init__() method will be invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.

If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.

__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation.
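The documented rule can be illustrated with a minimal sketch. The classes `Base` and `Sub` and the `calls` list are hypothetical names introduced only for this illustration. Note that "an instance of cls" includes instances of subclasses of cls, which is the detail this report hinges on.

```python
calls = []  # records every __init__ invocation as (class name, argument)


class Base:
    def __new__(cls, kind):
        if kind == "sub":
            # Return an instance of a subclass of cls.
            return super().__new__(Sub)
        if kind == "other":
            # Return something that is not an instance of cls at all.
            return 42
        return super().__new__(cls)

    def __init__(self, kind):
        calls.append((type(self).__name__, kind))


class Sub(Base):
    pass


a = Base("same")   # __new__ returns a Base: __init__ runs
b = Base("sub")    # __new__ returns a Sub, still an instance of Base: __init__ runs
c = Base("other")  # __new__ returns an int: __init__ is skipped

print(calls)
print(type(a).__name__, type(b).__name__, c)
```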

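And, indeed, packages exist that use this feature to provide str-compatible custom types. However, when getting the "string representation" of an object x whose __str__ returns such a str-compatible instance, str(x) does the following: str.__new__ calls x.__str__(), which returns the already-constructed instance; because that instance is a str (through its subclass), type.__call__ then calls its __init__ again, passing the arguments that were given to str().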

The problem is that the return value of x.__str__ may already be initialized, or may even have an __init__ signature incompatible with str. This is not an issue for str itself, since str.__init__ does nothing and has a wildcard signature (according to tests I've done), but it is trivial to write a custom (and incompatible) __init__ and break things.
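The difference between the two signatures can be sketched as follows. `Tagged` is a hypothetical str subclass invented for this illustration; the comment about object.__init__ reflects my understanding of CPython's behavior and is not quoted from the report.

```python
# str does not define its own __init__; it appears to inherit object.__init__,
# which tolerates extra arguments for types that override __new__ but not
# __init__. So calling it again with arbitrary arguments is harmless.
text = "hello"
result = text.__init__(1, 2, key="value")


class Tagged(str):
    """Hypothetical str subclass whose __init__ requires an extra argument."""

    def __new__(cls, value, tag):
        return super().__new__(cls, value)

    def __init__(self, value, tag):
        self.tag = tag


tagged = Tagged("hello", tag="greeting")

# Re-running __init__ with a single argument, as str(x) would do, fails.
try:
    tagged.__init__("some other object")
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(result)
print(message)
```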

Proof of concept showing that __init__ is called a second time by str():

class mycls:
    def __init__(self, text):
        print('mycls.__init__', type(self), id(self), repr(text))
        self.text = text

    def __str__(self):
        print('mycls.__str__', type(self), id(self))
        return self.text

class mystr(str):
    def __new__(cls, obj):
        print('mystr.__new__', cls, repr(obj))
        return super().__new__(cls, obj) # Python 2: return str.__new__(cls, obj)

    def __init__(self, obj):
        print('mystr.__init__', type(self), id(self), repr(obj))
        super().__init__() # Python2: super(str, self).__init__()

out = str(mycls(mystr('hello')))
print('out', type(out), id(out), repr(out))

Sample output on Python 3.9:

mystr.__new__ <class '__main__.mystr'> 'hello'
mystr.__init__ <class '__main__.mystr'> 2019206422080 'hello'
mycls.__init__ <class '__main__.mycls'> 2019211172304 'hello'
mycls.__str__ <class '__main__.mycls'> 2019211172304
mystr.__init__ <class '__main__.mystr'> 2019206422080 <__main__.mycls object at 0x000001D6225D59D0>
out <class '__main__.mystr'> 2019206422080 'hello'

A real-world example that breaks tomlkit:

import tomlkit # pip install tomlkit==0.11.8

class mycls:
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

value = tomlkit.value('"hello"')
print(type(value))

instance = mycls(value)
print(instance)
print(str(instance))

Sample output:

<class 'tomlkit.items.String'>
hello
Traceback (most recent call last):
  File "C:\bug\poc2.py", line 16, in <module>
    print(str(instance))
TypeError: __init__() missing 3 required positional arguments: '_', 'original', and 'trivia'

This behavior was introduced by commit 8ace1ab 22 years ago, released in Python 2.3, and has been kept to this day.

A possible solution is to check for the exact class (rather than accepting subclasses) in type.__call__; however, I'm not sure whether this behavior would be compliant with the documentation. Changing str.__new__ so that only str (and not its subclasses) may be returned by __str__ could also work around this issue, but may break even more things.
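The first idea can be sketched in pure Python. The functions `construct_current` and `construct_exact` are hypothetical, simplified models of type.__call__ written for this illustration, not CPython's actual implementation; `Tagged` and `Wrapper` are likewise invented names.

```python
def construct_current(cls, *args, **kwargs):
    """Rough model of the current rule: run __init__ for any instance of cls."""
    obj = cls.__new__(cls, *args, **kwargs)
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwargs)
    return obj


def construct_exact(cls, *args, **kwargs):
    """Rough model of the proposed rule: run __init__ only for the exact class."""
    obj = cls.__new__(cls, *args, **kwargs)
    if type(obj) is cls:
        type(obj).__init__(obj, *args, **kwargs)
    return obj


class Tagged(str):
    """Hypothetical str subclass with an __init__ that str() cannot satisfy."""

    def __new__(cls, value, tag):
        return super().__new__(cls, value)

    def __init__(self, value, tag):
        self.tag = tag


class Wrapper:
    """Hypothetical object whose __str__ returns a str subclass instance."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


tagged = Tagged("hello", tag="greeting")
wrapper = Wrapper(tagged)

try:
    construct_current(str, wrapper)
    current_failed = False
except TypeError:
    current_failed = True  # Tagged.__init__ was re-run without 'tag'

out = construct_exact(str, wrapper)
print(current_failed, type(out).__name__, out.tag)
```

Under the exact-class rule, the already-initialized instance is returned untouched. The trade-off is that a class whose __new__ deliberately returns a subclass instance would no longer have __init__ run automatically, which is why compliance with the documentation is in question.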

Your environment

  • CPython versions tested on: 2.7.18, 3.9.12, 3.11.3
  • Operating system and architecture: Windows 10.0.19045 x64, Debian Linux 11 x64, Fedora IoT 37 x64

Metadata

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)
