gh-127750: Fix singledispatchmethod caching (v2) #128648

Closed

@eendebakpt (Contributor) commented Jan 8, 2025

Version based on idea from @dg-pb in #127839. This version

There is still a cache, but it is now stored on the object instances. Quick benchmark (Windows, non-PGO build):

    bench singledispatchmethod: Mean +- std dev: [main] 798 ns +- 64 ns -> [prx] 495 ns +- 38 ns: 1.61x faster

    Benchmark hidden because not significant (1): bench singledispatchmethod slots

    Geometric mean: 1.26x faster

(Note that the alternative to this PR is not keeping main as it is, but reverting #107148.)
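A minimal sketch of the instance-level cache, assuming (as the review snippets suggest) that the generated method is stored in the instance's __dict__ under the attribute name set by __set_name__. Because the descriptor defines no __set__, later lookups find the cached function in the instance dict and skip __get__ entirely, the same trick functools.cached_property uses. The class instance_cached_dispatchmethod is hypothetical and is not the PR's code.

```python
from functools import singledispatch, update_wrapper


class instance_cached_dispatchmethod:
    """Hypothetical singledispatchmethod-like descriptor (not the PR's code)
    that caches the generated method in the instance's __dict__."""

    def __init__(self, func):
        self.dispatcher = singledispatch(func)
        self.func = func
        self.attrname = None

    def register(self, cls, method=None):
        # Same signature as functools.singledispatchmethod.register.
        return self.dispatcher.register(cls, func=method)

    def __set_name__(self, owner, name):
        self.attrname = name

    def __get__(self, obj, cls=None):
        dispatch = self.dispatcher.dispatch

        def _method(*args, **kwargs):
            # Pick the implementation from the type of the first argument
            # after self, then bind it to obj and call it.
            impl = dispatch(args[0].__class__)
            return impl.__get__(obj, cls)(*args, **kwargs)

        update_wrapper(_method, self.func)
        # Instances with a __dict__ get the cache; __slots__-only instances
        # have no __dict__, so _method is rebuilt on every attribute access.
        cache = getattr(obj, "__dict__", None)
        if cache is not None and self.attrname is not None:
            cache[self.attrname] = _method
        return _method


class Formatter:
    @instance_cached_dispatchmethod
    def describe(self, arg):
        return "object"

    @describe.register
    def _(self, arg: int):
        return "int"


f = Formatter()
print(f.describe(1), f.describe("x"))  # int object
print(f.describe is f.describe)  # True: the lookup now hits f.__dict__
```

Note that _method closes over obj, so the cached function refers back to the instance that stores it.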

@eendebakpt marked this pull request as ready for review on January 8, 2025, 20:50.

        import weakref  # see comment in singledispatch function
        self._method_cache = weakref.WeakKeyDictionary()

    def __set_name__(self, obj, name):
        self.attrname = name

Contributor:

Check cached_property.__set_name__: it contains more logic, some of which might be needed here as well.

Contributor Author:

Hmm. The additions there prevent something like this:

from dataclasses import dataclass
from functools import singledispatchmethod


@dataclass(frozen=True)
class A:
    value: int

    @singledispatchmethod
    def dispatch(self, x):
        return id(self)

    renamed_dispatch = dispatch  # allowed? If so, how should it behave?

The corresponding test for cached_property is test_reuse_different_names.

But on current main, renaming is allowed for singledispatchmethod.

I am not sure what the desired behavior is here, or why.
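The two behaviors can be compared directly. This sketch assumes CPython's functools; the class names are illustrative. Binding one cached_property to a second name fails at class creation (Python 3.12+ raises the TypeError from __set_name__ directly, older versions wrap it in a RuntimeError), while an aliased singledispatchmethod is accepted.

```python
from functools import cached_property, singledispatchmethod


def define_cached_property_alias():
    """Bind one cached_property to two names inside a class body."""

    class WithCachedProperty:
        @cached_property
        def value(self):
            return 42

        renamed_value = value  # second name for the same descriptor

    return WithCachedProperty


try:
    define_cached_property_alias()
    cached_property_alias_allowed = True
except (TypeError, RuntimeError):
    # Python 3.12+ raises the TypeError from __set_name__ directly;
    # earlier versions wrap it in a RuntimeError.
    cached_property_alias_allowed = False


class WithDispatch:
    @singledispatchmethod
    def dispatch(self, x):
        return "generic"

    @dispatch.register
    def _(self, x: int):
        return "int"

    renamed_dispatch = dispatch  # singledispatchmethod accepts the alias


obj = WithDispatch()
print(cached_property_alias_allowed)  # False
print(obj.dispatch(1), obj.renamed_dispatch(1))  # int int
```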

Contributor:

If this implementation is desirable, maybe later someone who knows more about this can comment.

Reply:

As far as I know, the only reason cached properties can't be renamed is that the cache is keyed by the attribute's name.

Allowing a rebind would disconnect the cached property from its cached value.
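The disconnect can be illustrated with a hypothetical cached_property-like descriptor that, unlike functools.cached_property, accepts a second name and keeps the last one. A value computed through the original name is then cached under the alias, so every lookup through the original name recomputes it. This is an illustrative sketch, not CPython code.

```python
class name_keyed_cache:
    """Hypothetical cached_property-like descriptor that silently accepts
    being bound to a second name (the last binding wins)."""

    def __init__(self, func):
        self.func = func
        self.attrname = None
        self.calls = 0  # how often the wrapped function has been evaluated

    def __set_name__(self, owner, name):
        self.attrname = name  # a later alias overwrites the earlier name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        self.calls += 1
        value = self.func(obj)
        obj.__dict__[self.attrname] = value  # cache keyed by attribute name
        return value


class Example:
    @name_keyed_cache
    def original(self):
        return 42

    alias = original  # attrname is now "alias"


e = Example()
e.original  # miss: the value is stored under "alias", not "original"
e.original  # miss again, so the function is evaluated a second time
print(Example.__dict__["original"].calls, vars(e))  # 2 {'alias': 42}
```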

@vodik commented Jan 23, 2025:

Actually, I think you might want to either ignore renames or do something along these lines (ignoring error handling):

    if self.attrname:
        cache[name] = cache.pop(self.attrname)
    self.attrname = name

As far as I know, each binding shares the same instance of the descriptor, so as long as the cache key is constant, it should work no matter how many times it's been renamed.
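One reading of the "ignore renames" option is sketched below with a hypothetical descriptor: the cache key is fixed at the first name, and __get__ checks that key before recomputing, so all aliases share one cached value. Lookups through an alias still go through __get__, because only the first name is shadowed by the instance __dict__.

```python
class first_name_cache:
    """Hypothetical descriptor that keeps the first name it is bound to as a
    constant cache key and ignores later aliases (not CPython code)."""

    def __init__(self, func):
        self.func = func
        self.attrname = None
        self.calls = 0  # how often the wrapped function has been evaluated

    def __set_name__(self, owner, name):
        if self.attrname is None:  # later names are aliases; the key stays fixed
            self.attrname = name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        cache = obj.__dict__
        if self.attrname in cache:  # reached via an alias: reuse the cached value
            return cache[self.attrname]
        self.calls += 1
        value = cache[self.attrname] = self.func(obj)
        return value


class Example:
    @first_name_cache
    def original(self):
        return 42

    alias = original


e = Example()
e.alias     # computes once and stores under "original"
e.original  # instance __dict__ hit, __get__ is not called
e.alias     # __get__ finds the "original" key and reuses it
print(Example.__dict__["alias"].calls, vars(e))  # 1 {'original': 42}
```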

Contributor:

"Allowing a rebind would disconnect the cached property from its cached value."

This is kind of the same situation.

If renaming is allowed, the method would simply be cached under the last attrname. The drawback is a small risk of unused cached methods.

I think it might be most straightforward to copy cached_property.__set_name__. It does seem a sensible restriction: it comes at the expense of flexibility, but personally I have never run into that TypeError.

Also, when two implementations that use the same caching approach are aligned, it will be easier to make changes and improvements to both.
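Copying cached_property.__set_name__ would amount to roughly the following, shown here as a hypothetical subclass of functools.singledispatchmethod rather than a change to functools itself. The check mirrors cached_property's: the first name is recorded, and binding the same object under a different name raises TypeError.

```python
from functools import singledispatchmethod


class strict_singledispatchmethod(singledispatchmethod):
    """Hypothetical subclass adding a cached_property-style __set_name__."""

    attrname = None  # class-level default until __set_name__ runs

    def __set_name__(self, owner, name):
        if self.attrname is None:
            self.attrname = name
        elif name != self.attrname:
            raise TypeError(
                "Cannot assign the same strict_singledispatchmethod to two "
                f"different names ({self.attrname!r} and {name!r})."
            )


class Single:
    @strict_singledispatchmethod
    def dispatch(self, x):
        return "generic"


try:
    class Aliased:
        @strict_singledispatchmethod
        def dispatch(self, x):
            return "generic"

        renamed_dispatch = dispatch  # second name: rejected

    alias_rejected = False
except (TypeError, RuntimeError):
    # Python 3.12+ propagates the TypeError; older versions wrap it.
    alias_rejected = True

print(Single().dispatch(1), alias_rejected)  # generic True
```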

    if self._method_cache is not None:
        self._method_cache[obj] = _method
    if cache is not None:
        cache[self.attrname] = _method

Member:

Doesn't it create a reference loop? obj refers to cache, cache refers to _method, and _method refers to a cell which refers to obj.

Contributor Author:

Yes. But once there are no external references to obj any more, the garbage collector removes the objects (the cache is on obj itself, not on the singledispatchmethod or the class).

On current main, the caching is done on the singledispatchmethod, which keeps the generated methods alive.
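The loop described in the review can be reproduced with a hypothetical class that stores a closure over self in its own __dict__, the same shape as caching _method on the instance. On CPython (assumed here, since the result depends on reference counting), deleting the last reference does not free the object; only the cycle collector does.

```python
import gc
import weakref


class Owner:
    def cache_method(self):
        # Hypothetical stand-in for instance-level caching: a closure over
        # self is stored in self.__dict__, giving the loop
        # obj -> obj.__dict__ -> _method -> cell -> obj.
        def _method():
            return self

        self.__dict__["method"] = _method


gc.disable()  # keep the automatic collector from breaking the loop early
try:
    obj = Owner()
    obj.cache_method()
    ref = weakref.ref(obj)
    del obj
    alive_after_del = ref() is not None  # the loop keeps obj alive
    gc.collect()  # an explicit collection still runs while gc is disabled
    alive_after_collect = ref() is not None  # the cycle collector freed it
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)  # True False (on CPython)
```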

Member:

Yes, the current situation is worse: it creates strong references singledispatchmethod -> _method -> obj.

Relying on the garbage collector is not good. This particular loop can be broken by using a weak reference to obj instead of obj. But a reference from a bound method to the object should be strong, otherwise some code will not work (there was a similar issue with TemporaryFile).

I am not sure how much this optimization saves. Are there other ways to achieve the same speedup without creating reference loops?
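Breaking the loop with a weak reference, as suggested above, can be sketched the same way; the class and method names are hypothetical. The object is then freed by reference counting alone (again assuming CPython), but the cached function no longer keeps its object alive, so using it after the owner is gone fails. A strong bound-method reference avoids exactly this kind of breakage.

```python
import weakref


class Owner:
    def cache_method(self):
        # Hypothetical variant: the closure holds only a weak reference to
        # self, so obj -> obj.__dict__ -> _method no longer loops back.
        self_ref = weakref.ref(self)

        def _method():
            target = self_ref()
            if target is None:
                raise ReferenceError("the owning object has been freed")
            return target

        self.__dict__["method"] = _method
        return _method


obj = Owner()
obj.cache_method()
ref = weakref.ref(obj)
del obj
freed_without_gc = ref() is None  # plain reference counting frees obj

# The downside: a method kept around without its owner stops working,
# unlike an ordinary bound method, which keeps its object alive.
method = Owner().cache_method()
try:
    method()
    outcome = "ok"
except ReferenceError:
    outcome = "ReferenceError"

print(freed_without_gc, outcome)  # True ReferenceError (on CPython)
```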

@eendebakpt (Contributor Author):

Closing in favor of #130008

4 participants