gh-98766: Lazier ChainMap methods #98767

zrothberg · 2022-10-27T12:56:38Z

This was the best I was able to get performance without making any new methods on dict to prevent multiple hashes being calculated. I can take a stab at adding a new function to dict if there is desire to get back to the original performance of set unions. I imagine chainmap is one of the most heavily used types in the Collections module and iter is probably one of the most heavily called functions from it due to ABC.Mapping using it internally. The changes should be safe to backport to anywhere #23569 was merged as it makes no outward behavioral changes.

I can take a stab at making a better function in either c that can be imported or as a method on the dict class later. I have been working with a lot of custom dictionary like objects late and have run into this ordering breaking issue a bunch of times so I imagine some kind of standard library solution to it would be helpful for anyone making something along these lines.

Issue: Speedup ChainMap #98766

bedevere-bot · 2022-10-27T12:56:43Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

ghost · 2022-10-27T12:57:04Z

All commit authors signed the Contributor License Agreement.

Lib/collections/__init__.py

zrothberg · 2022-10-27T14:07:16Z

@pochmann Yea I don't think I can make it lazier without some kind of unique hash method being added to itertools. You have to construct a set to track all the keys seen minimally and adding them lazily will be much slower then constructing a dict from an iterable just from the jumps back and forth. I tried like 8 different ways and this was the fastests outside of the original set union that doesn't preserve ordering.

bedevere-bot · 2022-10-27T14:13:32Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

rhettinger

I don't think the PR can be salvaged. The ONLY way for dict.fromkeys() to reuse the hash values is to have inputs that are exact sets or dicts and only the latter will preserve order.

Also, this doesn't make "lazier chainmap methods". It simply creates an iterator and then consumes it immediately. It doesn't confer some extra magic that isn't already done by feeding in a dictionary directly. Instead, it does the opposite and loses the magic of reusing the stored hash values and taking advantage of the dict.update() fast paths.

rhettinger · 2022-10-27T17:32:28Z

Lib/collections/__init__.py

-        d = {}
-        for mapping in reversed(self.maps):
-            d.update(dict.fromkeys(mapping))    # reuses stored hash values if possible
+        lazy_views = (key_view.keys() for key_view in reversed(self.maps))


The variables here are misnamed and don't reflect an understanding of what is being done. The contents of self.maps are mappings, not views. The views aren't created until the keys() method is called. The views themselves aren't "lazy", they are a pass-through to the underlying dict.

This line should read:

views = (mapping.keys() for mapping in reversed(self.maps))

Also, note this is an API change. We've tried to make minimal assumptions about the underlying mappings which may not be real dicts. In particular, we haven't demanded that they support having views.

Thank you for the feedback. I am going to split this into two different PRs one for iter and one for the copy methods.

For iter I went through and profiled multiple dict and dict subclass objects and found a method that performs the same number of hashes in total as the current fromkeys method while also reducing the number of intermediate dicts like the above does. It appears to maintain the current behavior and reduces down to the original basecase for a list of dictionaries (d.update). Do you want me to open a new PR for that right away or to just link the repo?

There is some strange (to me) inconsistencies between update and fromkeys when it comes to hashreuse. I am unsure if that is intentional or not.

I can make a gist for when set, and dict and their subclasses reuse hash or not.

rhettinger · 2022-10-27T17:39:34Z

Lib/collections/__init__.py

-        for mapping in reversed(self.maps):
-            d.update(dict.fromkeys(mapping))    # reuses stored hash values if possible
+        lazy_views = (key_view.keys() for key_view in reversed(self.maps))
+        d = dict.fromkeys(_chain.from_iterable(lazy_views))


This doesn't do what you think it does. The chain extracts the keys from the views and then passes those through to dict.fromkeys(). That defeats the goal of retaining the precomputed hash values stored in the underlying mapping.

The dict.fromkeys() class method can use the precomputed hash values ONLY if the input is an exact dict or exact set. An iterable of keys will not suffice.

Also, there isn't any "laziness" at all in these two lines. You make a generator expression and then consume it immediately. It is not much different from converting sum([10, 20, 30]) to sum(iter([10, 20, 30]))` and calling it "lazy".

bedevere-bot · 2022-10-27T17:45:37Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

rhettinger · 2022-10-27T17:53:22Z

Lib/collections/__init__.py

@@ -1032,7 +1032,7 @@ def fromkeys(cls, iterable, *args):

    def copy(self):
        'New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]'
-        return self.__class__(self.maps[0].copy(), *self.maps[1:])
+        return self.__class__(self.maps[0].copy(), *_islice(self.maps, 1, None))


This change doesn't do anything useful. The time saved by not making a list is offset by the time to create the islice object.

Also, this code is stylistically bad. The old code is plain and beautiful. In contrast, the proposed change is arcane and is well outside normal code practices. It is rough on the eyes as well.

One other note. The *args form is immediately converted to a tuple by the interpreter. So, you aren't getting any "laziness" from this change. To me, it seems mostly pointless.

rhettinger · 2022-10-27T17:53:40Z

Lib/collections/__init__.py

@@ -1050,7 +1050,7 @@ def new_child(self, m=None, **kwargs):      # like Django's Context.push()
    @property
    def parents(self):                          # like Django's Context.pop()
        'New ChainMap from maps[1:].'
-        return self.__class__(*self.maps[1:])
+        return self.__class__(*_islice(self.maps, 1, None))


This change doesn't do anything useful. The time saved by not making a list is offset by the time to create the islice object.

Also, this code is stylistically bad. The old code is plain and beautiful. In contrast, the proposed change is arcane and is well outside normal code practices. It is rough on the eyes as well.

Yea I think there is a better way to handle this anyway. I probably should have left it off the PR as it wasn't as heavily tested as the iter behavior.

zrothberg added 2 commits October 27, 2022 06:42

Lazy slicing and iter

71b6de5

correct key_view spelling to match outer

eabdf02

zrothberg requested a review from rhettinger as a code owner October 27, 2022 12:56

bedevere-bot added the awaiting review label Oct 27, 2022

AlexWaygood added type-feature A feature request or enhancement performance Performance or resource usage labels Oct 27, 2022

AlexWaygood linked an issue Oct 27, 2022 that may be closed by this pull request

Speedup ChainMap #98766

Closed

rhettinger self-assigned this Oct 27, 2022

pochmann reviewed Oct 27, 2022

View reviewed changes

Lib/collections/__init__.py Outdated Show resolved Hide resolved

Fix typos

14b21cd

rhettinger requested changes Oct 27, 2022

View reviewed changes

bedevere-bot added awaiting changes and removed awaiting review labels Oct 27, 2022

rhettinger reviewed Oct 27, 2022

View reviewed changes

rhettinger closed this Nov 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-98766: Lazier ChainMap methods #98767

gh-98766: Lazier ChainMap methods #98767

Uh oh!

zrothberg commented Oct 27, 2022 •

edited by bedevere-bot

Loading

Uh oh!

bedevere-bot commented Oct 27, 2022

Uh oh!

ghost commented Oct 27, 2022 •

edited by ghost

Loading

Uh oh!

Uh oh!

zrothberg commented Oct 27, 2022

Uh oh!

bedevere-bot commented Oct 27, 2022

Uh oh!

rhettinger left a comment

Uh oh!

rhettinger Oct 27, 2022

Uh oh!

zrothberg Oct 30, 2022

Uh oh!

rhettinger Oct 27, 2022

Uh oh!

bedevere-bot commented Oct 27, 2022

Uh oh!

rhettinger Oct 27, 2022 •

edited

Loading

Uh oh!

rhettinger Oct 27, 2022 •

edited

Loading

Uh oh!

zrothberg Oct 30, 2022

Uh oh!

Uh oh!

Uh oh!

gh-98766: Lazier ChainMap methods #98767

gh-98766: Lazier ChainMap methods #98767

Uh oh!

Conversation

zrothberg commented Oct 27, 2022 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bedevere-bot commented Oct 27, 2022

Uh oh!

ghost commented Oct 27, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

zrothberg commented Oct 27, 2022

Uh oh!

bedevere-bot commented Oct 27, 2022

Uh oh!

rhettinger left a comment

Choose a reason for hiding this comment

Uh oh!

rhettinger Oct 27, 2022

Choose a reason for hiding this comment

Uh oh!

zrothberg Oct 30, 2022

Choose a reason for hiding this comment

Uh oh!

rhettinger Oct 27, 2022

Choose a reason for hiding this comment

Uh oh!

bedevere-bot commented Oct 27, 2022

Uh oh!

rhettinger Oct 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhettinger Oct 27, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zrothberg Oct 30, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zrothberg commented Oct 27, 2022 •

edited by bedevere-bot

Loading

ghost commented Oct 27, 2022 •

edited by ghost

Loading

rhettinger Oct 27, 2022 •

edited

Loading

rhettinger Oct 27, 2022 •

edited

Loading