Skip to content

gh-98766: Lazier ChainMap methods #98767

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions Lib/collections/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
from itertools import chain as _chain
from itertools import repeat as _repeat
from itertools import starmap as _starmap
from itertools import islice as _islice
from keyword import iskeyword as _iskeyword
from operator import eq as _eq
from operator import itemgetter as _itemgetter
Expand Down Expand Up @@ -1010,9 +1011,8 @@ def __len__(self):
return len(set().union(*self.maps)) # reuses stored hash values if possible

def __iter__(self):
d = {}
for mapping in reversed(self.maps):
d.update(dict.fromkeys(mapping)) # reuses stored hash values if possible
lazy_views = (key_view.keys() for key_view in reversed(self.maps))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variables here are misnamed and don't reflect an understanding of what is being done. The contents of self.maps are mappings, not views. The views aren't created until the keys() method is called. The views themselves aren't "lazy", they are a pass-through to the underlying dict.

This line should read:

views = (mapping.keys() for mapping in reversed(self.maps))

Also, note this is an API change. We've tried to make minimal assumptions about the underlying mappings which may not be real dicts. In particular, we haven't demanded that they support having views.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback. I am going to split this into two different PRs one for iter and one for the copy methods.

For iter I went through and profiled multiple dict and dict subclass objects and found a method that performs the same number of hashes in total as the current fromkeys method while also reducing the number of intermediate dicts like the above does. It appears to maintain the current behavior and reduces down to the original basecase for a list of dictionaries (d.update). Do you want me to open a new PR for that right away or to just link the repo?

There is some strange (to me) inconsistencies between update and fromkeys when it comes to hashreuse. I am unsure if that is intentional or not.

I can make a gist for when set, and dict and their subclasses reuse hash or not.

d = dict.fromkeys(_chain.from_iterable(lazy_views))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do what you think it does. The chain extracts the keys from the views and then passes those through to dict.fromkeys(). That defeats the goal of retaining the precomputed hash values stored in the underlying mapping.

The dict.fromkeys() class method can use the precomputed hash values ONLY if the input is an exact dict or exact set. An iterable of keys will not suffice.

Also, there isn't any "laziness" at all in these two lines. You make a generator expression and then consume it immediately. It is not much different from converting sum([10, 20, 30]) to sum(iter([10, 20, 30]))` and calling it "lazy".

return iter(d)

def __contains__(self, key):
Expand All @@ -1032,7 +1032,7 @@ def fromkeys(cls, iterable, *args):

def copy(self):
'New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]'
return self.__class__(self.maps[0].copy(), *self.maps[1:])
return self.__class__(self.maps[0].copy(), *_islice(self.maps, 1, None))
Copy link
Contributor

@rhettinger rhettinger Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't do anything useful. The time saved by not making a list is offset by the time to create the islice object.

Also, this code is stylistically bad. The old code is plain and beautiful. In contrast, the proposed change is arcane and is well outside normal code practices. It is rough on the eyes as well.

One other note. The *args form is immediately converted to a tuple by the interpreter. So, you aren't getting any "laziness" from this change. To me, it seems mostly pointless.


__copy__ = copy

Expand All @@ -1050,7 +1050,7 @@ def new_child(self, m=None, **kwargs): # like Django's Context.push()
@property
def parents(self): # like Django's Context.pop()
'New ChainMap from maps[1:].'
return self.__class__(*self.maps[1:])
return self.__class__(*_islice(self.maps, 1, None))
Copy link
Contributor

@rhettinger rhettinger Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't do anything useful. The time saved by not making a list is offset by the time to create the islice object.

Also, this code is stylistically bad. The old code is plain and beautiful. In contrast, the proposed change is arcane and is well outside normal code practices. It is rough on the eyes as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think there is a better way to handle this anyway. I probably should have left it off the PR as it wasn't as heavily tested as the iter behavior.


def __setitem__(self, key, value):
self.maps[0][key] = value
Expand Down