-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-36144: Add PEP 584 operators to collections.ChainMap #18832
Conversation
Created __ior__, __or__ and __ror__ methods in ChainMap class.
Added testing for | and |= operators for ChainMap objects.
Renamed test_union operators, fixed errors and style problems raised by brandtbucher.
Added testing for union operator between ChainMap and iterable of key-value pairs.
Gave more descriptive variable names and eliminated unnecessary tmp variable.
Thanks for your time @curtisbucher, and welcome to CPython! 😉 Interesting, I didn't know old branch review comments would be added to the PR! |
Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Before I finish reviewing this, I have to bring up a tricky issue. There's a subtle behavior (I can't even find it in the reference manual :-( ) for binary operators. Consider a class B that implements for example The reason for this exception is that if we didn't have this special behavior (i.e., if this would follow the normal behavior of trying class B:
def __add__(self, other):
if not isinstance(other, B):
return NotImplemented
# return an instance of B representing self+other This would totally succeed for Anyway, because we have this exception, I have constructed an example that behaves unintuitively (I think). Consider: class C(ChainMap):
def __ror__(self, other):
return super().__ror__(other)
c1 = ChainMap()
c2 = C()
print(c1 | c2) # C(ChainMap({})) # !?
print(c2 | c1) # C({}) However, if we don't override the print(c1 | c2) # ChainMap({}) The output without overridden I'm not sure, but I think the only way out may be to define Thoughts? |
The lack of symmetry bothers me a tiny bit too (see However, we would avoid these issues altogether with something like: def __ior__(self, other):
self.maps[0].update(other)
return self
def __or__(self, other):
if isinstance(other, _collections_abc.Mapping):
return self.__class__(other.copy(), *self.maps)
return NotImplemented
def __ror__(self, other):
if isinstance(other, _collections_abc.Mapping):
return self.__class__(self.maps[0].copy(), *self.maps[1:], other)
return NotImplemented Much easier to explain and reason about... the only issue is that we lose (part) of the symmetry between
But I think I'm okay with that. The only alternative I see would be to insert a copy of You may still end up with nested ChainMaps, but at least it's clear why it's happening and it happens consistently. It's also nice that in all cases of binary It feels more true to the spirit of |
The problem with how you write that is that it calls def __or__(self, other):
if isinstance(other, _collections_abc.Mapping):
new = self.copy()
# This makes a copy of self.dict[0], which we undo like this:
new.maps[:1] = [other.copy(), self.maps[0]]
return new
return NotImplemented
def __ror__(self, other):
if isinstance(other, _collections_abc.Mapping):
new = self.copy()
new.maps.append(other)
return new
return NotImplemented TBH the I think we need to let this marinate a bit more before we commit to a given semantics. (Sorry @curtisbucher!) |
Oh wait. The existing |
def __ior__(self, other):
self.maps[0] |= other
return self
def __or__(self, other):
return self.__class__(self.maps[0] | other, *self.maps[1:])
def __ror__(self, other):
m = other | self.maps[-1]
if len(self.maps) == 1:
return self.__class__(m)
return self.__class__(self.maps[0].copy(), *self.maps[1:-1], m) No |
@brandtbucher does this solution solve the initial issue with |
@curtisbucher The main idea here is that we pretty thinly wrap the We get around the reflected >>> from collections import ChainMap
>>> class C(ChainMap):
... def __ror__(self, other):
... return super().__ror__(other)
...
>>> c1 = ChainMap()
>>> c2 = C()
>>> c1 | c2
C(ChainMap({}))
>>> c2 | c1
C(ChainMap({})) And if you don't override >>> from collections import ChainMap
>>> class C(ChainMap):
... pass
...
>>> c1 = ChainMap
>>> c2 = C()
>>> c1 | c2
ChainMap(C({}))
>>> c2 | c1
C(ChainMap({})) I think the above behavior for this sort-of-pathological case is reasonable, and not very surprising. |
Ah, this just keeps snaking out of control. :-( I'm not particularly happy that with this solution, I also accept that if we want I keep thinking that I wonder if the original solution can be tweaked to make my corner case work, rather than going with a completely different approach? |
If we always want a >>> from collections import ChainMap
>>> class Subclass(ChainMap):
... pass
...
>>> class SubclassRor(ChainMap):
... def __ror__(self, other):
... return super().__ror__(other)
...
>>> ChainMap() | ChainMap()
ChainMap({})
>>> ChainMap() | Subclass()
ChainMap({})
>>> Subclass() | ChainMap()
Subclass({})
>>> ChainMap() | SubclassRor()
SubclassRor({}) I think forcing a |
OK, I think that this is probably the best we can do if we don't want the number of maps to increase, which is dangerous. Another way to look at this: The ideas based on making The latest proposal fulfills this desire, so let's go for that. (And I won't change my mind again. :-) |
for child in self.maps[::-1]: | ||
m.update(child) | ||
return self.__class__(m) | ||
return NotImplemented | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an extra blank line here.
with self.assertRaises(TypeError): | ||
cm3 | pairs | ||
cm3 |= pairs | ||
self.assertEqual(cm3.maps, [cm3.maps[0] | dict(pairs), *cm3.maps[1:]]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an extra blank line.
@@ -0,0 +1 @@ | |||
Added :pep:`584` operators to :class:`collections.ChainMap`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add "(|
and |=
)" after "operators", since not everybody knows what PEP 584 is. :-)
Lib/collections/__init__.py
Outdated
def __ror__(self, other): | ||
if isinstance(other, _collections_abc.MutableMapping): | ||
m = other.copy() | ||
for child in self.maps[::-1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use reversed()
, like in __iter__()
above.
Lib/collections/__init__.py
Outdated
@@ -961,6 +961,24 @@ def clear(self): | |||
'Clear maps[0], leaving maps[1:] intact.' | |||
self.maps[0].clear() | |||
|
|||
def __ior__(self, other): | |||
self.maps[0].update(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this ought to be self.maps[0] |= other
-- most other ChainMap methods also defer to the corresponding method on self.maps[0]
.
Lib/collections/__init__.py
Outdated
if isinstance(other, _collections_abc.MutableMapping): | ||
m = other.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use Mapping
instead of MutableMapping
and then dict(other)
instead of other.copy()
, per the end of the main comment thread.
Are If not, this PR should at least call the existing update() method rather than touching |
We decided not to do that for a few reasons. I don't have the exact posts handy right now, but I recall the points that shot the idea down were:
So, just to clarify, you'd prefer we spell it more like this? def __ior__(self, other):
self.update(other)
return self
def __or__(self, other):
if isinstance(other, _collections_abc.Mapping):
m = self.copy()
m.update(other)
return m
return NotImplemented
def __ror__(self, other):
if isinstance(other, _collections_abc.Mapping):
m = dict(other)
m.update(self)
return self.__class__(m)
return NotImplemented I think that sounds reasonable. |
My only concern with that is that I don't know whether ChainMap was designed for performance -- maybe the abstractions are more important than speed? However, ISTM that ChainMap is frequently cited as a good solution for various problems that would otherwise require explicitly merging dicts, in contexts where ChainMap's API beyond MutableMapping doesn't seem important (i.e. seemingly because it's assumed to be faster). I also note that the DeepChainMap example is already missing quite a few things, e.g. PS. @brandtbucher: Maybe it's worth explaining the two reasons for not adding these methods to the Mapping ABCs in the PEP, e.g. under Rejected Ideas? |
I agree, since it's come up at least twice now since acceptance. I'll have something up soon. |
I'll have more time to think this through sometime in the next days (presuming there is no rush to apply 584 to ChainMap). In particular, I want to think about use cases and user's mental models of what ChainMap does. Also, I want to reread the PEP now that it has been approved. FWIW, I don't recall seeing anyone ever use update() on a ChainMap. That wouldn't be the normal way to use it. If a user still has a variable reference to Aspirationally, it would be great if the new methods were implemented in terms of the existing methods and that we make the fewest possible assumptions about the API of the underlying mappings. One other thought: the intended way to connect to another mapping is the new_child() method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but I would also be okay if we implemented Brandt's version in his reply to Raymond.
Let's not change anything until we hear more from @rhettinger. (Raymond: it would be nice if this made the next alpha, which is scheduled for March 23.)
@curtisbucher Did you know you can run most tests locally? E.g. to run
You can also build the docs locally. I'll add tips to my informal guide to contributing. |
@gvanrossum: Please replace |
Congrats on your first CPython contribution @curtisbucher! 🍾 Looking forward to seeing more from you in the future. |
Well, we missed 3.9.0a5. Oh well. |
Calling out @rhettinger -- if you approve of the version in #18832 (comment) please say so and well make it happen. |
Update collections.ChainMap to include | and |= operators.
https://bugs.python.org/issue36144