Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-36144: Add PEP 584 operators to collections.ChainMap #18832

Merged
merged 15 commits into from
Mar 23, 2020

Conversation

curtisbucher
Copy link
Contributor

@curtisbucher curtisbucher commented Mar 7, 2020

Update collections.ChainMap to include | and |= operators.

https://bugs.python.org/issue36144

Created __ior__, __or__ and __ror__ methods in ChainMap class.
Added testing for | and |= operators for ChainMap objects.
Renamed test_union operators, fixed errors and style problems raised by brandtbucher.
Added testing for union operator between ChainMap and iterable of key-value pairs.
Gave more descriptive variable names and eliminated unnecessary tmp variable.
@brandtbucher
Copy link
Member

brandtbucher commented Mar 7, 2020

Thanks for your time @curtisbucher, and welcome to CPython! 😉

Interesting, I didn't know old branch review comments would be added to the PR!

curtisbucher and others added 2 commits March 7, 2020 11:48
Co-Authored-By: Brandt Bucher <brandtbucher@gmail.com>
@brandtbucher brandtbucher requested a review from gvanrossum March 7, 2020 20:17
Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@brandtbucher brandtbucher added the type-feature A feature request or enhancement label Mar 7, 2020
@gvanrossum
Copy link
Member

Before I finish reviewing this, I have to bring up a tricky issue. There's a subtle behavior (I can't even find it in the reference manual :-( ) for binary operators.

Consider a class B that implements for example __add__ and __radd__ (the same applies to all binary operators). Now consider a subclass C that overrides __add__ and __radd__. Let b = B() and c = C(), and evaluate b + c. Surprise: this calls C.__radd__ rather than B.__add__ (first)!

The reason for this exception is that if we didn't have this special behavior (i.e., if this would follow the normal behavior of trying B.__add__ first), it would be difficult for the subclass C to override the behavior of + completely. A reasonable implementation of B.__add__ would look like this:

class B:
    def __add__(self, other):
        if not isinstance(other, B):
            return NotImplemented
        # return an instance of B representing self+other

This would totally succeed for b + c, and return an instance of B. (And the reason for that is also subtle -- since Python allows subclass constructors to have a signature that differs from the superclass constructor, there's no safe way for B to create an instance of self.__class__. We went over this extensively earlier in the discussion about dict | dict. :-)

Anyway, because we have this exception, I have constructed an example that behaves unintuitively (I think). Consider:

class C(ChainMap):
    def __ror__(self, other):
        return super().__ror__(other)

c1 = ChainMap()
c2 = C()

print(c1 | c2)  # C(ChainMap({}))    # !?
print(c2 | c1)  # C({})

However, if we don't override the __ror__ method, the first result printed changes to

print(c1 | c2)  # ChainMap({})

The output without overridden __ror__ is easily understood: both calls invoke ChainMap.__or__, and it returns an instance of the class of self. But when we override __ror__ in C (i.e. the way shown in the example) we find that C.__ror__ is called, which then calls ChainMap.__ror__. And that has an (IMO) unexpected behavior, resulting in a subclass of ChainMap whose maps[0] is itself a ChainMap.

I'm not sure, but I think the only way out may be to define __ror__ differently, in a way that's more symmetric with __or__.

Thoughts?

@brandtbucher
Copy link
Member

brandtbucher commented Mar 8, 2020

The lack of symmetry bothers me a tiny bit too (see Mapping vs MutableMapping for the type of other). To appease your fun subclass, we'd need to avoid calling other.copy() in __ror__... which is tricky.

However, we would avoid these issues altogether with something like:

     def __ior__(self, other):
         self.maps[0].update(other)
         return self

     def __or__(self, other):
         if isinstance(other, _collections_abc.Mapping):
             return self.__class__(other.copy(), *self.maps)
         return NotImplemented

     def __ror__(self, other):
         if isinstance(other, _collections_abc.Mapping):
             return self.__class__(self.maps[0].copy(), *self.maps[1:], other)
         return NotImplemented

Much easier to explain and reason about... the only issue is that we lose (part) of the symmetry between x |= y and x = x | y, as you mentioned in the BPO issue:

I'm not so keen on that, because its special behavior can't be mimicked by |=.

But I think I'm okay with that. The only alternative I see would be to insert a copy of other to the front of the chain when |= is called, which I don't like (since it changes the "primary" mapping).

You may still end up with nested ChainMaps, but at least it's clear why it's happening and it happens consistently. It's also nice that in all cases of binary |, the left operand is "copied" and the right operand is referenced (and you always end up with the same number of maps)!

It feels more true to the spirit of ChainMap, really.

@gvanrossum
Copy link
Member

The problem with how you write that is that it calls self.__class__(...) -- and who knows what the signature of the subclass __init__() or __new__() is. But maybe we could emulate this with copy()? We know that the subclass must have a public maps attribute, so we could do this:

    def __or__(self, other):
        if isinstance(other, _collections_abc.Mapping):
            new = self.copy()
            # This makes a copy of self.dict[0], which we undo like this:
            new.maps[:1] = [other.copy(), self.maps[0]]
            return new
        return NotImplemented

    def __ror__(self, other):
        if isinstance(other, _collections_abc.Mapping):
            new = self.copy()
            new.maps.append(other)
            return new
        return NotImplemented

TBH the isinstance() checks bother me -- there are no such checks elsewhere in ChainMap (which is in generally pleasantly straightforward, with no method bodies more than four lines).

I think we need to let this marinate a bit more before we commit to a given semantics. (Sorry @curtisbucher!)

@gvanrossum
Copy link
Member

Oh wait. The existing ChainMap code already calls self.__class__() in a few places (everywhere it needs a new instance: copy(), new_child(), parents. So strike that objection. (I am still a bit uncomfortable with the isinstance() checks, and also with making a |= b different from a = a | b in terms of how many maps you end up with.)

@brandtbucher
Copy link
Member

brandtbucher commented Mar 8, 2020

    def __ior__(self, other):
        self.maps[0] |= other
        return self

    def __or__(self, other):
        return self.__class__(self.maps[0] | other, *self.maps[1:])

    def __ror__(self, other):
        m = other | self.maps[-1]
        if len(self.maps) == 1:
            return self.__class__(m)
        return self.__class__(self.maps[0].copy(), *self.maps[1:-1], m)

No isinstance, same number of maps each time, no more than four lines, a |= b equivalent to a = a | b (but in-place). 😉

@curtisbucher
Copy link
Contributor Author

@brandtbucher does this solution solve the initial issue with __ror__ that @gvanrossum brought up? If so can you explain how to me?

@brandtbucher
Copy link
Member

@curtisbucher The main idea here is that we pretty thinly wrap the | and |= operators for one of our underlying mappings (either the first or last, depending on order).

We get around the reflected super() hiccup by letting other take care of its own copy and update (in its own __or__ method). So now, the behavior is consistent:

>>> from collections import ChainMap
>>> class C(ChainMap):
...     def __ror__(self, other):
...         return super().__ror__(other)
... 
>>> c1 = ChainMap()
>>> c2 = C()
>>> c1 | c2
C(ChainMap({}))
>>> c2 | c1
C(ChainMap({}))

And if you don't override __ror__ in the subclass, the results compose just as you might expect them to:

>>> from collections import ChainMap
>>> class C(ChainMap):
...     pass
... 
>>> c1 = ChainMap
>>> c2 = C()
>>> c1 | c2
ChainMap(C({}))
>>> c2 | c1
C(ChainMap({}))

I think the above behavior for this sort-of-pathological case is reasonable, and not very surprising.

@gvanrossum
Copy link
Member

Ah, this just keeps snaking out of control. :-(

I'm not particularly happy that with this solution, ChainMap() | ChainMap() produces a ChainMap whose maps references another ChainMap.

I also accept that if we want ChainMap.__or__ to ever defer to a __ror__ method on other, we'll have to use isinstance().

I keep thinking that c1 | x should just be c1.add_child(x), but then the question is what to do about x | c1 and c1 |= x. And also repeated use of this will produce very long chains.

I wonder if the original solution can be tweaked to make my corner case work, rather than going with a completely different approach?

@brandtbucher
Copy link
Member

brandtbucher commented Mar 9, 2020

I wonder if the original solution can be tweaked to make my corner case work, rather than going with a completely different approach?

If we always want a dict as our first mapping, replacing other.copy() with dict(other) in __ror__ should be sufficient:

>>> from collections import ChainMap
>>> class Subclass(ChainMap):
...     pass
... 
>>> class SubclassRor(ChainMap):
...     def __ror__(self, other):
...         return super().__ror__(other)
... 
>>> ChainMap() | ChainMap()
ChainMap({})
>>> ChainMap() | Subclass()
ChainMap({})
>>> Subclass() | ChainMap()
Subclass({})
>>> ChainMap() | SubclassRor()
SubclassRor({})

I think forcing a dict here is okay, since we're collapsing all of the maps anyways. And then __ror__ can accept Mapping, rather than just MutableMapping.

@rhettinger rhettinger self-assigned this Mar 10, 2020
@gvanrossum
Copy link
Member

OK, I think that this is probably the best we can do if we don't want the number of maps to increase, which is dangerous.

Another way to look at this: The ideas based on making | somehow equivalent to a call to add_child are trying to make a special operation (add_child) look regular (|), whereas the priority should be to make regular operations work as expected. In particular, we want to allow decent semantics for ChainMap when passed in to code expecting a Mapping (or a duck type for dict).

The latest proposal fulfills this desire, so let's go for that. (And I won't change my mind again. :-)

for child in self.maps[::-1]:
m.update(child)
return self.__class__(m)
return NotImplemented

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an extra blank line here.

with self.assertRaises(TypeError):
cm3 | pairs
cm3 |= pairs
self.assertEqual(cm3.maps, [cm3.maps[0] | dict(pairs), *cm3.maps[1:]])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an extra blank line.

@@ -0,0 +1 @@
Added :pep:`584` operators to :class:`collections.ChainMap`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add "(| and |=)" after "operators", since not everybody knows what PEP 584 is. :-)

def __ror__(self, other):
if isinstance(other, _collections_abc.MutableMapping):
m = other.copy()
for child in self.maps[::-1]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use reversed(), like in __iter__() above.

@@ -961,6 +961,24 @@ def clear(self):
'Clear maps[0], leaving maps[1:] intact.'
self.maps[0].clear()

def __ior__(self, other):
self.maps[0].update(other)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ought to be self.maps[0] |= other -- most other ChainMap methods also defer to the corresponding method on self.maps[0].

Comment on lines 976 to 977
if isinstance(other, _collections_abc.MutableMapping):
m = other.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Mapping instead of MutableMapping and then dict(other) instead of other.copy(), per the end of the main comment thread.

@rhettinger
Copy link
Contributor

Are _abc_collections.Mapping and _abc_collections.MutableMapping going to be updated for PEP 584? Ideally, ChainMap() would just inherit those behaviors, much as it currently does with the update() method.

If not, this PR should at least call the existing update() method rather than touching self.maps[0] directly. Not only would this simplify implementation and maintenance, it would avoid breaking the DeepChainMap(ChainMap) example in the docs which shows how to alter the target component mappings simply by overriding __setitem__ and __delitem__.

@brandtbucher
Copy link
Member

Are _abc_collections.Mapping and _abc_collections.MutableMapping going to be updated for PEP 584?

We decided not to do that for a few reasons. I don't have the exact posts handy right now, but I recall the points that shot the idea down were:

  • Mapping/MutableMapping have no copy equivalent for creating new instances (necessary for binary operator).
  • Even if they did, that adding the operators (or even just the in-place one) would create compatibility issues for virtual subclasses.

If not, this PR should at least call the existing update() method rather than touching self.maps[0] directly.

So, just to clarify, you'd prefer we spell it more like this?

    def __ior__(self, other):
        self.update(other)
        return self

    def __or__(self, other):
        if isinstance(other, _collections_abc.Mapping):
            m = self.copy()
            m.update(other)
            return m
        return NotImplemented

    def __ror__(self, other):
        if isinstance(other, _collections_abc.Mapping):
            m = dict(other)
            m.update(self)
            return self.__class__(m)
        return NotImplemented

I think that sounds reasonable.

@gvanrossum
Copy link
Member

My only concern with that is that x.update(other) iterates over the key/value pairs in other and inserts them into x one at a time. This is slow for two reasons. FIrst, if x is a ChainMap, it makes lots of calls to .__setitem__(). Second, if other is a ChainMap, its .__iter__() constructs a fresh dict containing all key/value pairs, making the exact same set of .update() calls that the current code is making, thus doubling the memory requirement.

I don't know whether ChainMap was designed for performance -- maybe the abstractions are more important than speed? However, ISTM that ChainMap is frequently cited as a good solution for various problems that would otherwise require explicitly merging dicts, in contexts where ChainMap's API beyond MutableMapping doesn't seem important (i.e. seemingly because it's assumed to be faster).

I also note that the DeepChainMap example is already missing quite a few things, e.g. .popitem(), .pop(), .clear() and .copy().

PS. @brandtbucher: Maybe it's worth explaining the two reasons for not adding these methods to the Mapping ABCs in the PEP, e.g. under Rejected Ideas?

@brandtbucher
Copy link
Member

Maybe it's worth explaining the two reasons for not adding these methods to the Mapping ABCs in the PEP, e.g. under Rejected Ideas?

I agree, since it's come up at least twice now since acceptance. I'll have something up soon.

@rhettinger
Copy link
Contributor

I'll have more time to think this through sometime in the next days (presuming there is no rush to apply 584 to ChainMap). In particular, I want to think about use cases and user's mental models of what ChainMap does. Also, I want to reread the PEP now that it has been approved.

FWIW, I don't recall seeing anyone ever use update() on a ChainMap. That wouldn't be the normal way to use it. If a user still has a variable reference to maps[0], presumably they would update that mapping directly rather than through the chainmap instance.

Aspirationally, it would be great if the new methods were implemented in terms of the existing methods and that we make the fewest possible assumptions about the API of the underlying mappings.

One other thought: the intended way to connect to another mapping is the new_child() method.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I would also be okay if we implemented Brandt's version in his reply to Raymond.

Let's not change anything until we hear more from @rhettinger. (Raymond: it would be nice if this made the next alpha, which is scheduled for March 23.)

@gvanrossum
Copy link
Member

@curtisbucher Did you know you can run most tests locally? E.g. to run test_collections, type this in you shell (Mac, Windows or Linux), assuming you've built python.exe locally:

./python.exe -m test test_collections

You can also build the docs locally. I'll add tips to my informal guide to contributing.

@gvanrossum gvanrossum merged commit f393b2c into python:master Mar 23, 2020
@bedevere-bot
Copy link

@gvanrossum: Please replace # with GH- in the commit message next time. Thanks!

@brandtbucher
Copy link
Member

Congrats on your first CPython contribution @curtisbucher! 🍾

Looking forward to seeing more from you in the future.

@curtisbucher curtisbucher deleted the chainmap584 branch March 23, 2020 19:17
@gvanrossum
Copy link
Member

Well, we missed 3.9.0a5. Oh well.

@gvanrossum
Copy link
Member

Calling out @rhettinger -- if you approve of the version in #18832 (comment) please say so and well make it happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants