New operators for Python dicts?


The Python dictionary is a commonly used data structure that supports a rich set of operations. But there are some operations that it lacks—two operators in particular: " + " and " += ". That lack is the subject of a recent discussion on the python-ideas mailing list. There are questions about the precise semantics of the operators, but there is also something of an existential question about the need for operators whose semantics can already be handled using existing operations.

Some background

Dictionaries (or dicts) are also known as associative arrays or hashes in other languages. In essence, they map some key, which is usually—but not always—a string, to some other value. A simple example:

    >>> a_dict = { 'a' : 3, 9 : 7, 'foo' : 'bar' }
    >>> a_dict['a']
    3
    >>> a_dict[9]
    7
    >>> a_dict['bar']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'bar'

That sets up a_dict as a dict with three elements, then shows accessing various elements of the dict. The last key, "bar", is not present, so attempting to access it results in a runtime KeyError exception.

One of the other fundamental Python types is the list, which provides an ordered sequence of objects, in many ways like arrays in other languages.

    >>> b_list = [ 1, 2, 3 ]
    >>> b_list[2]
    3

But lists have two operators that dicts lack. In particular, lists can be concatenated using + and +=:

    >>> b_list + b_list
    [1, 2, 3, 1, 2, 3]
    >>> b_list += [ 4, 5, 6 ]
    >>> b_list
    [1, 2, 3, 4, 5, 6]

Doing something similar with dicts, though, leads to an exception:

    >>> a_dict + a_dict
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

TypeError is also raised for the += operator when used on dicts.

Adding + and +=

But Ian Lee would like to see that change. He first raised the issue in a brief subthread on the python-dev mailing list in a discussion about code review for PEP 448. Lee subsequently moved the topic to python-ideas, where he suggested adding both the + and += operators for dicts to put them on an equal footing with lists.

The semantics of adding two dicts that have no keys in common seems clear: the result is a dict with all of the key/value pairs from both operands. For + , the result is a new dict, while += modifies the dict on the left. The only real question is what to do when there are duplicate keys. Dicts already have an update() method that takes the value for a duplicate key from the argument dict:

    >>> a_dict = { 'a' : 1, 'b' : 2 }
    >>> a_dict.update( { 'b' : 9 } )
    >>> a_dict
    {'a': 1, 'b': 9}

Lee suggested using the "last setter wins" rule for the new operators, as the update() method does. So the value for a duplicate key comes from the right operand:

    >>> a_dict = { 'a' : 1, 'b' : 2 }
    >>> b_dict = { 'b' : 'bar', 'c' : 'baz' }
    >>> a_dict + b_dict
    { 'a' : 1, 'b' : 'bar', 'c' : 'baz' }
    >>> b_dict += a_dict
    >>> b_dict
    { 'a' : 1, 'b' : 2, 'c' : 'baz' }
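For comparison, the same results can already be produced with existing dict operations. The following is a minimal sketch, assuming a hypothetical helper named merged() (it is not part of Python or of Lee's proposal) that copies the left operand and then calls update() with the right one, so the right operand wins for duplicate keys:

```python
def merged(left, right):
    """Return a new dict; for duplicate keys the value from `right` wins.

    Hypothetical helper for illustration only; it mirrors what the
    proposed `left + right` would produce under "last setter wins".
    """
    result = left.copy()    # shallow copy, so `left` is left untouched
    result.update(right)    # update() overwrites values for duplicate keys
    return result


a_dict = {'a': 1, 'b': 2}
b_dict = {'b': 'bar', 'c': 'baz'}

sum_dict = merged(a_dict, b_dict)   # stands in for the proposed a_dict + b_dict
b_dict.update(a_dict)               # stands in for the proposed b_dict += a_dict

print(sum_dict)
print(b_dict)
```

Swapping the arguments to merged() changes which value survives for 'b', so the result depends on operand order.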

Donald Stufft liked the idea behind the change, but didn't like using + . He would rather use " | " to try to make it clearer that it is really more of a set union operation, rather than a concatenation or addition. Ethan Furman, though, sees + as a generic operator for combining things. On the other hand: "I suppose I could come around to '|', though -- it does ease the tension around the behavior of duplicate keys", he said.

After a bit of a digression into the question of commutativity (the proposed operators are not commutative, but that is hardly unique: string concatenation is not commutative either, for example), Marc-André Lemburg explained that he didn't see the need for +, though += could be useful:

However, I don't really see the point in having an operation that takes two dictionaries, creates a new empty one and updates this with both sides of the operand. It may be theoretically useful, but it results in the same poor performance you have in string concatenation. In applications, you normally just need the update functionality for dictionaries. If you do need a copy, you can create a copy explicitly - but those cases are usually rare.

Having one of those operators without the other seems a bit strange to some, though. Operators in Python are implemented as special methods on objects, so a + b becomes a.__add__(b) (similarly, += uses the __iadd__() special method). Dicts could pick up an __iadd__() method (or the | equivalent: __ior__() ), but most developers, especially those new to the language, would probably expect + to work if += did.
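To make the special-method mechanism concrete, here is a minimal sketch of a dict subclass that defines both methods. The class name AddableDict and the choice of "last setter wins" semantics are assumptions made for illustration; this is not code from the proposal.

```python
class AddableDict(dict):
    """Hypothetical dict subclass supporting + and += (illustration only)."""

    def __add__(self, other):
        # Called for `self + other`; builds and returns a new object.
        if not isinstance(other, dict):
            return NotImplemented
        result = AddableDict(self)
        result.update(other)   # values from the right operand win
        return result

    def __iadd__(self, other):
        # Called for `self += other`; modifies self in place.
        if not isinstance(other, dict):
            return NotImplemented
        self.update(other)
        return self            # += rebinds the name to whatever is returned


a = AddableDict({'a': 1, 'b': 2})
b = AddableDict({'b': 'bar', 'c': 'baz'})

c = a + b      # a.__add__(b): a new dict, a and b unchanged
alias = b
b += a         # b.__iadd__(a): b is modified in place, so alias sees it too

print(c)
print(b, alias is b)
```

If only __add__() were defined, Python would fall back to it for += and rebind the name to a new object, so alias would no longer refer to the same dict as b.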

Other options

In the case of duplicated keys, there are (at least) two other options. An exception could be raised when combining two dicts that have keys in common, as Greg Ewing suggested, though that might be surprising. Another option would be to apply the addition operator to the two values, but that might cause its own set of surprises:

    >>> a_dict = { 'a' : 2, 'b' : 'foo' }
    >>> b_dict = { 'a' : 4, 'b' : 'foo' }
    >>> c_dict = { 'b' : 3 }
    >>> a_dict + b_dict
    { 'a' : 6, 'b' : 'foofoo' }
    >>> b_dict + c_dict
    ...
    TypeError: cannot concatenate 'str' and 'int' objects

Either the addition/concatenation or the exception might well surprise developers.
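Both alternatives can be sketched as plain functions. The names merge_strict() and merge_adding() are hypothetical, and the use of KeyError for the duplicate-key case is an assumption; the thread did not settle on an exception type.

```python
def merge_strict(left, right):
    """Return a new dict, raising if the operands share any key.

    Hypothetical helper; KeyError is an assumed choice of exception.
    """
    duplicates = left.keys() & right.keys()   # dict key views act like sets
    if duplicates:
        raise KeyError("duplicate keys: %r" % sorted(duplicates, key=repr))
    result = left.copy()
    result.update(right)
    return result


def merge_adding(left, right):
    """Return a new dict, applying + to the values of shared keys."""
    result = left.copy()
    for key, value in right.items():
        if key in result:
            # May raise TypeError if the two values cannot be added.
            result[key] = result[key] + value
        else:
            result[key] = value
    return result


a_dict = {'a': 2, 'b': 'foo'}
b_dict = {'a': 4, 'b': 'foo'}
c_dict = {'b': 3}

print(merge_adding(a_dict, b_dict))

try:
    merge_adding(b_dict, c_dict)      # 'foo' + 3 cannot be evaluated
except TypeError as exc:
    print("TypeError:", exc)

try:
    merge_strict(a_dict, b_dict)      # 'a' and 'b' appear in both operands
except KeyError as exc:
    print("KeyError:", exc)
```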

Lee summarized the ideas and approaches from early on in the thread in a kind of a pre-PEP document.

Even though there is a lot of precedent for operators like + and +=, Steven D'Aprano argued that they are actually flawed ideas that should not be further propagated. The fact that lists have those operators is not for the better:

It is *unfortunate* that += works with lists and tuples because + works, not a feature to emulate. Python made the best of a bad deal with augmented assignments: a syntax which works fine in C doesn't *quite* work cleanly in Python, but demand for it [led] to it being supported. The consequence is that every generation of Python programmers now need to learn for themselves that += on non-numeric types has surprising corner cases. Usually the hard way.

D'Aprano described one of those corner cases (which also appears in the Python FAQ) for the tuple immutable sequence type:

    >>> t = ([], None)
    >>> t[0] += [1]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> t
    ([1], None)

In the example, t is an immutable sequence of two items: an empty list and None. Because of the way Python handles the += operator, the exception isn't raised until after the "desired" change has been made to the list. As Andrew Barnert explained, Python essentially turns the statement into:

    setitem(t, 0, getitem(t, 0).__iadd__([1]))

It is the setitem() that fails, but the __iadd__() has already succeeded in changing the list object. Another way to look at it would be:

    >>> l = t[0]
    >>> l += [1]
    >>> t[0] = l

The final assignment is where that sequence fails, but the list object l has already been modified. That "feature" is, at best, a language wart.
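That expansion can be run directly using the functions in the standard operator module. This is only a rough sketch of what the interpreter does for t[0] += [1], not its actual implementation:

```python
import operator

t = ([], None)

try:
    # Roughly what `t[0] += [1]` does: fetch the item, apply the in-place
    # add to it, then try to store the result back into the tuple.
    operator.setitem(t, 0, operator.iadd(operator.getitem(t, 0), [1]))
    error = None
except TypeError as exc:
    error = exc

print(error)   # the store into the tuple failed...
print(t)       # ...but the list inside it was already extended

# Mutating the list through a method call involves no store into the
# tuple, so it raises nothing.
t[0].extend([2])
print(t)
```

The extend() call at the end shows the usual way around the problem: mutate the list through one of its methods so that nothing is assigned back into the tuple.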

The subject of dict.__add__() comes up on python-ideas with some frequency, and it is clear there are strong feelings on all of the different sides. Stufft thinks it would make a nice "mini-addition" to the language that might make newer versions a little more attractive:

Similarly doing:

    new_dict = dict1.copy()
    new_dict.update(dict2)

Isn't confusing or particularly hard to use, however being able to type that as new_dict = dict1 + dict2 is more succinct, cleaner, and just a little bit nicer. It adds another small reason why, taken with the other small reasons, someone might want to drop an older version of Python for a newer version.

But Stephen J. Turnbull is not convinced that the semantics are so clear that what the operators do would be obvious to most. He noted that four different ways to handle the duplicate-key problem had been proposed and added two more, possibly with tongue in cheek. In addition, since there are existing ways to perform those operations, adding another violates the Python "there's only one way to do it" (TOOWTDI) guideline.

Early on, Lee indicated that he would try to shepherd a PEP through the process to see if the operators could be added to dicts. Brett Cannon agreed with that idea:

I think a PEP that finally settled this idea would be good, even if it just ends up being a historical document as to why dicts don't have an __add__ method. Obviously there is a good amount of support both for and against the idea.

That's where things stand now. No PEP has yet appeared, though it seems likely that one will. It is an interesting question in that both sides seem to see their choice as the "obvious" one. There is precedent in that lists have the two operators, but that precedent does lead to some corner cases and warts. Even if the PEP were to be accepted, it would only be a feature for some upcoming version of Python 3—features are no longer being added to Python 2. One suspects that in the end it will come down to what benevolent dictator for life (BDFL) Guido van Rossum thinks—so far he has been silent in the thread.