Python Proposal: Accumulation Displays

(Status: this proposal will not go forward, but helped spark a discussion on a better idea--generator expressions.)

List displays (aka list comprehensions) have proven to be a useful and popular part of Python. However, there are limitations on what they can do, and they sometimes lead to inefficiencies: it is tempting to create a list of intermediate results, even when the whole list is not really needed. Consider finding the maximum termperature over the last day, where you have a function, temp, that takes an hour as argument and fetches the temperature. With list comprehensions this is:

max([temp(hour) for hour in range(24)])

[Max: temp(hour) for hour in range(24)]

[Sum: x*x for x in numbers] [Product: Prob_spam(word) for word in email_msg] [Min: temp(hour) for hour in range(24)] [Mean: f(x) for x in data] [Median: f(x) for x in data] [Mode: f(x) for x in data] [SortBy: abs(x) for x in (-2, -4, 3, 1)]

Syntax

"[" expression ":" listmaker "]"

Semantics

[Sum: x*x for x in numbers]

total = 0 for x in it: total = (x*x + total) return total

[Mean: f(x) for x in data]

(total, n) = (0, 0) for x in it: (total, n) = (total+x, n+1) return total / n

Sum

Mean

add

result

[acc: exp for x in it]

accumulation(acc, lambda x: exp, it)

def accumulation(acc, f, it): a = acc() for x in it: a.add(f(x)) return a.result()

class Mean: def __init__(self): self.total, self.n = 0, 0 def add(self, exp): self.total, self.n = self.total+exp, self.n+1 def result(self): return self.total / self.n

Variable/Value Accumulations

add

[Argmax: votes(c) for c in candidates]

def accumulation(acc, f, it): a = acc() for x in it: a.add(f(x) , x ) return a.result()

Short-Circuit Accumulations

[Some: temp(hour) > 30 for hour in range(24)]

accumulation

def accumulation(acc, f, it): a = acc() for x in it: if a.add(f(x), x) : break return a.result()

Some

class Some: def __init__(self): self.value = False def add(self, exp, _): if exp: self.value = True return StopIteration def result(self): return self.value

StopIteration

True

add

Some

None

Parameterized Accumulations

[Top(10): humor(joke) for joke in jokes]

accumulation

Top

accumulation

def accumulation(acc, f, it): if isinstance(acc, (types.ClassType, types.TypeType)): a = acc() else: a = acc for x in it: if a.add(f(x), x): break return a.result()

Top

from heapq import * class Top: def __init__(self, k): self.k = k self.heap = [] def add(self, exp, _): heap = self.heap if len(heap) < self.k: heappush(heap, exp) elif exp < heap.biggest(): heapreplace(heap, exp) def result(self): return self.heap

jokes.sort(lambda a, b: humor(a) < humor(b)) jokes[0:10]

Non-Display Accumulations

a = Top(10) for j in new_jokes: print j a.add(float(raw_input("Rating?")), j) return [a: humor(j) for j in machine_ratable_jokes]

print

a

Discussion

map

filter

reduce

Another way to look at it is that there are some common algorithms in programming, such as finding the best element(s) or value(s) from a candidate set, that appear again and again, and this allows us to abstract out these algorithms, rather than re-implement them every time.

One drawback of my proposal is that the [ ... ] syntax suggests that a list is returned, when that need not be the case. My defense is that it is iterating over a list (or at least, an iterable). But Guido pronounced, quite reasonably, that this was not a sufficient defense.

There have been past discussions of set displays and dict displays, using syntax something like:

words = { w.lower() for w in text.split() } squares = { i:i*i for i in range(N) }

words = [Set: w.lower() for w in text.split()] squares = [Dict: i, i*i for i in range(N)]

words = set([w.lower() for w in text.split()]) squares = dict([i, i*i for i in range(N)])

exp for x in it

def gen(): for x in it: yield exp return gen()

Accumulation Display Generator Expression plus Function [Sum: x*x for x in numbers] [Product: Prob_spam(word) for word in email_msg] [Min: temp(hour) for hour in range(24)] [Median: f(x) for x in data] [SortBy: abs(x) for x in (-2, -4, 3, 1)] [Top(10): humor(joke) for joke in jokes] [Argmax: votes(c) for c in candidates] sum(x*x for x in numbers) product(Prob_spam(word) for word in email_msg) min(temp(hour) for hour in range(24)) median(f(x) for x in data) sortdecorated((abs(x),x) for x in (-2, -4, 3, 1)) top(10, (humor(joke) for joke in jokes)) argmax((c,votes(c)) for c in candidates)

With the accumulation display approach, you need a change to the language, and a new set of accumulators. With the generator expression approach you need a change to the language of similar complexity, and a combination of old funcxtions (sum, min) and new functions (product, top, argmax). The new functions are somewhat easier to write than the accumulators; for example a 6-line some function instead of an 8-line Some accumulator:

def some(items): "If some element in items is true, return it; otherwise return False" for item in items: if item: return item return False

Philip J. Eby and Raymond Hettinger proposed the generator expression idea. I'm happy that my proposal came at the right time to help spur a discussion that appears to be on the way to a happy ending--I think the generator expression idea is more generally useful than my proposal.

Other points were raised in the discussion on python-dev. Ian Bicking made a suggestion that I interpreted to mean that accumulation should be implemented as something like:

def accumulation(acc, f, it): a = acc.__accum__() for x in it: if a.add(f(x), x): break return a.result()

min, max, sum

__accum__

Next Steps

A library of accumulators. I'm thinking there should be an accum module, which users can choose to import or import * from. I have a candidate accum module. A demonstration of the use of these accumulators. I have one called testaccum which rewrites a limited class of accumulation displays into calls to accumulation and prints the results. It produces this output: temp = [70, 70, 71, 74, 76, 76, 72, 76, 77, 77, 77, 78, 78, 79, 79, 79, 78, 80, 82, 83, 83, 81, 84, 83] data = temp votes = {'Gray': 45, 'Arnie': 48, 'Peter': 3, 'Cruz': 32, 'Tom': 13} candidates = ['Gray', 'Arnie', 'Peter', 'Cruz', 'Tom'] [Max: temp[hour] for hour in range(24)] ==> 84 [Min: temp[hour] for hour in range(24)] ==> 70 [Sum: x*x for x in data] ==> 144999 [Mean: f(x) for x in data] ==> 155.25 [Median: f(x) for x in data] ==> 156.0 [Mode: f(x) for x in data] ==> 166 [Argmax: votes[c] for c in candidates] ==> Arnie [Argmin: votes[c] for c in candidates] ==> Peter [Some: temp[hour] > 75 for hour in range(24)] ==> True [Every: temp[hour] > 75 for hour in range(24)] ==> False [Top(10): temp[hour] for hour in range(24)] ==> [84, 83, 83, 83, 82, 81, 80, 79, 79, 79] [Join(', '): votes[c] for c in candidates] ==> 45, 48, 3, 32, 13 [SortBy: abs(x) for x in (-2,-4, 3, 1)] ==> [1, -2, 3, -4] [SortBy(reverse=True): abs(x) for x in (-2, -4, 3, 1)] ==> [-4, 3, -2, 1] Changing the grammar would have been rather easy. Where the grammar now has: listmaker ::= expression ( list_for | ( "," expression )* [","] ) we could change it to: listmaker ::= expression [ ":" expression ] ( list_for | ( "," expression )* [","] ) Changing the code generator would have been simple for the simple examples shown here, but would require some thought for complex listmakers with multiple for and if clauses. Basically, we want [acc: exp for x in it1 if c1 for y in it2 if c2 ...] to mean from itertools import ifilter accumulation(acc, lambda (x, y, ...): exp, icross_product(ifilter(c1, it1), ifilter(c2, it2), ...)) where icross_product takes the cross product of all the argument iterables, in left-to-right order. That is, it generates tuples in the order (c1[0], c2[0], ... cn[0]) (c1[0], c2[0], ... cn[1]) (c1[0], c2[0], ... cn[2]) ... (c1[0], c2[0], ... cn[-1]) (c1[0], c2[0], ... cn_1[1], cn[0]) (c1[0], c2[0], ... cn_1[1], cn[1]) ... (c1[-1], c2[-1], ... cn[-1]) Perhaps it should be a standard part of itertools .

This document is in the public domain and may be used for any purpose.