Understanding nested list comprehensions in Python

In my last blog post, I discussed list comprehensions, and how to think about them. Several people suggested (via e-mail, and in comments on the blog) that I should write a follow-up posting about nested list comprehensions.

I must admit that nested list comprehensions are something that I’ve shied away from for years. Every time I’ve tried to understand them, let alone teach them, I’ve found myself stumbling for words, without being clear about what was happening, what the syntax is, or where I would want to use them. I managed to use them on a few occasions, but only after a great deal of trial and error, and without really understanding what I was doing.

Fortunately, the requests that I received, asking how to work with such nested list comprehensions, forced me to get over my worries. I’ve figured out what’s going on, and even think that I understand what my problem was with understanding them before.

The key thing to remember is that in a list comprehension, we’re dealing with an iterable. So when I say:

[ len(line) for line in open('/etc/passwd') ]

I’m saying that I want to iterate over the file object we got from opening /etc/passwd. There will be one element in the output list for each element in the input iterable — aka, every line in the file.

That’s great if I want my list comprehension to return something based on each line of /etc/passwd. But each line of /etc/passwd is a string, and thus also iterable. Maybe I want to return something not based on the lines of the file, but on the characters of each line.

Were I to use a “for” loop to process the file, I would use a nested loop — i.e., one loop inside of the other, with the outer loop iterating over lines and the inner loop iterating over consonants. It turns out that we can use a nested list comprehension, too. Here’s a simple example of a nested list comprehension:

[(x,y) for x in range(5) for y in range(5)]

If your reaction to this is, “What in the blazes does that mean?!?” then you’re not alone. Until just recently, that’s what I thought, too.

However: If we rewrite the above nested list comprehension using my preferred (i.e., multi-line) list-comprehension style, I think that things become a bit clearer:

[(x,y) for x in range(5) for y in range(5)]

Let’s take this apart:

Our output expression is the tuple (x,y). That is, this list comprehension will produce a list of two-element tuples.

We first run over the source range(5), giving x the values 0 through 4.

For each value in x, we run through the source range(5), giving y the values 0 through 4.

The number of values in the output depends on the number of runs of the final (second) “for” line.

The output, not surprisingly, will be all of the two-element tuples from (0,0) to (4,4).

Now, let’s mix things up by changing them a bit:

[(x,y) for x in range(5) for y in range(x+1)]

Notice that now, the maximum value of y will vary according to the value of x. So we’ll get from (0,0) to (4,4), but we won’t see such things as (2,4) because y will never be larger than x.

Again, it’s important to understand several things here:

Our “for y” loop will execute once for each iteration over x.

In our “for y” loop, we have access to the variable x.

In our “for x” loop, we don’t have access to y (unless you consider the last value of y to be useful, but you really shouldn’t).

Our (x,y) tuple is output once for each iteration of the *final* loop, at the bottom.

Here’s another example: Assume that we have a few friends over, and that we have decided to play several games of Scrabble. Being Python programmers, we have stored our scores in a dictionary:

{'Reuven':[300, 250, 350, 400], 'Atara':[200, 300, 450, 150], 'Shikma':[250, 380, 420, 120], 'Amotz':[100, 120, 150, 180] }

I want to know each player’s average score, so I write a little function:

def average(scores): return sum(scores) / len(scores)

If we want to find out each individual’s average score, we can use our function and a standard comprehension — in this case, a dict comprehension, to preserve the names:

>>> { name : average(score) for name, score in scores.items() } {'Amotz': 137, 'Atara': 275, 'Reuven': 325, 'Shikma': 292}

But what if I want to get the average score, across all of the players? In such a case, I will need to grab each of the scores from inside of the inner lists. To do that, I can use a nested list comprehension:

>>> average([ one_score for one_player_scores in scores.values() for one_score in one_player_scores ]) 257

What if I’m only interested (for whatever reason) in including scores that were above 200? As with all list comprehensions, I can use the “if” clause to weed out values that I don’t want. That condition can use any and all of the values that I have picked out of the various “for” lines:

>>> [ one_score for one_player_scores in scores.values() for one_score in one_player_scores if one_score > 200] [300, 250, 350, 400, 300, 450, 250, 380, 420]

If I want to put these above-200 scores into a CSV file of some sort, I could do the following:

>>> ','.join([ str(one_score) for one_player_scores in scores.values() for one_score in one_player_scores if one_score > 200]) '300,250,350,400,300,450,250,380,420'

Here’s one final example that I hope will drive these points home: Let’s assume that I have information about a hotel. The hotel has stored its information in a Python list. The list contains lists (representing rooms), and each sublist contains one or more dictionaries (representing people). Here’s our data structure:

rooms = [[{'age': 14, 'hobby': 'horses', 'name': 'A'}, {'age': 12, 'hobby': 'piano', 'name': 'B'}, {'age': 9, 'hobby': 'chess', 'name': 'C'}], [{'age': 15, 'hobby': 'programming', 'name': 'D'}, {'age': 17, 'hobby': 'driving', 'name': 'E'}], [{'age': 45, 'hobby': 'writing', 'name': 'F'}, {'age': 43, 'hobby': 'chess', 'name': 'G'}]]

What are the names of the people staying at our hotel?

>>> [ person['name'] for room in rooms for person in room ] ['A', 'B', 'C', 'D', 'E', 'F', 'G']

How about the names of people staying in our hotel who enjoy chess?

>>> [ person['name'] for room in rooms for person in room if person['hobby'] == 'chess' ] ['C', 'G']

Basically, every “for” line flattens the items over which you’re iterating by one more level, gives you access to that level in both the output expression (i.e., first line) and in the condition (i.e., optional final line).

I hope that this helps you to understand nested list comprehensions. If it did, please let me know! (And if it didn’t, please let me know that, as well!)