One type of syntactic sugar that sets Python apart from more verbose languages is comprehensions. Comprehensions are a special notation for building up lists, dictionaries and sets from other lists, dictionaries and sets, modifying and filtering them in the process.

They allow you to express complicated looping logic in a tiny amount of space.

List Comprehensions

List comprehensions are the best known and most widely used. Let’s start with an example.

A common programming task is to iterate over a list and transform each element in some way, e.g:

>>> squares = []
>>> for num in range(10):
...     squares.append(num**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

That’s the kind of thing you might do if you were a Java programmer. Luckily for us, though, list comprehensions allow the same idea to be expressed in far fewer lines.

>>> squares = [x**2 for x in range(10)]
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The basic syntax for list comprehensions is this: [EXPRESSION FOR ELEMENT IN SEQUENCE].

Another common task is to filter a list and create a new list composed of only the elements that pass a certain condition. The next snippet constructs a list of every number from 0 to 9 that leaves a remainder of zero when divided by 2, i.e. every even number.

>>> [x for x in range(10) if x % 2 == 0]
[0, 2, 4, 6, 8]

Using an IF-ELSE construct works slightly differently to what you might expect. Instead of putting the ELSE at the end, you need to use a conditional expression (Python's ternary operator), x if y else z, in the EXPRESSION position at the start of the comprehension.

The following list comprehension generates the squares of even numbers and the cubes of odd numbers in the range 0 to 9.

>>> [x**2 if x % 2 == 0 else x**3 for x in range(10)]
[0, 1, 4, 27, 16, 125, 36, 343, 64, 729]
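The two forms can be combined in one comprehension. Here is a minimal sketch (the name labels is made up for illustration): the trailing if decides which elements are kept, while the conditional expression at the front decides what each kept element becomes.

```python
# Trailing `if` filters; the conditional expression at the front transforms.
# Keep only the multiples of 3 below 10 and label each one "even" or "odd".
labels = ["even" if x % 2 == 0 else "odd" for x in range(10) if x % 3 == 0]
print(labels)  # ['even', 'odd', 'even', 'odd']
```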

List comprehensions can also be nested inside each other. Here is how we can generate a two-dimensional list, populated with zeros. (I have wrapped the comprehension in pprint to make the output more legible.)

>>> from pprint import pprint
>>> pprint([[0 for x in range(10)] for y in range(10)])
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
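If the nesting is hard to read, it may help to see roughly what the nested comprehension does when spelled out as ordinary loops. This is only a sketch for comparison; the names grid and row are mine.

```python
# Roughly what [[0 for x in range(10)] for y in range(10)] does, as loops.
grid = []
for y in range(10):        # outer comprehension: one row per y
    row = []
    for x in range(10):    # inner comprehension: one zero per x
        row.append(0)
    grid.append(row)

# Every row is a separate list, so changing one cell leaves other rows alone.
grid[0][0] = 1
print(grid[0][:3], grid[1][:3])  # [1, 0, 0] [0, 0, 0]
```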

(As you have probably noticed, it is possible to create list comprehensions that are utterly illegible, so please think about who has to touch your code after you and exercise some restraint.)

On the other hand, the syntax of basic comprehensions might seem complicated to you now, but I promise that with time it will become second nature.

Generator Expressions

A list comprehension creates an entire list in memory. In many cases, that’s what you want because you want to iterate over the list again or otherwise manipulate it after it has been created. In other cases, however, you don’t want the list at all. Generator expressions, described in PEP 289, were added for this purpose.

Let’s say you want to calculate the sum of the squares of a range of numbers. Without generator expressions, you would do this:

>>> sum([x**2 for x in range(20)])
2470

That creates a list in memory just to throw it away once the reference to it is no longer needed, which is wasteful. Generator expressions are essentially a way to define an anonymous generator function and call it, allowing you to ditch the square brackets and write this:

>>> sum(x**2 for x in range(20))
2470
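To make the memory difference a bit more concrete, here is a rough sketch that compares the two forms with sys.getsizeof. The exact sizes depend on your interpreter (I am assuming CPython), so I only compare them rather than print numbers. It also shows a catch: a generator can be consumed only once.

```python
import sys

squares_list = [x**2 for x in range(1000)]
squares_gen = (x**2 for x in range(1000))

# The list holds all 1000 results; the generator only holds its own state.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# A generator is exhausted after one pass, so the second sum sees nothing.
print(sum(squares_gen))  # 332833500
print(sum(squares_gen))  # 0
```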

They are also useful for other aggregate functions like min and max.
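For instance, here is a small sketch with some made-up sample words that finds the shortest and longest word lengths without building an intermediate list:

```python
words = ["comprehension", "generator", "set", "dict"]  # made-up sample data

shortest = min(len(w) for w in words)
longest = max(len(w) for w in words)
print(shortest, longest)  # 3 13
```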

The set and dict constructors can take generator expressions too:

set(word for word in vocab_list if word not in known_words)

dict((grade, convert_to_letter(grade)) for grade in report_card)
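The two snippets above rely on names that are not defined anywhere in this post. Here is a runnable sketch in which vocab_list, known_words, report_card and the grading scale inside convert_to_letter are all invented purely for illustration:

```python
# Hypothetical sample data; the names mirror the snippets above.
vocab_list = ["apple", "pear", "plum", "apple"]
known_words = {"pear"}
report_card = [95, 82, 67]


def convert_to_letter(grade):
    """Hypothetical grading scale, for illustration only."""
    if grade >= 90:
        return "A"
    if grade >= 80:
        return "B"
    return "C"


new_words = set(word for word in vocab_list if word not in known_words)
letters = dict((grade, convert_to_letter(grade)) for grade in report_card)

print(sorted(new_words))  # ['apple', 'plum']
print(letters)            # {95: 'A', 82: 'B', 67: 'C'}
```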

Dict Comprehensions

On top of list comprehensions, Python now supports dict comprehensions, which allow you to express the creation of dictionaries at runtime using a similarly concise syntax.

A dictionary comprehension takes the form {key: value for (key, value) in iterable}. This syntax was introduced in Python 3 and backported as far as Python 2.7, so you should be able to use it regardless of which version of Python you have installed.

A canonical example is taking two lists and creating a dictionary where the item at each position in the first list becomes a key and the item at the corresponding position in the second list becomes the value.

>>> import string
>>> {k: v**3 for (k, v) in zip(string.ascii_lowercase, range(26))}
{'i': 512, 'e': 64, 'o': 2744, 'h': 343, 'l': 1331, 's': 5832, 'b': 1, 'w': 10648, 'c': 8, 'x': 12167, 'y': 13824, 't': 6859, 'p': 3375, 'd': 27, 'j': 729, 'a': 0, 'z': 15625, 'f': 125, 'q': 4096, 'u': 8000, 'n': 2197, 'm': 1728, 'r': 4913, 'k': 1000, 'g': 216, 'v': 9261}

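(Look how jumbled up it is. This output came from an older version of Python, in which dicts had no guaranteed ordering. Since Python 3.7, dicts preserve insertion order, so you would see the keys running from 'a' to 'z'.)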

The zip function used inside this comprehension returns an iterator of tuples, where each element in the tuple is taken from the same position in each of the input iterables. In the example above, the returned iterator yields the tuples ("a", 0), ("b", 1), etc.
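If you want to see what zip produces on its own, you can wrap it in list. A quick sketch (the name pairs is mine):

```python
import string

pairs = list(zip(string.ascii_lowercase, range(26)))
print(pairs[:3])   # [('a', 0), ('b', 1), ('c', 2)]
print(len(pairs))  # 26

# zip stops as soon as the shortest input runs out.
print(list(zip("abc", range(100))))  # [('a', 0), ('b', 1), ('c', 2)]
```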

Any iterable can be used in a dict comprehension, including strings. The following code might be useful if you wanted to generate a dictionary that stores letter frequencies, for instance.

>>> {c: 0 for c in string.ascii_lowercase}
{'u': 0, 'c': 0, 'k': 0, 'v': 0, 'n': 0, 'l': 0, 'q': 0, 'g': 0, 'a': 0, 'm': 0, 'r': 0, 'e': 0, 'j': 0, 'd': 0, 'f': 0, 'z': 0, 'p': 0, 'x': 0, 's': 0, 'i': 0, 't': 0, 'b': 0, 'w': 0, 'h': 0, 'o': 0, 'y': 0}

(The code above is just an example of using a string as an iterable inside a comprehension. If you really want to count letter frequencies, you should check out collections.Counter.)
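For the curious, here is roughly what counting letters with collections.Counter could look like. The sample text is made up, and the generator expression skips anything that is not a letter.

```python
from collections import Counter

text = "hello world"  # made-up sample text
freq = Counter(c for c in text if c.isalpha())

# Missing keys count as zero instead of raising KeyError.
print(freq["l"], freq["o"], freq["z"])  # 3 2 0
```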

Dict comprehensions can use complex expressions and IF-ELSE constructs too. This one maps the numbers in a specific range to their cubes:

>>> {x: x**3 for x in range(10)}
{0: 0, 1: 1, 2: 8, 3: 27, 4: 64, 5: 125, 6: 216, 7: 343, 8: 512, 9: 729}

And this one omits cubes that are not divisible by 4:

>>> {x: x**3 for x in range(10) if x**3 % 4 == 0}
{0: 0, 8: 512, 2: 8, 4: 64, 6: 216}
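The snippet above uses a trailing if to filter. An IF-ELSE goes in the value (or key) position instead, just as in a list comprehension. A small sketch, with a name of my own choosing:

```python
# The conditional expression picks the value; no entries are filtered out.
parity = {x: ("even" if x % 2 == 0 else "odd") for x in range(5)}
print(parity)  # {0: 'even', 1: 'odd', 2: 'even', 3: 'odd', 4: 'even'}
```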

Set Comprehensions

A set is an unordered collection of elements in which each element can only appear once. Although sets have existed in Python since 2.4, Python 3 introduced the set literal syntax.

>>> nums = {1, 54, 124}
>>> nums
{1, 124, 54}

Python 3 also introduced set comprehensions.

>>> nums = {n**2 for n in range(10)}
>>> nums
{0, 1, 64, 4, 36, 9, 16, 49, 81, 25}

Prior to this, you could use the set built-in function.

>>> nums = set(n**2 for n in range(10))
>>> nums
{0, 1, 64, 4, 36, 9, 16, 49, 81, 25}

The syntax for set comprehensions is almost identical to that of list comprehensions, but it uses curly brackets instead of square brackets. The pattern is {EXPRESSION FOR ELEMENT IN SEQUENCE}.

The result of a set comprehension is the same as passing the output of the equivalent list comprehension to the set function.
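You can check that equivalence for yourself. A quick sketch with made-up sample words:

```python
words = ["Apple", "apple", "PEAR", "pear"]  # made-up sample data

from_set_comp = {w.lower() for w in words}
from_list_comp = set([w.lower() for w in words])

print(from_set_comp == from_list_comp)  # True
print(sorted(from_set_comp))            # ['apple', 'pear']
```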

That’s it for the theory. Now let’s dissect some examples of comprehensions.

Examples

List of files with the .png extension

The os module contains a function called listdir that returns a list of filenames in a given directory. We can use the endswith method on the strings to filter the list of files.

import os

def list_files_with_extension(where, ext):
    return [f for f in os.listdir(where)
            if f.endswith(ext)]

Here it is in usage:

>>> list_files_with_extension("/home/me/pics", ".png")
['grumpycat.png', 'alien.png']
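If you don't have a folder of pictures handy, you can try the function against a temporary directory. This is just a sketch: the file names are made up, and I sort the result because os.listdir makes no promise about ordering.

```python
import os
import tempfile


def list_files_with_extension(where, ext):
    return [f for f in os.listdir(where) if f.endswith(ext)]


with tempfile.TemporaryDirectory() as tmp:
    # Create three empty files with made-up names.
    for name in ("grumpycat.png", "alien.png", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = sorted(list_files_with_extension(tmp, ".png"))

print(found)  # ['alien.png', 'grumpycat.png']
```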

Merge two dictionaries

Merging two dictionaries together can be achieved easily in a dict comprehension:

def merge_dicts(d1, d2):
    return {k: v for d in (d1, d2) for k, v in d.items()}

Here is merge_dicts in action:

>>> boys_ages = {"Tom": 14, "Patrick": 12, "Sean": 15}
>>> girls_ages = {"Jeanne": 14, "Marie": 12}
>>> merge_dicts(boys_ages, girls_ages)
{'Sean': 15, 'Tom': 14, 'Marie': 12, 'Patrick': 12, 'Jeanne': 14}
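One thing worth knowing is what happens when both dictionaries share a key: the value from the second dictionary wins, because it is written last. A sketch with made-up data, which also shows that dictionary unpacking (available since Python 3.5) gives the same result:

```python
def merge_dicts(d1, d2):
    return {k: v for d in (d1, d2) for k, v in d.items()}


defaults = {"colour": "blue", "size": "M"}  # made-up sample data
overrides = {"size": "L"}

merged = merge_dicts(defaults, overrides)
print(merged)  # {'colour': 'blue', 'size': 'L'}

# Dictionary unpacking produces the same merged dictionary.
print(merged == {**defaults, **overrides})  # True
```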

Sieve of Eratosthenes

The Sieve of Eratosthenes is an ancient algorithm for finding prime numbers. You might remember it from school. It works like this:

Starting at 2, which is the first prime number, exclude all multiples of 2 up to n.

Move on to 3. Exclude all multiples of 3 up to n.

Keep going like that until you reach n.

And here’s the code:

def erathostenes(n):
    # Collect every multiple of every i below n; none of these can be prime.
    not_prime = {j for i in range(2, n)
                 for j in range(i*2, n, i)}

    # Whatever was never marked as a multiple is prime.
    return {i for i in range(2, n)
            if i not in not_prime}

The first thing to note about the function is the use of a double loop in the first set comprehension. Contrary to what you might expect, the leftmost loop is the outer loop and the rightmost loop is the inner loop. The pattern for double loops in list comprehensions is [x for b in a for x in b].
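A smaller example of the same double-loop pattern is flattening a list of lists. Here it is as a sketch, next to the equivalent explicit loops (the sample data and names are mine):

```python
rows = [[1, 2], [3, 4], [5, 6]]  # made-up sample data

flat = [x for row in rows for x in row]

# The same thing as explicit loops: read left to right, outer to inner.
flat_loops = []
for row in rows:
    for x in row:
        flat_loops.append(x)

print(flat)                # [1, 2, 3, 4, 5, 6]
print(flat == flat_loops)  # True
```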

In case you hadn’t seen it before, the third argument in the rightmost call to range represents the step size.

It would be possible to use a list comprehension for this algorithm, but the not_prime list would be filled with duplicates. It is better to use the automatic deduplication behaviour of the set to avoid that.
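To see the duplicates for yourself, here is a sketch of the list version for a small n. The names not_prime_list and not_prime_set are made up for this comparison.

```python
n = 13
not_prime_list = [j for i in range(2, n) for j in range(i * 2, n, i)]
not_prime_set = set(not_prime_list)

# 12 is a multiple of 2, 3, 4 and 6, so the list records it four times.
print(not_prime_list.count(12))                  # 4
print(len(not_prime_list) > len(not_prime_set))  # True
```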

Exercises

I’ve included some exercises to help you solidify your new knowledge of comprehensions.

1. Write a function called generate_matrix that takes two positional arguments – m and n – and a keyword argument default that specifies the value for each position. It should use a nested list comprehension to generate a list of lists with the given dimensions. If default is provided, each position should have the given value, otherwise the matrix should be populated with zeroes.

2. Write a function called initcap that replicates the functionality of the str.title method, except better. Given a string, it should split the string on whitespace, capitalize each element of the resulting list and join them back into a string. Your implementation should use a list comprehension.

3. Write a function called make_mapping that takes two lists of equal length and returns a dictionary that maps the values in the first list to the values in the second. The function should also take an optional keyword argument called exclude, which expects a list. Values in the list passed as exclude should be omitted as keys in the resulting dictionary.

4. Write a function called compress_dict_keys that takes a dictionary with string keys and returns a new dictionary with the vowels removed from the keys. For instance, the dictionary {"foo": 1, "bar": 2} should be transformed into {"f": 1, "br": 2}. The function should use a list comprehension nested inside a dict comprehension.

5. Write a function called dedup_surnames that takes a list of surnames and returns a set of surnames with the case normalized to uppercase. For instance, the list ["smith", "Jones", "Smith", "BROWN"] should be transformed into the set {"SMITH", "JONES", "BROWN"}.

Solutions

1. Nest two list comprehensions to generate a 2D list with m rows and n columns. Use default for the value in each position in the inner comprehension.

def generate_matrix(m, n, default=0):
    return [[default for _ in range(n)]
            for _ in range(m)]

2. Disassemble the sentence passed into the function using split, then call capitalize on each word, then use join to reassemble the sentence.

def initcap(s):
    return ' '.join(w.capitalize() for w in s.split())
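To see why this might count as "better", compare it with str.title on a string containing an apostrophe. The sample sentence is made up.

```python
def initcap(s):
    return " ".join(w.capitalize() for w in s.split())


# str.title treats the apostrophe as a word boundary; initcap does not.
print("they're here".title())   # They'Re Here
print(initcap("they're here"))  # They're Here
```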

3. Join the two lists a and b using zip, then use the zipped lists in the dictionary comprehension.

def make_mapping(a, b, exclude=()):
    # An immutable default avoids the shared-mutable-default pitfall.
    return {k: v for (k, v) in zip(a, b)
            if k not in exclude}
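Here is the function in use, as a sketch with made-up inputs. I am assuming an empty tuple as the default for exclude here; an empty list behaves the same for this function because it is never modified.

```python
def make_mapping(a, b, exclude=()):
    # Assumption: an empty tuple stands in for "nothing excluded".
    return {k: v for (k, v) in zip(a, b) if k not in exclude}


print(make_mapping(["a", "b", "c"], [1, 2, 3]))                 # {'a': 1, 'b': 2, 'c': 3}
print(make_mapping(["a", "b", "c"], [1, 2, 3], exclude=["b"]))  # {'a': 1, 'c': 3}
```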

4. Iterate over the key-value pairs from the passed-in dictionary and, for each key, remove the vowels using a comprehension with an IF construct.

def compress_dict_keys(d):
    return {''.join(c for c in k
                    if c not in "aeiou"): v
            for (k, v) in d.items()}

5. Use the set comprehension syntax (with curly brackets) to iterate over the given list and call upper on each name in it. The deduplication will happen automatically due to the nature of the set data structure.

def dedup_surnames(names):
    return {name.upper() for name in names}

I’ll leave it there for now. If you’ve worked your way through this post and given the exercises a good try, you should be ready to use comprehensions in your own code.