It's sometimes necessary to count the number of distinct occurrences in a collection: for example, counting how many times each letter occurs in a block of text, or sorting a list by its most common member.

If I were to do this sort of counting with SQL, I would generally use something like this:

    SELECT column, count(*) FROM table GROUP BY column

This could easily be combined with an ORDER BY to get the most common items.
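To make the SQL approach concrete, here is a minimal sketch that runs a GROUP BY with an ORDER BY against an in-memory SQLite database. The `meals` table and `food` column are made-up names for illustration only.

```python
import sqlite3

# Hypothetical in-memory table; the table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meals (food TEXT)")
conn.executemany(
    "INSERT INTO meals (food) VALUES (?)",
    [("soy",), ("dairy",), ("gluten",), ("soy",)],
)

# Count each distinct value and list the most common first.
# Ties are broken alphabetically so the order is deterministic.
rows = conn.execute(
    "SELECT food, count(*) AS n FROM meals "
    "GROUP BY food ORDER BY n DESC, food"
).fetchall()
conn.close()

print(rows)  # [('soy', 2), ('dairy', 1), ('gluten', 1)]
```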

However, assuming you are working with some raw data, here are some strategies for counting distinct occurrences in Python. Skip to the end to see which method performs best.

dict and in

A plain dictionary works well as a counter. Though using it is verbose, it performs surprisingly well and works in any Python version.

    counter = dict()
    foods = ['soy', 'dairy', 'gluten', 'soy']
    for k in foods:
        if k not in counter:
            counter[k] = 1
        else:
            counter[k] += 1

    >>> counter
    {'soy': 2, 'dairy': 1, 'gluten': 1}
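Once the counts are in a plain dict, getting the most common items (the ORDER BY equivalent mentioned earlier) takes one more step. The sketch below wraps the loop in a helper and sorts the pairs by count; `count_items` is a name chosen here for illustration.

```python
def count_items(items):
    """Count occurrences with a plain dict and an `in` check."""
    counter = {}
    for k in items:
        if k not in counter:
            counter[k] = 1
        else:
            counter[k] += 1
    return counter


foods = ['soy', 'dairy', 'gluten', 'soy']
counts = count_items(foods)

# Sort the (item, count) pairs by count, highest first.
by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_count[0])  # ('soy', 2)
```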

defaultdict

I've always loved the defaultdict. Used properly, it can cut out a lot of boilerplate from your code. It has many applications, one of which is a counter.

    from collections import defaultdict

    counter = defaultdict(int)
    foods = ['soy', 'dairy', 'gluten', 'soy']
    for k in foods:
        counter[k] += 1

    >>> counter
    defaultdict(<class 'int'>, {'soy': 2, 'dairy': 1, 'gluten': 1})

By passing int to the class, all missing keys default to zero. This allows you to do += without setting the key first.
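The reason int works as a default is that calling it with no arguments returns 0. A short sketch of that behavior follows, including one caveat worth knowing: simply reading a missing key inserts it.

```python
from collections import defaultdict

# int() called with no arguments returns 0, which is the value
# defaultdict(int) supplies for keys it has not seen yet.
assert int() == 0

counter = defaultdict(int)
counter['soy'] += 1        # missing key starts at 0, then becomes 1
print(counter['soy'])      # 1

# Caveat: merely reading a missing key inserts it with the default.
print(counter['tofu'])     # 0
print('tofu' in counter)   # True
```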

dict and setdefault

Dictionaries have a setdefault method that allows you to set the default value for a single key. According to the Python docs, running setdefault on every key is slower than using defaultdict. The benchmark below confirms this.

    counter = dict()
    foods = ['soy', 'dairy', 'gluten', 'soy']
    for k in foods:
        counter.setdefault(k, 0)
        counter[k] += 1

    >>> counter
    {'soy': 2, 'dairy': 1, 'gluten': 1}
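It may help to see what setdefault does on its own: it only writes when the key is missing, and it always returns the value now stored under the key. A minimal sketch:

```python
counter = {}

# Key is missing: setdefault stores 0 and returns it.
print(counter.setdefault('soy', 0))   # 0
counter['soy'] += 1

# Key exists: setdefault leaves the stored value alone and returns it.
print(counter.setdefault('soy', 0))   # 1
print(counter)                        # {'soy': 1}
```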

collections.Counter

Python 2.7 introduced collections.Counter, which makes this trivial.

    from collections import Counter

    foods = ['soy', 'dairy', 'gluten', 'soy']
    counter = Counter(foods)

    >>> counter
    Counter({'soy': 2, 'dairy': 1, 'gluten': 1})

By passing a list to the Counter constructor, it does the grouping for us. It still behaves like a dictionary, so we can still do things like

    >>> counter['soy'] += 3
    >>> counter['soy']
    5
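Counter also covers the "most common" use case directly. The sketch below shows a few of its standard methods: most_common, reading a missing key, and update.

```python
from collections import Counter

foods = ['soy', 'dairy', 'gluten', 'soy']
counter = Counter(foods)

# The single most common item, as a list of (item, count) pairs.
top = counter.most_common(1)
print(top)                 # [('soy', 2)]

# Missing keys read as 0 without being inserted.
print(counter['tofu'])     # 0
print('tofu' in counter)   # False

# update() adds counts from another iterable.
counter.update(['gluten', 'gluten'])
print(counter['gluten'])   # 3
```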

Benchmarks

Here are some quick and dirty benchmarks for these methods. I used this code to generate the data. I took some text by The Bard and counted the number of each letter and each word. There were a lot more unique words than letters, which resulted in slower times to count them.

    Keys     Counter   defaultdict   dict.setdefault   dict.in
    6691     3.62      1.97          2.88              1.95
    26727    13.13     4.31          9.58              7.17

These results show that while a plain dict with in checks performs best for a smaller number of keys, it's not significantly better than defaultdict (1.95 versus 1.97). With a larger number of distinct keys, defaultdict did substantially better than any other option (4.31 versus 7.17 for the next best).
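For anyone who wants to rerun a comparison like this, here is a rough sketch using timeit. It is not the original benchmark code: the function names, the placeholder input, and the repeat count are all assumptions made for illustration. Results on a current interpreter may not match the numbers reported here.

```python
import timeit
from collections import Counter, defaultdict


def with_counter(items):
    return Counter(items)


def with_defaultdict(items):
    counter = defaultdict(int)
    for k in items:
        counter[k] += 1
    return counter


def with_setdefault(items):
    counter = {}
    for k in items:
        counter.setdefault(k, 0)
        counter[k] += 1
    return counter


def with_dict_in(items):
    counter = {}
    for k in items:
        if k not in counter:
            counter[k] = 1
        else:
            counter[k] += 1
    return counter


# Placeholder input; substitute the words or letters of a real text.
words = ("to be or not to be that is the question " * 1000).split()

timings = {}
for func in (with_counter, with_defaultdict, with_setdefault, with_dict_in):
    # Total seconds for 20 runs over the same input.
    timings[func.__name__] = timeit.timeit(lambda: func(words), number=20)
    print(func.__name__, round(timings[func.__name__], 4))
```

All four functions should produce the same counts, so it is worth checking that before trusting any timing differences.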