Python provides great built-in types like dict, list, tuple and set; the standard library also has the array, collections and heapq modules. This article is an overview of external, lesser-known packages with fast C/C++-based data structures usable from Python.

A Bloom filter (wiki) is an extremely memory-efficient probabilistic data structure used to test whether an element is a member of a set; there may be false positive results, but false negatives are not possible (an "item not in set" answer is always correct).
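To make the guarantees concrete, here is a toy pure-Python sketch of the idea (not one of the fast C-based packages this article is about; the class name, filter size and number of hash functions are arbitrary illustration values):

import hashlib

class BloomSketch(object):
    # Toy Bloom filter: shows the semantics, not the speed.
    def __init__(self, size=2**20, num_hashes=5):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit vector

    def _positions(self, item):
        # derive num_hashes bit positions from the item
        for seed in range(self.num_hashes):
            digest = hashlib.md5(('%d:%s' % (seed, item)).encode('utf8')).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # may return a false positive, never a false negative
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bf = BloomSketch()
bf.add(u'АВИАЦИЯ')
assert u'АВИАЦИЯ' in bf   # always True once added
# a missing word is almost certainly reported absent, but may rarely be a false positive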

The bitarray package provides a data structure that efficiently represents an array of booleans (in a bit vector). It is also useful for dealing with bit-level data and with data compressed using variable bit length encoding.
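A short usage sketch based on my reading of bitarray's API (including its prefix-code encode/decode helpers; consult the package docs for the authoritative reference):

from bitarray import bitarray

bits = bitarray(16)     # 16 bits, initially uninitialized
bits.setall(False)
bits[3] = True
print(bits.count())     # -> 1

# variable bit length (prefix) codes, e.g. a tiny Huffman-style table
codes = {'a': bitarray('0'), 'b': bitarray('10'), 'c': bitarray('11')}
packed = bitarray()
packed.encode(codes, 'abcab')
print(packed)                          # bitarray('01011010')
print(''.join(packed.decode(codes)))   # -> 'abcab'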

The blist package provides several data structures (blist, sortedlist, weaksortedlist, etc.) that can act as general-purpose containers replacing the standard list. blist uses a hybrid array/tree structure that makes inserts and removals from the middle fast (with the standard list these operations require moving big memory chunks).

Note: there is a rejected PEP-3128 about the inclusion of blist into the standard library.
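A short usage sketch (container and method names as I understand blist's API; a sketch, not a benchmark):

from blist import blist, sortedlist

items = blist(range(1000000))
items.insert(500000, -1)    # fast: no big memmove as with the standard list

sl = sortedlist([5, 1, 3])
sl.add(2)                   # stays sorted: [1, 2, 3, 5]
print(sl.index(3))          # -> 2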

The carray package provides a chunked, compressed data structure for numerical data. It uses less memory than a traditional ndarray and provides efficient shrinks and appends (copies of the whole array are not needed).
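A sketch of how carray is meant to be used, going from my recollection of its API (an ndarray-taking constructor, nbytes/cbytes attributes and in-place append); double-check against the package docs:

import numpy as np
import carray as ca

a = ca.carray(np.arange(1e7))   # chunked, compressed copy of the ndarray
print(a.nbytes, a.cbytes)       # uncompressed vs. compressed size in bytes
a.append(np.arange(10.0))       # grows in place, no full-array copy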

The king of numeric arrays is numpy (home page). It provides several data structures (ndarray, structured arrays) for single- and multi-dimensional numeric data. SciPy adds support for sparse arrays (scipy.sparse), k-d trees (scipy.spatial.cKDTree) and much more. Pandas (home page) provides extra goodies. There is a lot of information about numpy, pandas and scipy on the Internet; they deserve more than one paragraph, but let's move on.
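Just one teaser before moving on: a nearest-neighbour query with scipy.spatial.cKDTree (the points are random illustration data):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(10000, 2)       # 10k random 2D points
tree = cKDTree(points)
dist, idx = tree.query([0.5, 0.5])      # nearest neighbour of (0.5, 0.5)
print(points[idx], dist)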

The python-llist package provides classic linked-list extension types for Python. There is also a linked list implementation in roly.
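A usage sketch, assuming python-llist's documented dllist API (nodeat, insert-before-node, remove-by-node):

from llist import dllist

dl = dllist([1, 2, 3])
node = dl.nodeat(1)     # O(n) to find a node by index...
dl.insert(0, node)      # ...but O(1) to insert before it: [1, 0, 2, 3]
dl.remove(node)         # O(1) removal given the node: [1, 0, 3]
print(list(dl))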

Tries

There is no tree/trie/graph structure in the Python standard library, and pure-Python implementations suffer from excessive memory usage; using a C/C++-based trie (wiki) implementation is a good idea.

Note: in the following tests, memory usage was measured for 3 million unique unicode Russian words; "simple lookup" was a lookup of the word "АВИАЦИЯ".

Trie from BioPython

License: Biopython License (it is extremely liberal)
Memory usage: 242M
Simple lookup: 333 ns (1004 ns with encoding)
Unicode: no
Python: 2.x

If I understood the code correctly, this is a pointer-based implementation of a Patricia trie (aka radix trie, wiki) and may use a lot of memory because of that. It doesn't work under Python 3.x and doesn't directly support unicode. All trie operations (exact lookups, prefix lookups, inserts & updates) are fast & efficient.

Example:

>>> from Bio import trie
>>> tr = trie.trie()
>>> for word in words:
...     tr[word.encode('utf8')] = len(word)
>>> tr['АВИАЦИЯ']
7

Judy Arrays

Judy arrays (wiki) are known to be a very fast but obscure data structure, heavily optimized for 32-bit systems. Unfortunately I was unable to install any of the PyJudy, py-judy or py-judy2 Python wrappers, so I have nothing more to say about Judy arrays :)

HAT-Trie

License: MIT
Memory usage: 125M
Simple lookup: 195 ns
Unicode: yes
Python: 2.x and 3.x

HAT-Trie (pdf) is a trie/hash-map hybrid. It is claimed to be the state-of-the-art trie-like structure with the fastest lookups. I've started a hat-trie Python wrapper for the very nice C HAT-Trie implementation by Daniel Jones, but never finished it. The wrapper is not polished and needs more love, but the basics (trie building and exact lookups) are implemented. Benchmarks show this trie is indeed fast (the wrapper's bottleneck is the Python unicode<->bytes conversion, not the trie itself). It is not very memory efficient, and some operations taken for granted for tries (like prefix search) may be slowish and/or hard to implement for HAT-tries.

Example:

>>> import hat_trie
>>> trie = hat_trie.Trie()
>>> for word in words:
...     trie[word] = len(word)
>>> trie[u'АВИАЦИЯ']
7

Python-CharTrie

License: BSD
Memory usage: 194M
Simple lookup: 175 ns (840 ns with encoding)
Unicode: no
Python: 2.x and 3.x

As far as I can tell, python-chartrie provides a pointer-based implementation of the classic trie data structure; it is very fast but not memory efficient, and unicode is not directly supported.

Example:

>>> import chartrie
>>> trie = chartrie.CharTrie()
>>> for word in words:
...     trie[word.encode('utf8')] = len(word)
>>> trie['АВИАЦИЯ']
7

DATrie

License: LGPL v2.1
Memory usage: 101M
Simple lookup: 281 ns
Unicode: yes
Python: 2.x and 3.x

datrie is a Python wrapper for the double-array trie C implementation (home page) by Theppitak Karoonboonyanan. The library has a rich API (including advanced iteration and walking), is quite fast, works under Python 2.x and 3.x, and supports unicode. One limitation is that inserting items into the trie may be slow, especially if insertions are unsorted and the trie is big. Another limitation is that the alphabet for the keys must be defined by the developer at trie creation time. The Python wrapper uses the utf_32_le codec internally; this codec is currently slow and is the bottleneck for datrie. There is a ticket with a patch in the CPython bug tracker (http://bugs.python.org/issue15027) that should make this codec fast, so there is hope that datrie will become faster with future Python versions.

Example:

>>> import datrie
>>> ALPHABET = u'-АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
>>> trie = datrie.BaseTrie(ALPHABET)
>>> for word in words:
...     trie[word] = len(word)
>>> trie[u'АВИАЦИЯ']
7