by Lorenzo Bolla

Experiments in pickling

pickle is a standard library module that serializes and deserializes Python objects. Because it is written in pure Python, it is fairly slow, so the standard library also provides a C implementation, called cPickle, with the limitation that its Pickler and Unpickler cannot be subclassed.

What's interesting about cPickle is a pair of little-known settings that can speed up serialization quite substantially.

cPickle.HIGHEST_PROTOCOL: dumps a Python object using a binary protocol rather than the default protocol 0, which is ASCII-based and more portable. If portability and backward compatibility are not an issue for you, you should use it: it's documented and probably here to stay.

Pickler's undocumented fast flag: it turns out that the cPickle implementation has a "fast mode" that is enabled by setting this fast flag to True. It's undocumented, so it is probably subject to change, but it makes dumping objects much faster.
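As a rough illustration of the two settings, here is a minimal sketch. It is not taken from the gist: the sample data and the fast_dumps helper are made up for the example, and the import fallback assumes that on Python 3 (where cPickle no longer exists as a separate module) the plain pickle module stands in for it.

```python
import io

try:
    import cPickle as pickle  # Python 2: the C implementation discussed here
except ImportError:
    import pickle  # Python 3 fallback (assumption: pickle stands in for cPickle)

# Hypothetical sample data: a list of small dictionaries.
data = [{"id": i, "name": "item%d" % i} for i in range(100)]

# Setting 1: pass HIGHEST_PROTOCOL instead of the ASCII-based protocol 0.
ascii_bytes = pickle.dumps(data, 0)
binary_bytes = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)


def fast_dumps(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Setting 2: dump through a Pickler with its fast flag switched on.

    Hypothetical helper; only safe for objects that do not reference
    themselves.
    """
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol)
    pickler.fast = True  # disables the memo
    pickler.dump(obj)
    return buf.getvalue()


fast_bytes = fast_dumps(data)

print("protocol 0: %d bytes" % len(ascii_bytes))
print("highest protocol: %d bytes" % len(binary_bytes))
print("fast mode: %d bytes" % len(fast_bytes))
```

All three byte strings load back into an equal list with pickle.loads; only the encoding and the time spent producing it differ.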

The comments in the cPickle source code have something to say about "fast mode":

The fast mode disable the usage of memo, therefore speeding the pickling process by not generating superfluous PUT opcodes. It should not be used if with self-referential objects.

memo is essentially a cache inside the pickler that remembers which objects have already been processed. It is used mainly to avoid infinite loops when dumping self-referential data structures. If the data structures you are dumping do not reference themselves, you can save some time by disabling this caching.
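To make the role of the memo more concrete, the sketch below pickles an object that appears twice in a list, and then a list that contains itself, with and without fast mode. The dumps helper and the variable names are hypothetical, and the exact exception raised for the self-referential case is an assumption that may vary between implementations, so the code catches both candidates.

```python
import io

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 fallback (assumption, as cPickle is gone there)


def dumps(obj, fast):
    """Hypothetical helper: pickle obj with the memo on or off."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, pickle.HIGHEST_PROTOCOL)
    pickler.fast = fast
    pickler.dump(obj)
    return buf.getvalue()


# The same dictionary is referenced twice.
shared = {"k": "v"}
container = [shared, shared]

with_memo = pickle.loads(dumps(container, fast=False))
without_memo = pickle.loads(dumps(container, fast=True))

# With the memo, the second occurrence is stored as a reference to the
# first, so identity survives the round trip. In fast mode the dictionary
# is written out twice and comes back as two separate objects.
print(with_memo[0] is with_memo[1])
print(without_memo[0] is without_memo[1])

# A list that contains itself: fine with the memo, an error without it.
cyclic = []
cyclic.append(cyclic)

restored = pickle.loads(dumps(cyclic, fast=False))
try:
    dumps(cyclic, fast=True)
    cyclic_failed = False
except (ValueError, RuntimeError):
    # Assumption: either a cycle-detection error or a recursion error,
    # depending on the implementation in use.
    cyclic_failed = True
print(cyclic_failed)
```

So fast mode trades away both cycle handling and the sharing of repeated objects, which is why it only suits plain, tree-shaped data.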

To test these settings, I compared "vanilla" cPickle.dumps with "highest protocol" and "fast mode" for three different objects: a small, a medium and a large list of dictionaries. The code is available in a gist:

Clone it and run it:

$> git clone https://gist.github.com/1bec1b70ef9c8e254b57.git pickle_experiments
$> cd pickle_experiments
$> sh run.sh

On my machine, I get these results:

SMALL
dumps 10 loops, best of 3: 7.39 usec per loop
highp 10 loops, best of 3: 2.5 usec per loop
pickl 10 loops, best of 3: 4.1 usec per loop

MEDIUM
dumps 10 loops, best of 3: 705 usec per loop
highp 10 loops, best of 3: 206 usec per loop
pickl 10 loops, best of 3: 111 usec per loop

LARGE
dumps 3 loops, best of 3: 1.34 sec per loop
highp 10 loops, best of 3: 823 msec per loop
pickl 10 loops, best of 3: 135 msec per loop

Note: I've reduced the number of iterations to 3 for "vanilla LARGE" because it was taking too long…
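For readers who don't have the gist at hand, a comparison along these lines could be sketched as follows. This is not the gist's code: the data sizes, the make_data and bench helpers, and the choice to combine fast mode with the highest protocol are all assumptions made for the illustration, and the numbers it prints will differ from the ones above.

```python
import io
import timeit

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 fallback (assumption, as cPickle is gone there)


def vanilla(obj):
    # Protocol 0 is passed explicitly to mirror the Python 2 default.
    return pickle.dumps(obj, 0)


def highest(obj):
    return pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)


def fast(obj):
    # Assumption: fast mode is combined with the highest protocol.
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, pickle.HIGHEST_PROTOCOL)
    pickler.fast = True
    pickler.dump(obj)
    return buf.getvalue()


def make_data(n):
    """Hypothetical test data: a list of n small dictionaries."""
    return [{"id": i, "value": str(i)} for i in range(n)]


def bench(func, obj, number=10, repeat=3):
    """Best-of-repeat time per call, in seconds."""
    timings = timeit.repeat(lambda: func(obj), number=number, repeat=repeat)
    return min(timings) / number


results = {}
for label, size in [("SMALL", 10), ("MEDIUM", 1000)]:
    obj = make_data(size)
    for name, func in [("dumps", vanilla), ("highp", highest), ("pickl", fast)]:
        results[(label, name)] = bench(func, obj)
        print("%s %s %.2f usec per call" % (label, name, results[(label, name)] * 1e6))
```

A LARGE case can be added by appending another size to the list, at the cost of a longer run.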