The Explorer

Managing Records in Python (Part 1 of 3)

by Michele Simionato

September 7, 2009



Everybody has worked with records: by reading CSV files, by interacting with a database, by coding in a programming language. Records look like an old, traditional, boring topic where everything has been said already. However, this is not the case: there is still plenty to say about records. In this three-part series I will discuss a few general techniques to read, write and process records in modern Python. The first part (the one you are reading now) is introductory and considers the problem of reading a CSV file with a number of fields which is known only at run time; the second part discusses the problem of interacting with a database; the third and last part discusses the problem of rendering records into XML or HTML format.

For many years there was no record type in the Python language, nor in the standard library. This is hard to believe, but true: the Python community has asked for records in the language from the beginning, but Guido never considered that request. The canonical answer was "in real life you always need to add methods to your data, so just use a custom class". The good news is that the situation has finally changed: starting from Python 2.6, records are part of the standard library under the dismissive name of named tuples. You can use named tuples even in older versions of Python, simply by downloading Raymond Hettinger's recipe from the Python Cookbook:

$ wget http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261/index_txt -O namedtuple.py

The existence of named tuples has completely changed the way of managing records: nowadays a named tuple is the one obvious way to implement immutable records. Mutable records are much more complex and they are not available in the standard library, nor are there plans for their addition, at least as far as I know. There are many viable alternatives if you need mutable records: the typical use case is managing database records, and that can be done with an Object Relational Mapper. There is also a Cookbook recipe for mutable records which is a natural extension of the namedtuple recipe. Notice however that there are people who think that mutable records are Evil. This is the dominant opinion in the functional programming community: in that context the only way to modify a field is to create a new record which is a copy of the original record except for the modified field. In this sense named tuples are functional structures and they support functional update via the _replace method; I will discuss this point in detail in a short while.

Using named tuples is very easy and you can just look at the examples in the standard library documentation. Here I will duplicate part of what you can find there, for the benefit of lazy readers:

>>> from collections import namedtuple
>>> Article = namedtuple("Article", 'title author')
>>> article1 = Article("Records in Python", "M. Simionato")
>>> print(article1)
Article(title='Records in Python', author='M. Simionato')

namedtuple is a function working as a class factory: it takes as input the name of the class and the names of the fields (a sequence of strings, or a single string of space-or-comma-separated names) and it returns a subclass of tuple. The fundamental feature of named tuples is that the fields are accessible both by index and by name:

>>> article1.title
'Records in Python'
>>> article1.author
'M. Simionato'
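As a quick check of the two points above, here is a minimal sketch showing that the different spellings of the field names produce the same fields, and that access by name and access by index return the same object. The names A1, A2 and A3 are introduced only for this illustration.

```python
from collections import namedtuple

# Three equivalent ways of spelling the field names.
A1 = namedtuple("Article", "title author")        # space-separated string
A2 = namedtuple("Article", "title, author")       # comma-separated string
A3 = namedtuple("Article", ["title", "author"])   # sequence of strings
assert A1._fields == A2._fields == A3._fields == ("title", "author")

article = A1("Records in Python", "M. Simionato")

# Access by name and access by index give back the very same object.
assert article.title is article[0]
assert article.author is article[1]
```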

Therefore, named tuples are much more readable than ordinary tuples: you write article1.author instead of article1[1]. Moreover, the constructor accepts both a positional syntax and a keyword argument syntax, so that it is possible to write

>>> Article(author="M. Simionato", title="Records in Python")
Article(title='Records in Python', author='M. Simionato')

in the opposite order without issues. This is a major strength of named tuples. You can pass all the arguments as positional arguments, all the arguments as keyword arguments, or even some arguments as positional and some others as keyword arguments:

>>> title = 'Records in Python'
>>> kw = dict(author="M. Simionato")
>>> Article(title, **kw)
Article(title='Records in Python', author='M. Simionato')

This "magic" has nothing to do with namedtuple per se: it is the standard way argument passing works in Python, even if I bet many people do not know that it is possible to mix the arguments. The only real restriction is that you must put the keyword arguments after the positional arguments.
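To see that this really is ordinary argument passing, the same three call styles can be tried on a plain function. The function describe below is hypothetical and exists only for this sketch. It also shows what happens when a keyword argument collides with a slot already filled positionally.

```python
def describe(title, author):
    # Hypothetical helper with the same signature as the Article constructor.
    return "%s by %s" % (title, author)

by_position = describe("Records in Python", "M. Simionato")
by_keyword = describe(author="M. Simionato", title="Records in Python")
mixed = describe("Records in Python", author="M. Simionato")
assert by_position == by_keyword == mixed

# Passing title twice (once positionally, once by keyword) is rejected.
try:
    describe("Records in Python", title="Another title")
except TypeError:
    collision_rejected = True
else:
    collision_rejected = False
```

Writing a positional argument after a keyword argument, as in describe(author="M. Simionato", "Records in Python"), is not even compiled: Python reports a SyntaxError.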

Another advantage is that named tuples are tuples, so that you can use them in your legacy code expecting regular tuples, and everything will work just fine, including tuple unpacking (i.e. title, author = article1), possibly via the * notation (i.e. f(*article1)).
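The compatibility with regular tuples can be verified with a short sketch. The function legacy_format is a hypothetical stand-in for legacy code that knows nothing about named tuples.

```python
from collections import namedtuple

Article = namedtuple("Article", "title author")
article1 = Article("Records in Python", "M. Simionato")

def legacy_format(title, author):
    # Hypothetical legacy function written for plain positional arguments.
    return "%s (%s)" % (title, author)

# A named tuple is a tuple and compares equal to the plain tuple of its fields.
assert isinstance(article1, tuple)
assert article1 == ("Records in Python", "M. Simionato")
assert len(article1) == 2

# Tuple unpacking works as usual ...
title, author = article1

# ... and so does the * notation when calling a function.
formatted = legacy_format(*article1)
```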

An additional feature with respect to traditional tuples is that named tuples support functional update, as I anticipated before:

>>> article1._replace(title="Record in Python, Part I")
Article(title='Record in Python, Part I', author='M. Simionato')

returns a copy of the original named tuple with the field title updated to the new value.
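The word "functional" matters here: the original record is left untouched, and a direct assignment to a field is refused. A minimal sketch (the names article2 and assignment_refused are introduced only for this example):

```python
from collections import namedtuple

Article = namedtuple("Article", "title author")
article1 = Article("Records in Python", "M. Simionato")

# _replace builds a new record; article1 is not modified.
article2 = article1._replace(title="Record in Python, Part I")
assert article1.title == "Records in Python"
assert article2.title == "Record in Python, Part I"

# The fields that were not replaced are shared, not copied.
assert article2.author is article1.author

# In-place modification of a field is not allowed.
try:
    article1.title = "New title"
except AttributeError:
    assignment_refused = True
else:
    assignment_refused = False
```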

Internally, namedtuple works by generating the source code of the class to be returned and by executing it via exec. You can look at the generated code by setting the flag verbose=True when you invoke namedtuple. The readers of my series about Scheme (The Adventures of a Pythonista in Schemeland) will certainly be reminded of macros. Actually, exec is more powerful than Scheme macros, since macros generate code at compilation time whereas exec works at run time. That means that in order to use macros you must know the structure of the record before executing the program, whereas exec is able to define the record type during program execution. In order to do the same in Scheme you would need to use eval, not macros.
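Since the record type can be defined during program execution, the field names may come from the data itself. The following is a minimal sketch under stated assumptions, not a definitive implementation: CSV_TEXT is made-up sample data, read_records and the type name "Row" are hypothetical, and the header is assumed to contain valid, distinct Python identifiers.

```python
import csv
from collections import namedtuple

# Made-up sample data: the number and names of the fields are not known
# to the code below until the header line is read.
CSV_TEXT = "title,author\nRecords in Python,M. Simionato\n"

def read_records(lines, typename="Row"):
    """Return a list of named tuples, one for each data row in lines."""
    reader = csv.reader(lines)
    header = next(reader)                  # field names, known only at run time
    Record = namedtuple(typename, header)  # record type created on the fly
    return [Record(*row) for row in reader]

records = read_records(CSV_TEXT.splitlines())
first = records[0]
```

Here first.title and first.author are available even though neither name appears anywhere in read_records; with a different header the very same function would produce records with different fields.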