XML data binding with Python descriptors

Python descriptors are used to represent attributes of other classes. A descriptor must implement one or more methods of the descriptor protocol:

    __get__(self, instance, owner)
    __set__(self, instance, value)
    __delete__(self, instance)
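To make the protocol concrete, here is a minimal sketch of a descriptor that implements all three methods. The names Positive and Account are made up for illustration; they are not part of the XML binding code that follows.

```python
class Positive(object):
    """Descriptor that stores a value per instance and rejects negatives."""

    def __init__(self, name):
        self.name = name  # key used in the instance __dict__

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor object.
            return self
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name)

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError("%s must not be negative" % self.name)
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class Account(object):
    balance = Positive("balance")

    def __init__(self, balance):
        self.balance = balance  # routed through Positive.__set__


acct = Account(10)
acct.balance = 25  # accepted; a negative value would raise ValueError
```

Because Positive defines __set__, it takes precedence over the instance dictionary on lookup, so every read and write of acct.balance goes through the descriptor.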

There are already some great articles about descriptors from Raymond Hettinger, Mark Summerfield and Marty Alchin. Descriptors have been part of the language since Python 2.2, where they are used to implement new-style classes, and the Django ORM uses them to implement the ForeignKey, OneToOneField and ManyToManyField relations.

The following code shows how you can use descriptors and XPath expressions to access XML data structures in a more Pythonic way.

    import lxml.etree


    class Bind(object):
        def __init__(self, path, converter=None, first=False):
            '''
            path -- xpath to select elements
            converter -- run result through converter
            first -- return only first element instead of a list of elements
            '''
            self.path = path
            if converter is None:
                converter = lambda x: x
            self.converter = converter
            self.first = first

        def __get__(self, instance, owner=None):
            # accessed on the class (e.g. Book.title): return the descriptor
            if instance is None:
                return self
            res = instance._elem.xpath(self.path)
            if self.first:
                return self.converter(res[0])
            return [self.converter(r) for r in res]

The Bind descriptor expects the class instance to have an attribute _elem which is an lxml.etree._Element.
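Since the descriptor only ever calls instance._elem.xpath(path), this requirement can be illustrated without lxml. The sketch below uses a condensed copy of Bind together with StubElement and Item, two hypothetical classes invented for this example; StubElement simply returns canned results for a given path and is not a real XPath engine.

```python
class Bind(object):
    # Condensed copy of the descriptor above, without the docstring.
    def __init__(self, path, converter=None, first=False):
        self.path = path
        self.converter = converter if converter is not None else (lambda x: x)
        self.first = first

    def __get__(self, instance, owner=None):
        res = instance._elem.xpath(self.path)
        if self.first:
            return self.converter(res[0])
        return [self.converter(r) for r in res]


class StubElement(object):
    """Hypothetical stand-in for lxml.etree._Element with canned results."""

    def __init__(self, results):
        self._results = results

    def xpath(self, path):
        # Return the canned list for this path, or an empty list.
        return self._results.get(path, [])


class Item(object):
    name = Bind('Name/text()', first=True)
    tags = Bind('Tag/text()', converter=str.upper)
    missing = Bind('Missing/text()', first=True)

    def __init__(self, elem):
        self._elem = elem  # the attribute Bind relies on


item = Item(StubElement({'Name/text()': ['widget'], 'Tag/text()': ['a', 'b']}))
print(item.name)  # widget
print(item.tags)  # ['A', 'B']
try:
    item.missing
except IndexError:
    # first=True indexes res[0], so an empty result raises IndexError
    print('no match')
```

Note the last case: with first=True an XPath expression that matches nothing raises IndexError, so optional elements need either first=False or a guard around the access.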

I'm using a sample XML response from the isbndb.com REST API to show the data binding.

    <ISBNdb server_time="2010-07-21T15:56:06Z">
      <BookList total_results="1">
        <BookData book_id="programming_collective_intelligence" isbn="0596529325">
          <Title>Programming collective intelligence</Title>
          <AuthorsText>Toby Segaran</AuthorsText>
          <PublisherText publisher_id="oreilly">O'Reilly, 2007.</PublisherText>
        </BookData>
      </BookList>
    </ISBNdb>

And the data mapping:

    # "import dateutil" alone does not load the parser submodule
    import dateutil.parser


    class Data(object):
        def __init__(self, elem):
            self._elem = elem


    class Book(Data):
        # use XPath text() to get the text
        title = Bind('Title/text()', first=True)
        # get the text via a converter
        author = Bind('AuthorsText', converter=lambda x: x.text, first=True)
        publisher = Bind('PublisherText/text()', first=True)
        publisher_id = Bind('PublisherText/@publisher_id', first=True)


    class ISBNdb(Data):
        # use dateutil.parser to convert the string to a datetime
        server_time = Bind('@server_time', converter=dateutil.parser.parse, first=True)
        # convert the result to an integer
        total_results = Bind('BookList/@total_results', converter=int, first=True)
        # bind the result to a custom class which is itself a mapping
        books = Bind('//BookData', Book)

Now let's play with the mapping:

    >>> db = ISBNdb(lxml.etree.fromstring(test_response))
    >>> db.server_time
    datetime.datetime(2010, 7, 21, 15, 56, 6, tzinfo=tzutc())
    >>> db.total_results
    1
    >>> db.books
    [<Book object at 0x9c1780c>]
    >>> book = db.books[0]
    >>> book.title
    'Programming collective intelligence'
    >>> book.author
    'Toby Segaran'
    >>> book.publisher
    "O'Reilly, 2007."
    >>> book.publisher_id
    'oreilly'

The source code for this example is available at gist.github.com/485977.