Problem

I had an XML file (an RSS feed) from which I wanted to extract some data. I tried some XML libraries but I didn’t like any of them. Is there a simple, brain-friendly way to do this? After all, it’s Python, so everything should be simple.

Solution

Yes, there is a simple library for reading XML called “untangle”, developed by Chris Stefanescu. It’s on PyPI, so installation is very easy:

sudo pip install untangle

For some examples, visit the project page.

Use Case

Let’s see a simple, real-world example. From the RSS feed of Planet Python, let’s extract the post titles and their URLs.

#!/usr/bin/env python

import untangle

#XML = 'examples/planet_python.xml'  # can read a file too
XML = 'http://planet.python.org/rss20.xml'

# parse() returns an object whose attributes mirror the XML element names
o = untangle.parse(XML)
for item in o.rss.channel.item:
    # .cdata holds the text content of an element
    title = item.title.cdata
    link = item.link.cdata
    if link:
        print(title)
        print('  ' + link)

It couldn’t be any simpler :)

Limitations

According to Chris, untangle doesn’t support documents with namespaces (yet).

Alternatives (update 20111031)

Here are some alternatives (thanks reddit).

lxml and amara are heavyweight solutions built on C libraries, so you may not be able to use them everywhere. untangle is a lightweight parser that can be a perfect choice for reading a small and simple XML file.
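
If you can't install anything at all, the standard library's xml.etree.ElementTree is one more option. It is my own addition, not one of the alternatives suggested on reddit. Here is a minimal sketch of the same title/link extraction with it. To keep the example self-contained it parses a tiny hand-written feed instead of the real Planet Python one; SAMPLE_RSS and extract_posts are made-up names for illustration only.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for the real feed
# (hypothetical sample data, not taken from Planet Python).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
    <item>
      <title>Post without a link</title>
    </item>
  </channel>
</rss>
"""


def extract_posts(xml_text):
    """Return a list of (title, link) pairs for every item that has a link."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    posts = []
    for item in root.findall('./channel/item'):
        # findtext() returns the default when the child element is missing
        title = item.findtext('title', default='')
        link = item.findtext('link', default='')
        if link:
            posts.append((title, link))
    return posts


for title, link in extract_posts(SAMPLE_RSS):
    print(title)
    print('  ' + link)
```

It is a bit more verbose than the untangle version, since you look up elements by path instead of by attribute, but it has no dependencies beyond Python itself.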