xmldataset: simple xml parsing

A Python library that simplifies the extraction of datasets from XML content.

XML is a simple markup format, yet extracting the data of interest from it is often more complicated than it needs to be.

xmldataset addresses this through an easy-to-use plaintext declaration that follows the structure of the XML document. The declaration is indented to match the XML structure, and each piece of data we are interested in is tagged against a dataset.

Take, for example, an XML document that lists colleagues:

    # Declare XML
    xml = """<?xml version="1.0"?>
    <colleagues>
        <colleague>
            <title>The Boss</title>
            <phone>+1 202-663-9108</phone>
            <email>boss@the_company.com</email>
        </colleague>
        <colleague>
            <title>Admin Assistant</title>
            <phone>+1 347-999-5454</phone>
            <email>admin@the_company.com</email>
        </colleague>
        <colleague>
            <title>Minion</title>
            <phone>+1 792-123-4109</phone>
            <email>minion@the_company.com</email>
        </colleague>
    </colleagues>"""

Capturing the title, email and phone for each colleague is simple with xmldataset:

    import xmldataset

    # xmldataset declaration
    profile = """
    colleagues
        colleague
            title = dataset:colleagues
            phone = dataset:colleagues
            email = dataset:colleagues"""

    # Print the output
    print(xmldataset.parse_using_profile(xml, profile))

This results in the following output:

    {'colleagues': [{'email': 'boss@the_company.com',
                     'phone': '+1 202-663-9108',
                     'title': 'The Boss'},
                    {'email': 'admin@the_company.com',
                     'phone': '+1 347-999-5454',
                     'title': 'Admin Assistant'},
                    {'email': 'minion@the_company.com',
                     'phone': '+1 792-123-4109',
                     'title': 'Minion'}]}
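
The output is an ordinary Python dictionary: each dataset name maps to a list holding one dictionary per record, so it can be used with plain Python from there. The following is a minimal sketch of one way to work with it. The `result` literal simply repeats the output shown above so that the snippet runs without xmldataset installed, and `index_by` is a hypothetical helper written for this illustration, not part of the xmldataset API.

```python
# The output shown above, written out literally so this sketch is
# self-contained; in practice it would come from
# xmldataset.parse_using_profile(xml, profile).
result = {
    "colleagues": [
        {"email": "boss@the_company.com",
         "phone": "+1 202-663-9108",
         "title": "The Boss"},
        {"email": "admin@the_company.com",
         "phone": "+1 347-999-5454",
         "title": "Admin Assistant"},
        {"email": "minion@the_company.com",
         "phone": "+1 792-123-4109",
         "title": "Minion"},
    ]
}


def index_by(records, key):
    """Hypothetical helper (not part of xmldataset).

    Build a lookup table mapping record[key] to the whole record.
    If two records share the same key value, the later one wins.
    """
    return {record[key]: record for record in records}


# Look colleagues up by their title instead of scanning the list.
by_title = index_by(result["colleagues"], "title")

# Print one line per colleague, in the order they appeared in the XML.
for title, record in by_title.items():
    print(f"{title}: {record['email']} / {record['phone']}")
```

Indexing by `title` works here only because the titles in this example are unique; for data where the chosen field can repeat, grouping records into lists per key would be a safer choice.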