

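Step 1: Install libxml2 and its Python bindings using the Synaptic package manager (on Debian and Ubuntu the bindings have typically been packaged as python-libxml2).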

Step 2: Create or choose the XML file that you want to traverse.

For example, I am using the W3Schools XML document at http://www.w3schools.com/xpath/books.xml.

A local file on the file system works as well.

Step 3: Create a Python file, for example xpathcode.py.

Open xpathcode.py, import libxml2 and urllib, and parse the XML file.

import libxml2
import urllib

# Download books.xml and parse the text into a libxml2 document
rss = libxml2.parseDoc(urllib.urlopen('http://www.w3schools.com/xpath/books.xml').read())



Note: If the file exists on the local file system, parse it like this instead:

import libxml2

# Read the local copy of books.xml and parse it (urllib is not needed here)
rss = libxml2.parseDoc(open('books.xml', 'r').read())
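To try the queries with neither network access nor a saved books.xml, the document can also be held in a string. The sketch below is not the libxml2 approach used in this post: it assumes Python 3 and the standard library's xml.etree.ElementTree, and the `BOOKS_XML` sample is a trimmed copy reassembled from the outputs shown in this post (author elements omitted).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of books.xml, reassembled from the outputs in this post
# (author elements are left out to keep the sample short).
BOOKS_XML = """
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><year>2005</year><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><year>2005</year><price>29.99</price></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title><year>2003</year><price>49.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><year>2003</year><price>39.95</price></book>
</bookstore>
"""

# fromstring() parses the text and returns the root element (<bookstore>).
root = ET.fromstring(BOOKS_XML.strip())
book_count = len(root.findall("book"))
print(root.tag, book_count)
```

ElementTree implements only a subset of XPath, so it serves as a quick cross-check rather than a replacement for xpathEval.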

Step 4: Now try the following XPath queries one by one.

a. Selects the first book element that is a child of the bookstore element

nodes = rss.xpathEval('/bookstore/book[1]')
print nodes[0]

Output:

<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>

b. Selects the last book element that is a child of the bookstore element

nodes = rss.xpathEval('/bookstore/book[last()]')
print nodes[0]

Output:

<book category="WEB">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>

c. Selects the last but one book element that is a child of the bookstore element

nodes = rss.xpathEval('/bookstore/book[last()-1]')
print nodes[0]

Output:

<book category="WEB">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>
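The three positional predicates in a, b and c can be cross-checked without libxml2. This is a sketch under assumptions: Python 3, the standard library's xml.etree.ElementTree (which supports `[1]`, `[last()]` and `[last()-1]`), and a titles-only sample built from the outputs in this post.

```python
import xml.etree.ElementTree as ET

# Titles only, in the order the books appear in the outputs of this post.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Everyday Italian</title></book>"
    "<book><title>Harry Potter</title></book>"
    "<book><title>XQuery Kick Start</title></book>"
    "<book><title>Learning XML</title></book>"
    "</bookstore>"
)

# ElementTree paths are relative to the element they are called on, so
# "book[1]" on the <bookstore> root plays the role of /bookstore/book[1].
first = root.find("book[1]")
last = root.find("book[last()]")
second_last = root.find("book[last()-1]")

positional_titles = [b.findtext("title") for b in (first, last, second_last)]
print(positional_titles)
```

Note that XPath positions start at 1, unlike Python list indexes, which start at 0.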

d. Selects the first two book elements that are children of the bookstore element

nodes = rss.xpathEval('/bookstore/book[position()<3]')
for i in nodes:
    print i

Output:

<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>
<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>
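Because XPath positions are 1-based, `position()<3` keeps positions 1 and 2, which corresponds to the Python slice `[:2]`. The sketch below illustrates that equivalence. It assumes Python 3 and the standard library's xml.etree.ElementTree, which does not support `position()` itself, so the slice stands in for the predicate.

```python
import xml.etree.ElementTree as ET

# Titles only, in the order the books appear in the outputs of this post.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Everyday Italian</title></book>"
    "<book><title>Harry Potter</title></book>"
    "<book><title>XQuery Kick Start</title></book>"
    "<book><title>Learning XML</title></book>"
    "</bookstore>"
)

# findall() returns the <book> children in document order; the slice keeps
# list indexes 0 and 1, i.e. XPath positions 1 and 2.
first_two = root.findall("book")[:2]
first_two_titles = [b.findtext("title") for b in first_two]
print(first_two_titles)
```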

e. Selects all the title elements that have an attribute named lang

nodes = rss.xpathEval('//title[@lang]')
for i in nodes:
    print i

Output:

<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>

f. Selects all the title elements that have an attribute named lang with a value of ‘eng’

nodes = rss.xpathEval("//title[@lang='eng']")
if not nodes:
    print 'eng not exist'

Output:

eng not exist
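The 'eng' query comes back empty because an attribute predicate compares the whole value, and every title in this document carries lang="en". A minimal sketch of that behaviour, assuming Python 3 and the standard library's xml.etree.ElementTree instead of libxml2, with a titles-only sample taken from the outputs in this post:

```python
import xml.etree.ElementTree as ET

# Titles with their lang attribute, as shown in the outputs of this post.
root = ET.fromstring(
    "<bookstore>"
    '<book><title lang="en">Everyday Italian</title></book>'
    '<book><title lang="en">Harry Potter</title></book>'
    '<book><title lang="en">XQuery Kick Start</title></book>'
    '<book><title lang="en">Learning XML</title></book>'
    "</bookstore>"
)

with_lang = root.findall(".//title[@lang]")          # attribute present
lang_eng = root.findall(".//title[@lang='eng']")     # exact value 'eng'
lang_en = root.findall(".//title[@lang='en']")       # exact value 'en'

print(len(with_lang), len(lang_eng), len(lang_en))
```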

g. Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

nodes = rss.xpathEval("/bookstore/book[price>35.00]/title")
for i in nodes:
    print i

Output:

<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
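In `book[price>35.00]` the text of each price element is converted to a number before the comparison. The same filter can be spelled out by hand, as in the sketch below. It assumes Python 3 and the standard library's xml.etree.ElementTree, which has no numeric comparison predicates, so a list comprehension takes the place of the XPath predicate.

```python
import xml.etree.ElementTree as ET

# Titles and prices as shown in the outputs of this post.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Everyday Italian</title><price>30.00</price></book>"
    "<book><title>Harry Potter</title><price>29.99</price></book>"
    "<book><title>XQuery Kick Start</title><price>49.99</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# Convert each price text to a float, then keep the titles of the books
# whose price is greater than 35.00.
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35.00
]
print(expensive_titles)
```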

h. Selects all the title AND price elements of all book elements

nodes = rss.xpathEval("//book/title | //book/price")
for i in nodes:
    print i

Output:

<title lang="en">Everyday Italian</title>
<price>30.00</price>
<title lang="en">Harry Potter</title>
<price>29.99</price>
<title lang="en">XQuery Kick Start</title>
<price>49.99</price>
<title lang="en">Learning XML</title>
<price>39.95</price>

i. Selects all the title elements of the book element of the bookstore element AND all the price elements in the document

nodes = rss.xpathEval("/bookstore/book/title | //price")
for i in nodes:
    print i

Output:

<title lang="en">Everyday Italian</title>
<price>30.00</price>
<title lang="en">Harry Potter</title>
<price>29.99</price>
<title lang="en">XQuery Kick Start</title>
<price>49.99</price>
<title lang="en">Learning XML</title>
<price>39.95</price>
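Queries h and i print the same nodes because, in this document, every title and price sits under /bookstore/book, and the `|` operator yields one node-set (shown above in document order) rather than two lists joined end to end. The sketch below imitates that result. It assumes Python 3 and the standard library's xml.etree.ElementTree, which has no union operator, so a single document-order walk with a tag filter stands in for it.

```python
import xml.etree.ElementTree as ET

# Titles and prices as shown in the outputs of this post.
root = ET.fromstring(
    "<bookstore>"
    "<book><title>Everyday Italian</title><price>30.00</price></book>"
    "<book><title>Harry Potter</title><price>29.99</price></book>"
    "<book><title>XQuery Kick Start</title><price>49.99</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# iter() walks the tree in document order; keeping only <title> and <price>
# visits each matching element exactly once, interleaved per book.
wanted = {"title", "price"}
union_pairs = [(el.tag, el.text) for el in root.iter() if el.tag in wanted]

for tag, text in union_pairs:
    print(tag, text)
```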

j. Selects the text of all the title elements

nodes = rss.xpathEval("/bookstore/book/title/text()")
for i in nodes:
    print i

Output:

Everyday Italian
Harry Potter
XQuery Kick Start
Learning XML
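The `text()` step returns the text nodes inside the elements rather than the elements themselves, which is why no tags appear in the output. As a rough counterpart, assuming Python 3 and the standard library's xml.etree.ElementTree (which exposes that text through the `.text` attribute instead of a `text()` step):

```python
import xml.etree.ElementTree as ET

# Titles only, in the order the books appear in the outputs of this post.
root = ET.fromstring(
    "<bookstore>"
    '<book><title lang="en">Everyday Italian</title></book>'
    '<book><title lang="en">Harry Potter</title></book>'
    '<book><title lang="en">XQuery Kick Start</title></book>'
    '<book><title lang="en">Learning XML</title></book>'
    "</bookstore>"
)

# .text holds the character data directly inside each <title> element.
title_texts = [title.text for title in root.iter("title")]
for text in title_texts:
    print(text)
```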

For more detail on XPath, please visit: http://www.w3schools.com/xpath/default.asp