Introduction

So, as an example, let’s say you have to parse the following XML file…

family.xml

Unlike with JSON, there is an XML parser that comes bundled with OTP: xmerl. So, since you already have it, you decide to use it to parse the above file. After a little bit of googling, you determine that all you need is this function:

1> xmerl_scan:file("family.xml").

Oh, well… that was verbose! First of all, the docs state clearly that the result of xmerl_scan:file/1 is a tuple {xmlElement(), Rest} and xmlElement() = #xmlElement{}. So, let’s try adding the record definitions to our shell…



2> rr(code:lib_dir(xmerl) ++ "/include/xmerl.hrl").
[xmerl_event,xmerl_fun_states,xmerl_scanner,xmlAttribute,
 xmlComment,xmlContext,xmlDecl,xmlDocument,xmlElement,
 xmlNamespace,xmlNode,xmlNsNode,xmlObj,xmlPI,xmlText]
3> {Element, _} = xmerl_scan:file("family.xml"), Element.
#xmlElement{
    name = family,expanded_name = family,nsinfo = [],
    namespace =
        #xmlNamespace{
            default = 'http://world.com/family',nodes = []},
    parents = [],pos = 1,
    attributes =
        [#xmlAttribute{
             name = xmlns,expanded_name = [],nsinfo = [],namespace = [],
             parents = [{family,1}],
             pos = 1,language = [],value = "http://world.com/family",
             normalized = false}],
    content =
        [#xmlText{
             parents = [{family,1}],
             pos = 1,language = [],value = "\n",type = text},
         #xmlElement{
             name = parents,expanded_name = parents,nsinfo = [],
             namespace =
                 #xmlNamespace{
                     default = 'http://world.com/family',nodes = []},
             parents = [{family,1}],
             pos = 2,attributes = [],
             content =
                 [#xmlText{
                      parents = [{parents,2},{family,1}],
                      pos = 1,language = [],value = "\n",type = text},
                  #xmlElement{
                      name = person,expanded_name = person,nsinfo = [],
                      namespace =
                          #xmlNamespace{default = 'http://world.com/family',...},
                      parents = [{...}|...],
                      pos = 2,...},
                  #xmlText{
                      parents = [{parents,2},{family,1}],
                      pos = 3,language = [],value = "\n",type = text},
                  #xmlElement{
                      name = person,expanded_name = person,nsinfo = [],
                      namespace = {...},...},
                  #xmlText{
                      parents = [{parents,...},{...}],
                      pos = 5,language = [],...}],
             language = [],xmlbase = ".",elementdef = undeclared},
         #xmlText{
             parents = [{family,1}],
             pos = 3,language = [],value = "\n",type = text},
         #xmlElement{
             name = children,expanded_name = children,nsinfo = [],
             namespace =
                 #xmlNamespace{
                     default = 'http://world.com/family',nodes = []},
             parents = [{family,1}],
             pos = 4,attributes = [],
             content =
                 [#xmlText{
                      parents = [{children,4},{family,1}],
                      pos = 1,language = [],value = "\n",type = text},
                  #xmlElement{
                      name = person,expanded_name = person,nsinfo = [],
                      namespace = {...},...},
                  #xmlText{
                      parents = [{children,...},{...}],
                      pos = 3,language = [],...},
                  #xmlElement{name = person,expanded_name = person,...},
                  #xmlText{parents = [...],...}],
             language = [],xmlbase = ".",elementdef = undeclared},
         #xmlText{
             parents = [{family,1}],
             pos = 5,language = [],value = "\n",type = text}],
    language = [],xmlbase = ".",elementdef = undeclared}

Well… That’s clearer but it’s still a lot of information. And I hear you, Erlang masters: We’re not supposed to inspect those records visually. We should use the functions provided in the xmerl modules to walk through them. The extra info provided by the #xml… records is there precisely for that.
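To make the "walk through them with functions" idea a bit more concrete, here is a minimal sketch of what that can look like. The module name family_walk and both function names are made up for illustration. The only xmerl pieces it relies on are the record definitions from xmerl.hrl and xmerl_xpath:string/2, and it assumes the document was scanned with the default options, as above.

```erlang
-module(family_walk).
-export([child_names/1, people/1]).

%% Record definitions (#xmlElement{}, #xmlText{}, ...) shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Names of the direct child elements of a node.
%% The pattern in the generator silently skips anything that is not an
%% #xmlElement{}, e.g. the whitespace-only #xmlText{} nodes.
child_names(#xmlElement{content = Content}) ->
    [Name || #xmlElement{name = Name} <- Content].

%% All person elements anywhere below the given node, found with XPath.
people(Root) ->
    xmerl_xpath:string("//person", Root).
```

Judging by the record printed above, family_walk:child_names(Element) on the scanned family.xml should give back [parents,children], since the text nodes in between are filtered out.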

Nevertheless, more often than not, I find myself wanting a simpler representation of the XML, with just the basic data that I need, if possible in tuple format.
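Just to illustrate the kind of "simpler representation" I have in mind, here is a rough sketch that folds the records into {Name, Attributes, Children} tuples. It is only one possible shape. The module name xml_simple and the helper keep/1 are hypothetical, and dropping whitespace-only text, comments and processing instructions is an assumption that may not suit every document.

```erlang
-module(xml_simple).
-export([simplify/1]).

%% Record definitions shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Turn an #xmlElement{} into {Name, [{AttrName, Value}], Children}.
%% Text nodes become plain strings.
simplify(#xmlElement{name = Name, attributes = Attrs, content = Content}) ->
    {Name,
     [{K, V} || #xmlAttribute{name = K, value = V} <- Attrs],
     [simplify(Node) || Node <- Content, keep(Node)]};
simplify(#xmlText{value = Value}) ->
    Value.

%% Keep elements and non-blank text; drop everything else
%% (whitespace-only text, comments, processing instructions).
keep(#xmlElement{}) -> true;
keep(#xmlText{value = Value}) -> string:trim(Value) =/= "";
keep(_) -> false.
```

With something like this, the root of family.xml would come back roughly as {family, [{xmlns, "http://world.com/family"}], [{parents, [], [...]}, {children, [], [...]}]}, with the positions, parents lists and namespace records gone.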