This post is a brief but practical demonstration of validating an XML document against Document Type Definition (DTD) and XML Schema Definition (XSD) files. Such validation helps ensure that XML exchanged between a client and a server hosting an XML-based web service arrives in the expected structure. Both DTD and XSD let you define the elements and attributes an XML document should contain. XSD has the advantage of being written in XML itself, which makes it easier to read. XSD also lets you define more about an element, such as its data type, namespace, and restrictions on its values. There are a few modules for validating XML in Perl 5, but I will be using two from the XML::LibXML namespace.

Project Setup

Project Files

I’ll be using the following files contained in a directory named perl5-xml-validation to demonstrate both DTD and XSD validation.

perl5-xml-validation/
.
├── bin
│   ├── dtd_validation.pl
│   └── xsd_validation.pl
├── cpanfile
└── xml
    ├── book.xml
    └── schemas
        ├── book.dtd
        └── book.xsd

Once you’ve created the directory, be sure to enter it with cd perl5-xml-validation.

Dependencies

Open the file cpanfile and list the following modules:

requires 'XML::LibXML';

requires 'Try::Tiny';

Run cpanm -L local --installdeps . to install the modules. I will explain the other files as we start using them.

The XML

The XML document used for this demonstration simply contains book details within the file ./xml/book.xml as follows:

<?xml version="1.0" encoding="utf-8" ?>
<book>
  <title>XML Validation</title>
  <numPages>-200</numPages>
</book>

The value ‘-200’ for the numPages element has been set deliberately. The reason should be clear later.

Validating the XML with DTD

The DTD File

The file ./xml/schemas/book.dtd contains the following definitions:

<!ELEMENT book (title,author,numPages?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT numPages (#PCDATA)>

For any XML document associated with this DTD to be valid, it must contain the elements book, title, author, and optionally numPages. As you will notice, the author element is missing from the XML we defined earlier — we’ll correct this later.

Now let’s validate the XML. The file ./bin/dtd_validation.pl contains the following code, which I explain below it.

The Code

#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;
use Try::Tiny qw(try catch);

# Parse the XML document and load the external DTD.
my $xml_doc = XML::LibXML->load_xml(location => './xml/book.xml');
my $dtd_doc = XML::LibXML::Dtd->new('', './xml/schemas/book.dtd');

# validate dies on failure, so trap the error and fall back to 0.
my $is_xml_valid = try {
    $xml_doc->validate($dtd_doc)
}
catch {
    say '==> ' . $_;
    return 0;
};

say $is_xml_valid ? 'Valid' : 'Invalid';

Regarding validation, the code above does the following:

Loads the XML file using load_xml, which returns an XML::LibXML::Document object that is assigned to $xml_doc.

Then the DTD file book.dtd is loaded as part of XML::LibXML::Dtd instance construction.

Finally, the validation itself is done using the validate method.

To make sure the ternary expression at the bottom of the script always runs, we use the try and catch functions exported by Try::Tiny. These two functions handle any error thrown by the validate method. If an error thrown by validate were left unhandled, the script would halt at that point and the ternary expression would never be evaluated. When validation fails, the catch block prints the error and returns 0, which becomes the value of $is_xml_valid.
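If all you need is a yes/no answer, XML::LibXML::Document also provides an is_valid method, which returns a boolean instead of throwing. The sketch below is an alternative to the try/catch approach and not what the script in this post does. To keep it self-contained, it assumes in-memory copies of the XML and DTD (loaded with load_xml(string => ...) and XML::LibXML::Dtd->parse_string) rather than the files on disk.

```perl
#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;

# Illustrative in-memory copy of book.dtd.
my $dtd_string = <<'DTD';
<!ELEMENT book (title,author,numPages?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT numPages (#PCDATA)>
DTD

# Illustrative in-memory copy of book.xml (author is still missing).
my $xml_string = <<'XML';
<?xml version="1.0" encoding="utf-8" ?>
<book>
  <title>XML Validation</title>
  <numPages>-200</numPages>
</book>
XML

my $xml_doc = XML::LibXML->load_xml(string => $xml_string);
my $dtd_doc = XML::LibXML::Dtd->parse_string($dtd_string);

# is_valid returns true or false rather than dying on invalid input.
my $is_xml_valid = $xml_doc->is_valid($dtd_doc);

say $is_xml_valid ? 'Valid' : 'Invalid';
```

The trade-off is that is_valid does not tell you why a document failed, so validate with try and catch remains the better fit when you want to report the error.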

Validation Errors

Now if you run perl ./bin/dtd_validation.pl without having corrected the XML, you will see output similar to the following in your shell:

==> ./xml/book.xml:0: validity error : Element book content does not follow the DTD, expecting (title , author , numPages?), got (title numPages )

Invalid

The message indicates that the book element’s children are not what the DTD requires. The validator expected title, author, and optionally numPages (in that order), but instead it found only title and numPages.
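The error caught above is printed as a single string, but it can also be inspected piece by piece. The sketch below assumes that the exception thrown by validate is an XML::LibXML::Error object, which exposes accessors such as message and domain, and it falls back to plain stringification if that assumption does not hold. The in-memory XML and DTD and the $error_message variable are illustrative only and are not part of the script in this post.

```perl
#!/usr/bin/env perl

use v5.18;
use warnings;

use XML::LibXML;
use Try::Tiny qw(try catch);
use Scalar::Util qw(blessed);

# Illustrative in-memory copy of book.dtd.
my $dtd_string = <<'DTD';
<!ELEMENT book (title,author,numPages?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT numPages (#PCDATA)>
DTD

# Illustrative in-memory copy of book.xml (author is still missing).
my $xml_string = <<'XML';
<?xml version="1.0" encoding="utf-8" ?>
<book>
  <title>XML Validation</title>
  <numPages>-200</numPages>
</book>
XML

my $xml_doc = XML::LibXML->load_xml(string => $xml_string);
my $dtd_doc = XML::LibXML::Dtd->parse_string($dtd_string);

my $error_message = '';

try {
    $xml_doc->validate($dtd_doc);
}
catch {
    # Assumption: the exception is an XML::LibXML::Error object.
    if (blessed($_) && $_->isa('XML::LibXML::Error')) {
        $error_message = $_->message;
        chomp $error_message;
        say 'Domain:  ' . $_->domain;
        say 'Message: ' . $error_message;
    }
    else {
        # Fallback: treat whatever was thrown as a plain string.
        $error_message = "$_";
        say 'Error: ' . $error_message;
    }
};
```

Working with the individual fields makes it easier to log errors in your own format or to return them to a client of the web service.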

Fixing the Error