
Now in the fourth edition of version 1.0 (or the second edition of version 1.1), XML has enjoyed a popularity matched by few other technologies. Introduced in 1998 as a more general-purpose and extensible markup language than HTML (and, like HTML, derived from SGML), XML has spawned a host of related technologies (XPath, XSL/XSLT, XQuery, XML Schema, Relax NG, etc.) as well as a plethora of XML-based dialects covering every conceivable purpose.

In the world of enterprise programming (most notably Java), XML extended its reach to become the data, configuration, and metadata format of choice. At one point, any software framework with even a vaguely enterprise-y smell to it relied on XML almost as a matter of course: J2EE (EJB, Servlet API, JSF, etc.), Struts, Spring, Tapestry, and the list goes on. When asynchronous HTTP requests from the browser (made without reloading the page) were popularized in 2005 (famously by Jesse James Garrett), XML was so prevalent that it was simply assumed to be the data serialization format of choice; hence the term AJAX (Asynchronous JavaScript and XML) and the "XMLHttpRequest" object. Use of XML was simply unquestioned.

Then something happened, or started to happen... In the last few years, other technologies have begun to encroach on some of the areas where XML was once so dominant: annotations in Java (attributes in C#), JSON, Protocol Buffers, and even YAML. Dissatisfaction with XML is on the rise. Could it be that developers are realizing that XML is not good for everything? The evidence is growing:

Much of the motivation behind Google Guice was to create a "pure Java" (i.e., XML-free) implementation of a dependency injection framework. One of the creators of Guice, "Crazy" Bob Lee, has made no bones about his disdain for XML as a framework design tool.

Spring itself followed suit, introducing an annotation-driven approach to dependency injection (alongside the existing XML-based approach) in Spring 2.5.

Wicket advertises itself as having a "refreshing lack of XML".

JSON's compactness and the ease of serializing and deserializing it to and from JavaScript have made it a very appealing alternative to XML, and it has taken a big bite out of the X in AJAX.



Numerous official Java specifications (EJB 3.0, JPA, JSF 2.0, Servlet 3.0) are moving away from XML metadata and toward Java annotations. This is perhaps the most damning piece of evidence, since Java specifications were among the major drivers behind the canonization of XML as a key enterprise technology.

Some people have even gone so far as to dedicate entire web sites to explaining why XML sucks.

Of course, one may point out that this could just be a "vocal minority" voicing objections while the quieter majority continues to use XML; indeed, the use of XML and the development of XML-related technology show no real sign of slowing down. But this trend does raise the legitimate question of whether XML has overreached its original purpose (and usefulness) and needs to be re-evaluated for some of the use cases to which it is currently applied. So, now that XML is in the denouement of its hype cycle, it is a good candidate for a more honest evaluation of its strengths and weaknesses.

Let's start with the minuses...

Drawbacks of XML

Verbose: By its very nature as a markup language, XML contains considerable redundancy (e.g., every closing tag repeats its element name, as in <tag></tag>). While this suits the hierarchical structure of markup well, it can be a big drawback, especially when dealing with large amounts of data. This verbosity carries real costs in processing efficiency and network transmission overhead. Technologies like MTOM and XOP, which exist largely to avoid base64-encoding binary data inside XML messages, are really a hack to get around this problem.
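
To put a rough number on this verbosity, here is a minimal Java sketch (Java 16+ for the record type) that encodes the same two records once as XML elements and once as a hand-rolled JSON-style string, then compares character counts. The Person record, its field names, and both encodings are illustrative assumptions, not a benchmark; no escaping is done, so values are assumed to contain no special characters.

```java
import java.util.List;

final class XmlVerbosity {
    // Hypothetical record type used only for this size comparison.
    record Person(String name, int age) {}

    // Every value is wrapped in an opening and a closing tag that both repeat the element name.
    static String toXml(List<Person> people) {
        StringBuilder sb = new StringBuilder("<people>");
        for (Person p : people) {
            sb.append("<person><name>").append(p.name()).append("</name>")
              .append("<age>").append(p.age()).append("</age></person>");
        }
        return sb.append("</people>").toString();
    }

    // The same data as a JSON-style array: each key appears once per object.
    static String toJsonLike(List<Person> people) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < people.size(); i++) {
            Person p = people.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"name\":\"").append(p.name())
              .append("\",\"age\":").append(p.age()).append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Person> people = List.of(new Person("Ada", 36), new Person("Alan", 41));
        System.out.println("XML:  " + toXml(people).length() + " chars");      // XML:  110 chars
        System.out.println("JSON: " + toJsonLike(people).length() + " chars"); // JSON: 50 chars
    }
}
```

For this tiny payload the XML form is more than twice the size, almost entirely because of the repeated closing tags; the ratio will vary with real data and real serializers.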

Trees: The hierarchical tree structure of XML is generally useful, but it is not naturally suited to every problem domain. Some types of data are simply better suited to other data structures, such as lists and maps. Forcing such data into a tree adds needless processing overhead and complexity.
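
The cost shows up as soon as a plain key/value map is stored as XML: getting the map back means parsing a tree and walking it. The sketch below assumes a hypothetical `<settings>`/`<entry key="...">` layout (one of many ways a map might be encoded) and uses only the DOM parser bundled with the JDK. Input is assumed to be trusted, so no parser hardening is configured.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class FlatMapFromXml {
    // Recovers a flat map from a tree: build the DOM, find the entry elements,
    // then read one attribute (the key) and the text content (the value) from each.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList entries = doc.getElementsByTagName("entry");
        Map<String, String> result = new LinkedHashMap<>(); // keeps document order
        for (int i = 0; i < entries.getLength(); i++) {
            Element e = (Element) entries.item(i);
            result.put(e.getAttribute("key"), e.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<settings>"
                + "<entry key=\"timeout\">30</entry>"
                + "<entry key=\"retries\">3</entry>"
                + "</settings>";
        System.out.println(parse(xml)); // {timeout=30, retries=3}
    }
}
```

The whole tree, with its element and text nodes, is materialized just to produce two map entries, which is the kind of overhead the paragraph describes.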

Markup Language: XML is a markup language, not an imperative or functional language. And it is not good at faking either one. This seems to be a fundamental point missed by some fairly knowledgeable people. The otherwise well-designed BPEL is a case in point: right ideas, wrong technology. This doesn't mean that XML can't be used as a kind of "Poor Man's DSL", but being declarative is about as far as one should stretch a markup language.

Language Metadata: Although specifications like XML Schema brought a kind of type system to XML, that type system was meant to be language-agnostic. Historically, however, XML has commonly been applied as a tool for language metadata, forcing tedious, non-type-safe references to language types by name. This is the classic XML attribute class="com.example.Foo" seen in way too many Java enterprise frameworks: a typo in the class name is not caught by the compiler and surfaces only at runtime. A real facility for language metadata (annotations in Java or attributes in C#) is a much better solution.
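
A minimal Java sketch of the contrast: one method instantiates a type from a class-name string, the way a framework reading class="com.example.Foo" from XML must, while the other reads metadata from an annotation on a compiler-checked class literal. The `Component` annotation and `Greeter` class are hypothetical stand-ins, not any real framework's API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class MetadataStyles {
    // Hypothetical marker annotation; RUNTIME retention makes it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Component {
        String value();
    }

    @Component("greeter")
    static class Greeter {
        String greet() { return "hello"; }
    }

    // XML style: the type is named by a string, so a typo only fails at runtime.
    static Object instantiateByName(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    // Annotation style: Greeter.class is checked by the compiler, and the
    // metadata travels with the type itself.
    static String componentName(Class<?> type) {
        Component c = type.getAnnotation(Component.class);
        return c == null ? null : c.value();
    }

    public static void main(String[] args) {
        System.out.println(componentName(Greeter.class)); // greeter
        try {
            instantiateByName("MetadataStyles$Greter"); // misspelled on purpose
        } catch (ReflectiveOperationException e) {
            System.out.println("runtime failure: " + e);
        }
    }
}
```

Misspelling `Greeter.class` would be a compile error, whereas the misspelled string compiles cleanly and only throws `ClassNotFoundException` when the code runs.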

Nothing world-shaking here. Most developers who have had to type out XML documents have probably thought of these at some point. So what are the good points?

Advantages of XML

Platform and Language Neutral: Although other competing technologies can make the same claim, this is one of the big reasons for the rise in XML's popularity in the first place.

Great Tools: There is a very rich set of tools for working with XML, which is certainly one reason for its great popularity. This makes XML a much simpler choice, since in most languages the parser and other tools have already been written for you.
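
As one illustration of tooling that comes for free, the JDK ships an XPath engine in `javax.xml.xpath`. This sketch selects book titles below a price threshold with a single expression; the `<catalog>`/`<book>` layout and the sample titles are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BuiltInXPath {
    // Uses only the XPath support bundled with the JDK; no third-party parser needed.
    static List<String> titlesUnder(String xml, double maxPrice) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath 1.0 converts each <price> string to a number for the comparison.
        NodeList titles = (NodeList) xpath.evaluate(
                "/catalog/book[price < " + maxPrice + "]/title",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.getLength(); i++) {
            result.add(titles.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<book><title>Markup at Scale</title><price>39.95</price></book>"
                + "<book><title>The Pocket Parser</title><price>12.50</price></book>"
                + "</catalog>";
        System.out.println(titlesUnder(xml, 20)); // [The Pocket Parser]
    }
}
```

The threshold is concatenated into the expression only because it is a `double`; untrusted strings should not be spliced into XPath this way.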

Readability: Some people may argue with this and provide good counter-examples (EJB deployment descriptors come to mind), but in general, 90% or more of the XML documents I've seen are fairly readable. This readability, however, does not scale: larger, more complex XML documents tend to be fairly unreadable, though this is often more a consequence of misapplying the technology.

Namespaces: Although using multiple namespaces in an XML document can hold some surprises for the beginner, namespaces are generally a powerful feature of XML. Among other things, they enable "mashups": XML documents extended or combined with other XML documents (or content) in ways not necessarily foreseen by the providers of those documents (think Yahoo Pipes). Being able to avoid naming conflicts between different data sources is one of XML's great advantages over technologies that do not support namespaces.
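
To see how namespaces keep combined vocabularies apart, the sketch below parses a document in which two sources both define a `<title>` element and retrieves each one by namespace URI. The URIs `urn:example:store` and `urn:example:books` are made up; note that the JDK's `DocumentBuilderFactory` is not namespace-aware unless asked to be.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    // Hypothetical namespace URIs for two independently designed vocabularies.
    static final String STORE_NS = "urn:example:store";
    static final String BOOK_NS = "urn:example:books";

    // Returns the text of the first <title> in the given namespace, or null if absent.
    static String titleIn(String xml, String namespaceUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList titles = doc.getElementsByTagNameNS(namespaceUri, "title");
        return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // A store listing with a book record mixed in; both use the local name "title".
        String xml = "<listing xmlns=\"urn:example:store\" xmlns:b=\"urn:example:books\">"
                + "<title>Weekly Deals</title>"
                + "<b:title>The Pocket Parser</b:title>"
                + "</listing>";
        System.out.println(titleIn(xml, STORE_NS)); // Weekly Deals
        System.out.println(titleIn(xml, BOOK_NS));  // The Pocket Parser
    }
}
```

Without namespaces, the two `<title>` elements would be indistinguishable by name, and whichever source came second would silently collide with the first.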

Validation: Built-in data validation is another of XML's advantages. However one may feel about the widely used standard, XML Schema, having the heavy lifting of this tedious work offloaded from the author to the tools is truly a blessing.
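
Here is what that off-loading can look like with the JDK's `javax.xml.validation` API: a small XML Schema declares that `<order>` must contain a positive-integer `<quantity>`, and the validator enforces it without any hand-written checks. The schema itself is a hypothetical example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class SchemaValidation {
    // Hypothetical schema: <order> holds exactly one <quantity> of type xs:positiveInteger.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
          + "<xs:element name=\"quantity\" type=\"xs:positiveInteger\"/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";

    // Compiles the schema on every call for brevity; real code would cache the
    // thread-safe Schema and create a fresh Validator per use.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) { // the default error handler throws on validation errors
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<order><quantity>3</quantity></order>"));   // true
        System.out.println(isValid("<order><quantity>-1</quantity></order>"));  // false
        System.out.println(isValid("<order><quantity>two</quantity></order>")); // false
    }
}
```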

Using simple math, it would seem that the advantages outweigh the disadvantages. But of course, it isn't that simple, and the benefits and drawbacks have to be weighed case by case. Most abuses of a technology stem from simple facts like these being overlooked or forgotten amid the hype. In the worst cases, this kind of thinking results in elaborate specifications designed simply as workarounds for the limitations of the technology. "The right tool for the right job" is the caveat here, but the warning seems easily forgotten.

XML is definitely here to stay, and the technology is generally "good enough" for most purposes to which it has been applied. But I think it is important, as with any technology, to apply some critical thinking before using it for a given purpose. If you find yourself in the middle of coding an elaborate workaround to a problem you are encountering (performance or otherwise), the question "Why am I doing this?" should be more than a passing thought. A little (un?)common sense goes a long way in building the right solution.