Douglas Crockford left an excellent comment on my recent posting All markup ends up looking like XML, which he later made into its own blog posting, For the trees. I agree with his reworking of the structure: given the data that I provided, the JSON, LISP, and XML markup all could have been simpler.

If he’s right about the examples, though, he’s wrong about two things. First, my posting doesn’t represent any kind of softening toward JSON among its opponents in the XML community, simply because I’ve never been one of those opponents. Second, I spend at least one order of magnitude more time working with SQL and programming languages (not processing XML) than I do with XML, so if anything, my perspective on XML would likely be tainted by them rather than the other way around. Instead, I think the examples were complicated because I built for tomorrow instead of today.

Tomorrow

So what might tomorrow look like for an application dealing with names? Consider, for example, this XML markup, moving gender out of the element/property name as Doug suggests, and eliminating the other attributes (since they don’t add much to the discussion):

<names>
<name gender="male"><surname>Saddam</surname> Hussein</name>
<name gender="female">Susan B. <surname>Anthony</surname></name>
<name gender="male">Al <surname>Unser</surname> Jr.</name>
<name gender="male">Don Alonso <surname>Quixote</surname> de la Mancha</name>
</names>
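For readers who want to poke at this markup, here is a minimal sketch of how the mixed content might be read with Python’s standard xml.etree.ElementTree module. The function name read_names and the tuple layout are my own choices for illustration, not part of any existing application; the point is only that the display string and the surname both fall out of the same element without any extra properties.

```python
import xml.etree.ElementTree as ET

# Sample markup copied from the example above.
MARKUP = """<names>
<name gender="male"><surname>Saddam</surname> Hussein</name>
<name gender="female">Susan B. <surname>Anthony</surname></name>
<name gender="male">Al <surname>Unser</surname> Jr.</name>
<name gender="male">Don Alonso <surname>Quixote</surname> de la Mancha</name>
</names>"""


def read_names(markup):
    """Return (full_name, surname, gender) tuples, one per <name> element.

    read_names is a hypothetical helper written for this sketch.
    """
    results = []
    for name in ET.fromstring(markup).iter("name"):
        # itertext() walks the text and the child elements in document order,
        # so the surname stays wherever the author placed it.
        full_name = "".join(name.itertext()).strip()
        surname = name.findtext("surname")
        results.append((full_name, surname, name.get("gender")))
    return results
```

Calling read_names(MARKUP) gives one tuple per name, with the word order preserved exactly as it was written.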

It’s surprisingly messy breaking each name down into a simple property list. If we tried the approach Doug used for my simpler examples, we’d end up with this (note that this is a list of names, not of people):

{"names": [
{"gender": "male", "given-name": "Hussein", "surname": "Saddam"},
{"gender": "female", "given-name": "Susan B.", "surname": "Anthony"},
{"gender": "male", "given-name": "Al Jr.", "surname": "Unser"},
{"gender": "male", "given-name": "Don Alonso Quixote de la", "surname": "Mancha"}
]}

This list needs a bit of patching. First, if we reconstruct the names as strings, we don’t want to end up with “Hussein Saddam” instead of “Saddam Hussein”, so we’ll have to add a property specifying whether the surname comes first or last:

{"gender": "male", "given-name": "Hussein", "surname": "Saddam", "surname-after-given-name": false}

Great: that’s all we need to fix that, and now we know to print “Saddam Hussein”. Now, let’s look at Susan. There’s no problem recreating the string “Susan B. Anthony” from these properties, but we probably should rename the property given-name to given-names (and surname-after-given-name to match), just to avoid confusion:

{"gender": "female", "given-names": "Susan B.", "surname": "Anthony", "surname-after-given-names": true}

Al Unser Jr. is a bit trickier, because there was no obvious place to put the “Jr.”. Strictly speaking, it’s neither a given name nor a surname, so for now, let’s just call it a postfix (although that assumes a physical position that might not apply to all languages):

{"gender": "male", "given-names": "Al", "surname": "Unser", "surname-after-given-names": true, "postfix": "Jr."}

Don Quixote, however, forces us to reconsider some of our assumptions, because “Don” is not a given name but an honorific. Assuming, however, that we don’t care whether it’s a name or an honorific, let’s just call it prefix for now, to go with postfix:

{"gender": "male", "prefix": "Don", "given-names": "Alonso", "surname": "Quixote", "surname-after-given-names": true, "postfix": "de la Mancha"}

Finally, just to throw a wrench into things, let’s assume that our list might contain things other than names, so that we need to add a type property:

{"type": "name", "gender": "male", "prefix": "Don", "given-names": "Alonso", "surname": "Quixote", "surname-after-given-names": true, "postfix": "de la Mancha"}
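To see what a consumer of this flat record has to do, here is a rough Python sketch that rebuilds the display string. The function name format_flat_name is hypothetical, and two things are my assumptions rather than anything the format specifies: the surname follows the given names when the flag is missing, and the parts are joined with single spaces. The function accepts both given-names and given-name, since the post uses both spellings at different stages.

```python
def format_flat_name(record):
    """Rebuild a display string from the flat property list.

    Returns None for records whose type is not "name".
    format_flat_name is a hypothetical helper written for this sketch.
    """
    if record.get("type") != "name":
        return None
    # Accept either spelling of the given-name key.
    given = record.get("given-names", record.get("given-name", ""))
    surname = record.get("surname", "")
    # Assumed default: the surname comes last unless the flag says otherwise.
    if record.get("surname-after-given-names", True):
        core = [given, surname]
    else:
        core = [surname, given]
    parts = [record.get("prefix", ""), *core, record.get("postfix", "")]
    # Drop empty parts so that optional properties leave no stray spaces.
    return " ".join(part for part in parts if part)
```

Every new kind of name part means another property and another branch in a function like this one, which is where the trouble starts.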

Granted, that sort of works, but it’s really not very nice, and it’s extremely brittle: for example, there are names with extra words in the middle (such as “de”) that are properly part of neither the given names nor the surname. Then again, why overtag it? Perhaps we don’t need to know what’s a given name or an honorific, as long as we can distinguish the surname. One possibility is simply to break it down into four properties (plus the type):

{"type": "name", "gender": "male", "presurname": "Don Alonso", "surname": "Quixote", "postsurname": "de la Mancha"}

While I’m a big fan of Agile development in principle, I’ve worked on enough broken legacy systems to want to leave a little wiggle room for future requirements, such as a need to isolate the primary given name for a mail merge or index, even if we’re not going to isolate it right now. Fortunately JSON, like XML, can represent ordered information much more elegantly than this. Let’s make the name into an ordered array:

{"type": "name", "gender": "male", "value": ["Don Alonso", {"type": "surname", "value": "Quixote"}, "de la Mancha"]}
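As a quick illustration of why the ordered array ages well, here is a small Python sketch of two helpers that work on it. The names name_text and parts_of_type are made up for this example, and joining the parts with single spaces is my assumption, since the JSON strings themselves carry no whitespace. Neither helper needs to change if we later start tagging honorifics or given names.

```python
def name_text(name):
    """Join the ordered parts of a name into one display string.

    Plain strings are used as they are; tagged parts contribute their "value".
    Assumes single spaces between parts (not specified by the format).
    """
    parts = [p["value"] if isinstance(p, dict) else p for p in name["value"]]
    return " ".join(parts)


def parts_of_type(name, part_type):
    """Return the values of all tagged parts with the given type, in order."""
    return [
        p["value"]
        for p in name["value"]
        if isinstance(p, dict) and p.get("type") == part_type
    ]


# The record from the example above.
don_quixote = {
    "type": "name",
    "gender": "male",
    "value": ["Don Alonso", {"type": "surname", "value": "Quixote"}, "de la Mancha"],
}
```

Here parts_of_type(don_quixote, "surname") picks out the surname, and asking for a type that hasn’t been tagged yet, such as "honorific", simply returns an empty list.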

This approach provides us with almost limitless flexibility (for example, if we start isolating honorifics, we can deal with a language where the honorific comes at the end of the name with no extra trouble), and is just as simple and easy to read as the much less flexible presurname/postsurname approach. Building for today is great, but if you have a choice between two roughly equivalent approaches where one provides an easy future upgrade path and the other doesn’t, which is the better choice? JSON is new enough that the JSON community hasn’t yet had to deal much with the life cycle of information. Once enough people have built apps relying on specific JSON formats, it will be very, very hard to make any changes: v.2 of any popular data format generally results in enormous costs (in money and goodwill), and v.3 rarely happens.

Some people might prefer to shorten the above example a bit by following a simple convention: the first member of each array is a label, the second is a map with properties describing the rest of the array, and the remainder is the value, where order may be significant:

["name", {"gender": "male"}, "Don Alonso", ["surname", {}, "Quixote"], "de la Mancha"]

That is trickier to dump straight into a data structure or database table, but it’s a much more natural way to represent the information, and a lot easier to read on the screen. And just in case it doesn’t look familiar, compare:

<name gender="male">Don Alonso <surname>Quixote</surname> de la Mancha</name>
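The correspondence is close enough to be mechanical. Here is a minimal Python sketch, using the standard xml.etree.ElementTree module, that turns the label/properties/children convention into an element. The function name to_element is hypothetical. I’ve also assumed that whitespace is carried inside the strings themselves, so the sample array below has explicit spaces that the array in the post does not.

```python
import xml.etree.ElementTree as ET


def to_element(node):
    """Convert [label, properties, *children] into an ElementTree element.

    to_element is a hypothetical helper written for this sketch. Strings
    become text; nested lists become child elements, in their original order.
    """
    label, properties, *children = node
    element = ET.Element(label, properties)
    last_child = None
    for child in children:
        if isinstance(child, list):
            last_child = to_element(child)
            element.append(last_child)
        elif last_child is None:
            # Text before the first child element belongs to the parent.
            element.text = (element.text or "") + child
        else:
            # Text after a child element is stored as that child's tail.
            last_child.tail = (last_child.tail or "") + child
    return element


# Same data as the compact array above, with spaces added (an assumption).
compact = ["name", {"gender": "male"}, "Don Alonso ", ["surname", {}, "Quixote"], " de la Mancha"]
markup = ET.tostring(to_element(compact), encoding="unicode")
```

With the spaces included, markup comes out as the same XML element shown above.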

If your information isn’t this complicated, JSON, XML, or LISP can be simple, as Doug pointed out. The XML could just as easily be:

<name gender="male" presurname="Don Alonso" surname="Quixote" postsurname="de la Mancha"/>