It seems that every day, some computer monkey comes up with a new and more groovy serialization protocol.

In the beginning, there was ASN.1 and XDR, and it was good. I think ASN.1 came first, and like many old things, it was very efficient. XDR was easier to use. At some point, probably before ASN.1, people noticed you could serialize things using stuff like s-expressions for a human readable JSON like format.

Today, we have an insane profusion of serializers. CORBA (which always sucked), Thrift, protocol buffers, Messagepack, Avro, BSON, BERT, Property Lists, Bencode (Bram … how could you?), Hessian, ICE, Etch, CapnProto (because he didn’t get it right the first time), SNAC, Dbus, MUSCLE, YAML, SXDF, XML-RPC, MIME, FIX, FAST, JSON, serialization in Python, R, PHP, ROOT and Perl… Somehow this is seen as progress.

Like many modern evils, I trace this one to Java and Google. You see, Google needed a serialization protocol across thousands of machines which had versioning. They probably did the obvious thing of tinkering with XDR by sticking a required header on it which allowed for versioning, then noticed that Intel chips are not Big Endian the way Sun chips were, and decided to write their own semi shitty versioning version of XDR … along with their own (unarguably shitty) version of RPC. Everything has been downhill since then. Facebook couldn’t possibly use something written at Google, so they built “Thrift,” which hardly lives up to its name, but at least has a less shitty version of RPC in it. Java monkeys eventually noticed how slow XML was between garbage collects and wrote the slightly less shitty but still completely missing the point Avro. From there, every ambitious and fastidious programmer out there seems to have come up with something which suits their particular use case, but doesn’t really differ much in performance or capabilities from the classics.

The result of all this is that, instead of having a computer ecosystem where anything can talk to anything else, we have a veritable tower of babel where nothing talks to anything else. Imagine if there were 40 competing and completely mutually unintelligible versions of html or text encodings: that’s how I see the state of serialization today. Having all these choices isn’t good for anything: it’s just anarchy. There really should be a one size fits all minimal serialization protocol, just the same way there is a one size fits all network protocol which moves data around the entire internet, and, like UTF-8. You can have two flavors of the same thing: one S-exp like which a human can read, and one which is more efficient. I guess it should be little-endian, since we all live in Intel’s world now, but otherwise, it doesn’t need to do anything but run everywhere.

IMO, this is a social problem, not a computer science problem. The actual problem was solved in the 80s with crap like XDR and S-expressions which provide fast binary and human readable/self describable representations of data. Everything else is just commentary on this, and it only gets written because it’s kind of easy for a guy with a bachelors degree in CS to write one, and more fun to dorks than solving real problems like fixing bugs. Ultimately this profusion creates more problems than creating a new one solves: you have to make the generator/parser work on multiple languages and platforms, and each implementation on each language/platform will be of varying quality.

I’m a huge proponent of XDR, because it’s the first one I used (along with RPC and rpcgen), because it is Unixy, and because most of the important pieces of the internet and unix ecosystem were based on it. A little endian superset of this with a JSON style human semi-readable form, and an optional self-description field, and you’ve solved all possible serialization problems which sane people are confronted with. People can then concentrate on writing correct super-XDR extensions to get all their weird corner cases covered, and I will not be grouchy any more.

It also bugs the hell out of me that people idiotically serialize data when they don’t have to (I’m looking at you, Spark jackanapes), but that’s another rant.

Oh yeah, I do like Messagepack; it’s pretty cool.