
Disclaimer: Work on this in the Norwegian government has been going on for years. I worked on this for four months, producing a 45-page report. This blog posting oversimplifies most of the way through in the interests of brevity. The full report is here, and if you can read Norwegian you can post your feedback in the form on that page.

Ever since ODF and OOXML burst onto the scene in ISO SC34 I've tried to avoid getting pulled into the mess. I was quite successful at this for several years, until one day one of our managers at Bouvet suggested we bid for a contract to write a report for the Norwegian government (strictly speaking, the Agency for Public Management and eGovernment, Difi). The report was about whether to recommend or require ODF and/or OOXML in the Norwegian public sector. I couldn't come up with any valid excuses for not doing it, so we sent in a bid, and in the end won the contract.

The context of the report is that in Norway the government issues a reference catalogue listing the standards that the public sector is required or recommended to use within various usage areas. Earlier versions of the catalogue required use of ODF in two usage areas, and listed OOXML as being "under observation". So for publication of editable documents on public web sites ODF has been the required format in Norway for a while now. (Note the word "editable"; documents which are not meant to be edited by the recipient must be published in either HTML or PDF.)

My task was to take into account recent developments in the field and make recommendations for how the catalogue should be updated with regard to four specific areas of usage. Basically, this meant following up the "under observation" part. Should the catalogue also make OOXML required or recommended for some of these areas? Or something else entirely?

Method

Approaching a task like this was not easy. What recommendations would make sense? And how to justify them? What does the Norwegian public sector actually need? That last question gave me a place to start. If I could put together some use cases, they should show me what functionality users would need. I could then check the description of that functionality in the standards, and also do some testing to see if interchange of documents using this functionality would work in practice.

So this is what I did. I came up with a small set of use cases for each of the usage areas. Very briefly, it goes like this:

Web publishing

#1: Forms (fill out, send in electronically)

#2: Templates (proposed templates for various kinds of documents)

#3: Contracts (proposed standard contracts, to be edited)

Attachments to emails from public sector to private

#4: Forms (receive via email, this time)

#5: Contract writing (with a private-sector supplier, for example)

Attachments to emails within public sector

#6: Collaborative authoring

#7: Interchange of budget data

The list was produced through interviews with colleagues and various representatives from the public sector. I realize the list is very short, but remember that most documents are not meant to be edited by the recipient, and for these documents the public sector is required to use HTML or PDF. Note also that, as you'll see, adding more use cases is very unlikely to change the final conclusion.

From these scenarios I then drew up a short list of the necessary functionality:

Basic formatting (paragraphs, lists, tables, etc; all use cases)

Change tracking (#5 and #6)

Comments (#5 and #6)

Spreadsheets with formulas (#7)

Spreadsheets with macros (#7)

Forms (protected against editing with a password; #1, #2, and #4)
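To make the mapping between use cases and functionality easier to check, here is a minimal sketch in Python. The data is transcribed from the two lists above; the names USE_CASES, FEATURES, and features_for are made up for this illustration and do not come from the report.

```python
# Use cases #1-#7, transcribed from the list above.
USE_CASES = {
    1: "Web publishing: forms",
    2: "Web publishing: templates",
    3: "Web publishing: contracts",
    4: "Email to private sector: forms",
    5: "Email to private sector: contract writing",
    6: "Email within public sector: collaborative authoring",
    7: "Email within public sector: interchange of budget data",
}

# Functionality and the use cases that need it, transcribed from the list above.
FEATURES = {
    "basic formatting": set(USE_CASES),  # needed by all use cases
    "change tracking": {5, 6},
    "comments": {5, 6},
    "spreadsheets with formulas": {7},
    "spreadsheets with macros": {7},
    "forms": {1, 2, 4},
}


def features_for(use_case):
    """Return the alphabetically sorted functionality one use case depends on."""
    return sorted(name for name, cases in FEATURES.items() if use_case in cases)


for number, label in USE_CASES.items():
    print(f"#{number} {label}: {', '.join(features_for(number))}")
```

Inverting the list this way shows at a glance that use case #3 needs nothing beyond basic formatting, while #7 depends entirely on the spreadsheet parts of the specifications.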


The specs themselves

Now, I was asked to consider two specifications only: ECMA-376:2006, which is the very first OOXML standard (not the one later published by ISO), and ODF 1.1. Together these two documents run to 6783 pages, which was a bit much for me to digest and consider in the limited number of hours I had at my disposal. I therefore decided to focus on the specification of the specific functionalities in the list above (except the first one), and to look for general reports of problems in the two specifications to get a feel for the quality of each.

For ODF 1.1 the results were basically as follows:

General quality: Lots of errors, and quite a few holes where things basically are not specified at all. The mistakes I found were mostly minor (that is, very limited in scope).

Change tracking: The handling of change tracking in running text is quite fair, but doesn't seem to be complete. Change tracking in tables, lists, formulas, etc is missing.

Comments: Looks perfectly fine to me. I'm not sure about the parts that describe how comments are placed, but then no-one seems to implement that, anyway, and positioning of comments is not that important.

Spreadsheets with formulas: This was my first surprise. The specification of formulas just isn't there. Section 8.1.3 of the spec discusses formulas, but is very vague. There's no formal grammar, no list of functions, no list of datatypes, and no evaluation model. Basically, it says formulas should start with "a namespace prefix", then "=", and has some informal prose on how to refer to cells and ranges. That's all.

Spreadsheets with macros: This didn't really come as a surprise: no macro language or API for macros is defined. There's a defined place to put the macros, an attribute for saying what language you used, and various bits of documents have places where you can put event handlers, but that's all.

Forms: There's a fairly big and detailed section on forms with various types of controls and so on. To my surprise, there are even mechanisms for connecting this to databases (not relevant for our purposes, but interesting, anyway). There's also a mechanism for making a section of a document read-only, and to do it you put a hash of the password into an attribute. Unfortunately, nothing is said about how to produce the hash, which rather reduces the value of the mechanism.
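Why the unspecified password hash matters for interchange can be shown with a small sketch. Everything concrete here is an assumption made for illustration: the two hash algorithms, the UTF-8 encoding of the password, and the base64 encoding of the stored value are all choices that ODF 1.1 leaves open, and the "tools" are hypothetical.

```python
import base64
import hashlib


def protection_key(password, algorithm):
    """Hash a password the way one hypothetical tool might store it.

    The algorithm, the UTF-8 encoding and the base64 step are all
    assumptions; the point is that the spec does not pin them down.
    """
    digest = hashlib.new(algorithm, password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def can_unlock(stored_key, password, algorithm):
    """Check a password against a stored key using a tool's own algorithm."""
    return protection_key(password, algorithm) == stored_key


# Hypothetical tool A protects a section and happens to use SHA-1.
stored = protection_key("secret", "sha1")

print(can_unlock(stored, "secret", "sha1"))    # tool A checking its own file
print(can_unlock(stored, "secret", "sha256"))  # hypothetical tool B, guessing SHA-256
```

With the correct password, the tool that wrote the file can unlock the section, but a second tool that made a different (equally permitted) choice of hash cannot.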

For OOXML the results were like this:

General quality: As everyone knows, there are lots of errors and mistakes in the ECMA-376:2006 specification. Even the RELAX NG schemas that come with it turned out to have syntax errors in them.

Change tracking: OOXML spends 120 pages on this, a lot of them duplicated. The functionality is very detailed, going into table changes, formatting changes, list numbering changes, etc etc. I couldn't digest it all, but as far as I could tell it was solid.

Comments: Perfectly fine.

Spreadsheets with formulas: People have made much of the date problem (no pre-1900 dates, 1900 is incorrectly specified as a leap year), but this part of the spec is mostly quite solid, and the date problem does not appear to be very relevant for the public sector. There is a formal grammar, datatypes, function definitions, etc etc. Yes, there are errors and so on, but at least it's specified in full detail.

Spreadsheets with macros: Essentially the same as for ODF: not specified.

Forms: This is described in extensive detail in the spec. The XML modelling of forms looks like it's a direct translation from the original binary format (which it probably is), so it's not exactly beautiful, but as far as I can tell everything you need is there and fully specified. The read-only protection mechanism is fairly complicated (because it's connected with the encryption mechanism), but again looks fully specified.
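For readers who haven't run into the date problem mentioned above, here is a small sketch of what it means in practice. It assumes the widely documented behaviour of the 1900-based date system, where serial number 1 is 1 January 1900 and serial number 60 is treated as 29 February 1900, a day that never existed. The function name serial_to_date is made up for this example.

```python
from datetime import date, timedelta


def serial_to_date(serial):
    """Convert a 1900-based spreadsheet date serial to a real calendar date.

    Assumes serial 1 is 1 January 1900 and that the format counts a
    fictitious 29 February 1900 as serial 60. Returns None for that
    serial, since no such day exists.
    """
    if serial < 1:
        raise ValueError("dates before 1900 cannot be represented")
    if serial == 60:
        return None  # the nonexistent 29 February 1900
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # From serial 61 on, every value is shifted by the phantom leap day.
    return date(1899, 12, 30) + timedelta(days=serial)


for serial in (1, 59, 60, 61, 36526):
    print(serial, serial_to_date(serial))
```

The two different epochs in the function are the whole problem in miniature: anything reading these serials has to know about the phantom day to get dates in January and February 1900 right, and dates before 1900 have no serial at all.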

In short: ODF 1.1 has a huge gaping hole in it as far as spreadsheets are concerned and is full of errors and omissions. ECMA-376 appears to have all the necessary functionality, but is also full of errors. I made no attempt to judge which of the two has the greater density of errors.

Both specifications also have stability issues, although this is worse in the case of OOXML than for ODF.

The implementations

Wikipedia lists a good number of implementations for both formats, so I picked the ones that, as far as I know, have a reasonable set of functionality. Then, for OOXML I considered those which could write OOXML, and for ODF those which could write ODF. Interestingly, there was a reasonable number of each, and for both formats there was a choice of more than 2 implementations on each of the Linux, Mac, and Windows platforms.

In theory it therefore looked like both formats could be used. If, that is, the tools really supported the formats well enough. The only way to verify that was by testing. I made very simple test documents for each of the functionalities listed above in the reference implementation (MS Office for OOXML, OpenOffice for ODF), then opened these in the other tools. If successful, I would make some changes, save to a new file, and open in the reference tool again.
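The round-trip procedure can be sketched as a toy model in Python. This is not the actual test setup, which was done by hand in the real applications: the Tool class, the tool names, and the sets of preserved features below are all invented for illustration and say nothing about how any real product behaved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """A hypothetical office tool, modelled only by what survives open-and-save."""
    name: str
    preserved: frozenset


def round_trip(feature, reference, other):
    """Model one test: create in the reference tool, edit in another, reopen."""
    document = {feature}                       # 1: create test document in reference tool
    document = document & other.preserved      # 2: open, change and save in the other tool
    document = document & reference.preserved  # 3: open the result in the reference tool
    return feature in document


# Invented capabilities, purely to exercise the function.
reference = Tool("reference", frozenset({"change tracking", "comments", "formulas"}))
candidate = Tool("candidate", frozenset({"comments"}))

for feature in sorted(reference.preserved):
    status = "ok" if round_trip(feature, reference, candidate) else "FAILED"
    print(f"{feature}: {status}")
```

The model makes one point about the method: a feature only counts as working if it survives both legs of the trip, so a tool that reads a feature correctly but cannot write it back still fails.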

The results were downright depressing. For OOXML, in most cases none of the tools came up with usable results. For change tracking NeoOffice actually worked. And for spreadsheets NeoOffice and Gnumeric both worked fine. (IBM Lotus Symphony and Google Docs also read the spreadsheets correctly, but they can't write OOXML.)

For ODF, in most cases only IBM Lotus Symphony (which is really a fork of OpenOffice) was successful. For comments Microsoft Office (!) and AbiWord also got it right. For spreadsheets the latest Gnumeric for Windows also got it right.

In short, if you want to use ODF or OOXML today, then apparently for OOXML you must use Microsoft Office and for ODF you must use OpenOffice or IBM Lotus Symphony. Or, alternatively, you can use another tool and do lots of manual cleaning up.

I realize that the testing I describe here is very superficial, and I would not on the basis of this testing have made the claim that interchange between tools works. But most of these very, very simple tests failed. My conclusion is that if not even the simplest cases work then real documents are definitely not going to work.


Conclusion

By now I guess the conclusion should be obvious. I couldn't recommend either format. Both specs are of very low quality, and for neither format do you have much of a choice of tools. For the public sector this would essentially mean having to agree not on a format, but on a single tool to be used sector-wide. The purpose of creating standards should be to achieve interoperability, but in this case that just hasn't happened yet.

Having said that, ODF 1.2 looks like it will address nearly all the shortcomings of ODF 1.1 that my report identifies. Similarly, it looks like the next OOXML version (ISO/IEC 29500:2008 amendment 1) will solve most of the OOXML issues. If the implementors follow up and improve their converters, things will look much brighter. Unfortunately, this is going to take a couple of years.

So my conclusion in the report is that both standards should be listed as "under observation" for all usage areas.

(Note that this describes version 0.9 of the report. Version 1.0 is due within a month. Feedback over the next week or so is very much welcome.)

Now what?

If previous experience with the OOXML/ODF war is any guide, now follows the part where lots of people get very upset. That's life, I guess.

I went into this job with a genuinely open mind, curious about what I would find, and was really disappointed with the outcome. I knew the specs had problems, but I really thought they were better than this. That the tools were as poor as they are came as an even bigger surprise. In the end, given the results I got, I really had no choice about the conclusion.