One of the great advantages of the OpenDocument format is that it is simply a zip file. You can unzip it with any archiver and take a look at the contents, which are a set of XML documents and associated data. Many people use this feature to create some nifty toolchains. Unzip, make some changes, zip it again and you have a new ODF document. Well… almost.

The OpenDocument Format specification, section 17.4, has one little extra restriction when it comes to zip containers: the file called “mimetype” must be at the beginning of the zip file, it must be uncompressed and it must be stored without any additional file attributes. Unfortunately many developers seem to forget this. It is the number one cause of failed documents at Officeshots.org. If the mimetype file is not correctly zipped, then it is not possible to programmatically detect the mimetype of the ODF file from its first bytes. And if the mimetype check fails, Officeshots (and possibly other applications) will refuse the document. This problem is compounded because virtually no ODF validator checks the zip container. They only check the contents.

In this article I will show you how you can properly zip your ODF files, but before I do that I will show you the problem in detail.

Detecting mimetypes

Linux and other Unix-like operating systems do not rely on file extensions to determine the type of a file. Relying on file extensions can be a serious security problem, as you can see in the Windows world. It's simply too easy to change the extension and pretend that a file is of a different type than it really is. Instead, the Unix world looks at the contents of the file itself. This is done by a library called “magic”.

The magic library consists of a large set of rules, which it uses to figure out what type of file it is looking at. For example, it can look at a certain byte offset and see what value it contains. This is precisely the reason why the ODF specification says that you need to zip the mimetype first, without any file attributes. If you do that and open the ODF file in a hex editor, you will see something like this:

Offset:     Hexadecimal:                                      ASCII:
00000000 -  50 4b 03 04 14 00 00 08 00 00 c1 b6 66 3b 5e c6   PK..............
00000010 -  32 0c 27 00 00 00 27 00 00 00 08 00 00 00 6d 69   2.'...'.......mi
00000020 -  6d 65 74 79 70 65 61 70 70 6c 69 63 61 74 69 6f   metypeapplicatio
00000030 -  6e 2f 76 6e 64 2e 6f 61 73 69 73 2e 6f 70 65 6e   n/vnd.oasis.open
00000040 -  64 6f 63 75 6d 65 6e 74 2e 74 65 78 74 50 4b 03   document.textPK.
...

This is very easy to match for the magic library. Here is an explanation of the rules that magic uses to test if the file is an ODF file:

1. Look at the beginning of the file. It should start with the letters PK and then bytes 03 and 04. This means it is a zip file.
2. Look at offset 30 ("1e" in hex). It should be the string "mimetype".
3. Look at offset 38 ("26" in hex), directly after the word "mimetype". It should be one of the ODF mimetypes.
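If you would rather test these rules in a script than in a hex editor, here is a minimal Python sketch of them. It is only an illustration, not the actual rule set of the magic library. The function name looks_like_odf is made up for this example, and instead of listing every ODF mimetype it simply checks for the common prefix "application/vnd.oasis.opendocument.", which is an assumption on my part. The offsets work because the fixed part of a zip local file header is 30 bytes long, so the first file name starts at offset 30.

```python
# Prefix shared by the ODF mimetypes (an approximation of rule 3).
ODF_PREFIX = b"application/vnd.oasis.opendocument."


def looks_like_odf(data: bytes) -> bool:
    """Apply the three offset rules to the first bytes of a file."""
    # Rule 1: zip local file header signature "PK" 03 04 at offset 0.
    if data[0:4] != b"PK\x03\x04":
        return False
    # Rule 2: the entry name "mimetype" at offset 30 (0x1e).
    if data[30:38] != b"mimetype":
        return False
    # Rule 3: an ODF mimetype directly after the name, at offset 38 (0x26).
    return data[38:38 + len(ODF_PREFIX)] == ODF_PREFIX


# The first bytes of a correctly zipped ODF text document.
GOOD = bytes.fromhex(
    "50 4b 03 04 14 00 00 08 00 00 c1 b6 66 3b 5e c6 "
    "32 0c 27 00 00 00 27 00 00 00 08 00 00 00"
) + b"mimetype" + b"application/vnd.oasis.opendocument.text"

# The first bytes of a zip file that starts with a different entry.
BAD = bytes.fromhex(
    "50 4b 03 04 0a 00 00 00 00 00 25 01 6e 3c 00 00 "
    "00 00 00 00 00 00 00 00 00 00 10 00 15 00"
) + b"Configurations2/"

print(looks_like_odf(GOOD))
print(looks_like_odf(BAD))
```

To check a real document, read the first 80 bytes or so from the file in binary mode and pass them to the function.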

You can guess what happens when you don't zip the mimetype file first: The string "mimetype" won't be at the right offset. And if you accidentally zip it with extra file attributes, then the contents of the mimetype file will not start directly after it. There will be several bytes in between. This causes the magic library to detect it as a standard zip file, not as an ODF file. Here is what such a badly zipped ODF could look like. This file was zipped normally, without paying special attention to the mimetype file:

Offset:     Hexadecimal:                                      ASCII:
00000000 -  50 4b 03 04 0a 00 00 00 00 00 25 01 6e 3c 00 00   PK..............
00000010 -  00 00 00 00 00 00 00 00 00 00 10 00 15 00 43 6f   ..............Co
00000020 -  6e 66 69 67 75 72 61 74 69 6f 6e 73 32 2f 55 54   nfigurations2/UT
00000030 -  09 00 03 16 1b 9c 4b 47 1e 9c 4b 55 78 04 00 e8   ......KG..KUx...
00000040 -  03 e8 03 50 4b 03 04 0a 00 00 00 00 00 25 01 6e   ...PK........%.n
...

As you can see, it does not match the rules that the magic library has. Instead of checking your ODF file with a hex editor, you can also simply use the "file" command. For example:

$ file --mime my-document.odt
my-document.odt: application/vnd.oasis.opendocument.text

If that command results in "application/zip" or "application/octet-stream", then your ODF file is probably incorrectly zipped. Note that the magic library shipped with "file" up to version 5.0.3 does not contain all mimetypes for ODF files but only for OpenDocument Text (odt) files. File 5.0.3 is the version most commonly shipped with Linux distributions today. I have since submitted a patch that includes all known ODF mimetypes. It was accepted and it should be included in file version 5.0.4 and later.
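The "file" command only tells you that something is wrong, not what. As a rough aid, the following Python sketch reports which of the three requirements the mimetype entry violates. The function name mimetype_problems and the wording of the messages are my own invention for this example. It uses the standard zipfile module for the position and the compression method, and reads the extra field length straight from the local file header, where it is stored as a 2-byte little-endian value at bytes 28 and 29.

```python
import struct
import zipfile


def mimetype_problems(path):
    """List the ways the mimetype entry of a zip file breaks the ODF rules."""
    with zipfile.ZipFile(path) as zf:
        if "mimetype" not in zf.namelist():
            return ["there is no mimetype entry"]
        info = zf.getinfo("mimetype")

    problems = []
    # The entry must start at the very beginning of the zip file.
    if info.header_offset != 0:
        problems.append("mimetype is not the first entry")
    # The entry must be stored, not compressed.
    if info.compress_type != zipfile.ZIP_STORED:
        problems.append("mimetype is compressed")

    # Read the extra field length from the entry's local file header.
    with open(path, "rb") as fh:
        fh.seek(info.header_offset)
        local_header = fh.read(30)
    (extra_length,) = struct.unpack("<H", local_header[28:30])
    if extra_length:
        problems.append(
            "mimetype has %d bytes of extra file attributes" % extra_length
        )
    return problems
```

Calling mimetype_problems("my-document.odt") on a correctly zipped document should return an empty list.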

How to zip an ODF file

So, here is how you can zip an ODF file the right way. Suppose that I have an unzipped ODF file that looks like this:

+ my-document/
  + Configurations2/
  + META-INF/
    - manifest.xml
  + Thumbnails/
    - thumbnail.png
  - content.xml
  - meta.xml
  - mimetype
  - settings.xml
  - styles.xml

From inside the my-document directory, start by creating a new zip file that just contains the mimetype file:

$ zip -0 -X ../my-document.odt mimetype

The -0 parameter means that the file will not be compressed. The -X parameter means that no extra file attributes will be stored. Next you can add the rest of the files:

$ zip -r ../my-document.odt * -x mimetype

Be sure to exclude the mimetype file. Now if you look at it with a hex editor, you will see it has been zipped correctly:

Offset:     Hexadecimal:                                      ASCII:
00000000 -  50 4b 03 04 14 00 00 08 00 00 c1 b6 66 3b 5e c6   PK..............
00000010 -  32 0c 27 00 00 00 27 00 00 00 08 00 00 00 6d 69   2.'...'.......mi
00000020 -  6d 65 74 79 70 65 61 70 70 6c 69 63 61 74 69 6f   metypeapplicatio
00000030 -  6e 2f 76 6e 64 2e 6f 61 73 69 73 2e 6f 70 65 6e   n/vnd.oasis.open
00000040 -  64 6f 63 75 6d 65 6e 74 2e 74 65 78 74 50 4b 03   document.textPK.
...
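If your toolchain is a script rather than a shell session, the same two steps can be done with a zip library. Here is a minimal sketch using Python's standard zipfile module. The function name zip_odf is hypothetical, and the sketch assumes that the mimetype file in the source directory has no trailing newline. As far as I know, zipfile does not write extra file attributes for ordinary small files, but do verify the result with "file" or a hex editor. Note that this sketch skips empty directories such as Configurations2/.

```python
import os
import zipfile


def zip_odf(source_dir, target_path):
    """Pack an unzipped ODF directory with the mimetype entry first."""
    with zipfile.ZipFile(target_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Step 1: the mimetype file goes in first and uncompressed,
        # like "zip -0 -X".
        zf.write(os.path.join(source_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # Step 2: add everything else, like "zip -r ... -x mimetype".
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Entry names in a zip file always use forward slashes.
                arcname = os.path.relpath(full_path, source_dir)
                arcname = arcname.replace(os.sep, "/")
                if arcname != "mimetype":
                    zf.write(full_path, arcname)
```

For the example above you would call zip_odf("my-document", "my-document.odt") from the parent directory. Keep the target file outside the source directory, otherwise the walk may pick up the half-written zip file.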

Happy zipping everyone!