In an earlier post I promised that I would come back to the topic of compressing JavaScript programs efficiently, so here we go…

In this post, I’ll go through some techniques for creating self-extracting, compressed JavaScript programs.


Why compress JavaScript?

There may be several legitimate reasons for compressing your JavaScript files. The most obvious is probably to reduce network bandwidth, which can be achieved with HTTP compression (i.e. by using gzip compression between the server and the client at the protocol level).

There are, however, some less common cases where you cannot, or are not allowed to, rely on HTTP compression. The use case that originally triggered my interest is the somewhat artificial JavaScript file size limitation of certain demo contests. For instance, the WebGL 64k contest has the following rules:

Submission file size MUST be ≤ 64k (65,536 bytes)

No external requests, everything must be inlined in the JavaScript

Pay close attention to the second point: all assets, such as images and music, must be inlined in the JavaScript source code, and according to the first point, the source code must not exceed 65536 bytes. Those are quite challenging limits!

In order to make an interesting program, you will certainly want to make good use of those bytes, and that calls for compression.

The Google Closure Compiler

First things first: there is a great tool from Google called the Google Closure Compiler. The Closure Compiler will basically trim down your JavaScript code to a minimum by:

Removing all comments and unnecessary whitespace.

Renaming variables and properties to shorter names.

Refactoring expressions and constructs into a more compact form.

Needless to say, any attempt at compressing JavaScript source code should start by using the Closure Compiler.

With that said, this article will focus mainly on taking compression even further by applying binary compression methods to the already compact code produced by the Closure Compiler.



Selecting a compression method

One of the main problems that we face when we want to compress a JavaScript program is that it must be able to decompress itself, which means that the decompression code must not take too much space, or the gain from compressing the original program would be lost!

Another problem is that the binary compressed data stream must be stored in the JavaScript source somehow (more on that later).



Existing libraries

Since the decompression routine must be small, existing (efficient but large) decompression libraries such as JXGraph and js-deflate are disqualified. The decompression routine from the latter, which seems quite minimal, still weighs in at 4924 bytes when run through the Google Closure Compiler, which is a bit on the heavy side.



Using liblzg

My first solution to the problem was to use the liblzg compression library, which was specifically designed to enable a lightweight decompression routine. As it turns out, the decompression routine fits into about 450 bytes of JavaScript code (with the potential of becoming even smaller with some tweaking), which is clearly acceptable for our needs.



Handling binary data

This brings us to another problem: how to store binary data in a JavaScript source file.



Base64

The tried and true method for storing binary data in text files is of course base64. However, since that scheme is only able to store six data bits per text character, we get a data growth of 33% (assuming an 8-bit character encoding)! That data growth will severely damage the compression ratio, so we ought to find a better method.
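To put the 33% figure in perspective, here is a small sketch that computes the Base64 size for a given number of bytes. The function names and the 48 KiB payload are only illustrative assumptions, not something taken from an existing tool.

```javascript
// Size of the Base64 text produced for `byteCount` input bytes:
// every group of 3 bytes (the last one padded) becomes 4 characters.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Relative growth, e.g. 0.333 for roughly 33%.
function base64Growth(byteCount) {
  return base64Length(byteCount) / byteCount - 1;
}

var payload = 48 * 1024; // hypothetical: 48 KiB of compressed data
console.log(base64Length(payload));                          // 65536
console.log((base64Growth(payload) * 100).toFixed(1) + '%'); // 33.3%
```

In other words, 48 KiB of compressed data stored as Base64 would already fill an entire 65,536 byte budget, before counting any decompression code.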



Latin 1

The first method that I tried was to use plain Latin 1 encoding – i.e. just put the compressed byte data into a JavaScript string in a Latin 1 encoded JavaScript source file. Obviously, that will not work without some tweaks, since there are several byte values that are forbidden in such a string (0-31, 39, 92, and 127-159, to be more precise).

By substituting invalid byte codes with two valid characters, and some “clever” shifting of codes to minimize the occurrences of invalid codes (based on the statistical content of liblzg compressed data), the data growth factor could be reduced to about 5-10%.
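As a rough illustration, the forbidden byte values listed above can be expressed as a predicate and counted. This is only a sketch: the function names are made up for the example, and the growth estimate assumes the simplest possible scheme (every forbidden byte costs exactly one extra character), without the code shifting mentioned above.

```javascript
// Byte values that cannot be stored directly in a single-quoted string
// in a Latin 1 encoded source file: 0-31, 39 ('), 92 (\) and 127-159.
function isForbiddenLatin1(b) {
  return b <= 31 || b === 39 || b === 92 || (b >= 127 && b <= 159);
}

// Estimated growth when every forbidden byte is replaced by two characters.
function estimateGrowth(bytes) {
  var extra = 0;
  for (var i = 0; i < bytes.length; i++) {
    if (isForbiddenLatin1(bytes[i])) extra++;
  }
  return extra / bytes.length;
}

// Count how many of the 256 possible byte values are forbidden.
var forbidden = 0;
for (var b = 0; b < 256; b++) {
  if (isForbiddenLatin1(b)) forbidden++;
}
console.log(forbidden + ' of 256 (' + (100 * forbidden / 256).toFixed(1) + '%)');
// 67 of 256 (26.2%)
```

For evenly distributed data, the naive scheme would therefore grow the data by about 26%, which is why some shifting of codes is needed to get down to 5-10%.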

Surely, we can do better than that?



UTF-16

Yes we can! Please note that while in Latin 1 encoding there are 26% invalid code points (in the 0-255 range), UTF-16 only has about 3% invalid code points (in the entire 0-65535 range, only 2151 code points must be avoided).

So, let us use UTF-16 coding for our packed JavaScript program! A 2-byte BOM at the beginning of the file will ensure that the browser understands that we use UTF-16 (it’s actually even safer than Latin 1 encoding, which can be misinterpreted as UTF-8 by some clients or text editors).
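For illustration, a build script could produce such a file along the following lines. This sketch assumes Node.js (its Buffer API) and little-endian UTF-16; the function name and the output file name are hypothetical.

```javascript
// Encode a JavaScript source string as UTF-16 (little-endian), preceded by
// the 2-byte BOM 0xFF 0xFE that identifies the encoding. Requires Node.js.
function toUtf16LeWithBom(source) {
  var bom = Buffer.from([0xFF, 0xFE]);
  return Buffer.concat([bom, Buffer.from(source, 'utf16le')]);
}

// Hypothetical usage in a build script:
// require('fs').writeFileSync('packed.js', toUtf16LeWithBom(packedSource));
```

Every character of the source takes two bytes in the resulting file, plus two bytes for the BOM.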

Now we can pack two bytes from the compressed data stream into a single UTF-16 character, with a fallback for invalid code points: two bytes that would otherwise make up an invalid code are encoded as two valid UTF-16 characters instead. Since only about 3% of the codes need the fallback, the data growth stays below 4% for evenly distributed data (in the case of liblzg compressed data, the figure is usually less than 1%).
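A minimal sketch of such a packer and its inverse could look like this. Note the assumptions: the escape marker, the XOR mapping and the exact set of unsafe codes are invented for this example (the set is a bit smaller than the 2151 codes mentioned above), so this is not necessarily how any particular tool does it.

```javascript
var ESC = 0xE000; // hypothetical escape marker (a private-use code point)

// Assumed set of codes that are unsafe inside a single-quoted string literal.
function isUnsafe(code) {
  return code <= 0x1F ||                       // control characters
         code === 0x27 || code === 0x5C ||     // ' and \
         (code >= 0x7F && code <= 0x9F) ||     // DEL and C1 controls
         code === 0x2028 || code === 0x2029 || // line/paragraph separators
         (code >= 0xD800 && code <= 0xDFFF) || // surrogates
         code === 0xFEFF || code >= 0xFFFE;    // BOM and non-characters
}

// Pack two bytes per UTF-16 character. Unsafe codes (and the marker itself)
// are stored as ESC followed by the code with its top bit flipped, which
// always lands on a safe code for the set above.
function packBytes(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i += 2) {
    var lo = i + 1 < bytes.length ? bytes[i + 1] : 0; // pad odd input with 0
    var code = (bytes[i] << 8) | lo;
    if (code === ESC || isUnsafe(code)) {
      out += String.fromCharCode(ESC, code ^ 0x8000);
    } else {
      out += String.fromCharCode(code);
    }
  }
  return out;
}

// Inverse of packBytes. `byteCount` is the original length, needed to drop
// the padding byte of odd-length input.
function unpackBytes(str, byteCount) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code === ESC) code = str.charCodeAt(++i) ^ 0x8000;
    bytes.push(code >> 8, code & 255);
  }
  return bytes.slice(0, byteCount);
}
```

Only the codes that hit the unsafe set cost an extra character, which is where the low overhead comes from.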



The downside to UTF-16 encoding

There is one severe downside to using UTF-16 encoding: the decompression routine must be encoded in UTF-16 too, which means that it will take twice the space (i.e. over 900 bytes)! For small files, that may even make the final file larger than with Latin 1 encoding.

But fear not, the solution is near: we can pack the decompression routine!

How? Well, let’s use the same trick as for the binary data: pack two characters into one UTF-16 character. The decompression routine has a nice property: it’s pure ASCII (only using codes in the range 32-126). This means that combining two consecutive bytes into a single Unicode character will always result in a valid code point (all codes in the range 0x2020-0x7E7E are valid UTF-16 codes).

Since packing the decompression routine in a UTF-16 string will never produce invalid codes, we get 0% space loss, and the routine for unpacking the packed string is very simple. Yes, we need an additional unpacking routine, and it has to be in plain (unpacked) UTF-16 form, but it does not add much to the code since it’s very basic. It looks something like this (before mangling it to a more compact form):

var i, c, p = '...packed-string...', s = ''; for (i = 0; i < p.length; i++) { c = p.charCodeAt(i); s += String.fromCharCode(c >> 8, c & 255); } eval(s);
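The packing side, which runs at build time, is the inverse of the unpacking loop above. Here is a sketch of it together with a matching unpacker for testing; the function names are made up, and padding odd-length code with a trailing space is an assumption of this example.

```javascript
// Pack printable ASCII source code, two characters per UTF-16 character.
function packAscii(src) {
  if (src.length % 2) src += ' '; // pad to an even length (harmless in code)
  var out = '';
  for (var i = 0; i < src.length; i += 2) {
    var a = src.charCodeAt(i), b = src.charCodeAt(i + 1);
    if (a < 32 || a > 126 || b < 32 || b > 126) {
      throw new Error('not printable ASCII near index ' + i);
    }
    out += String.fromCharCode((a << 8) | b);
  }
  return out;
}

// Same logic as the unpacking routine, but returning the string
// instead of passing it to eval().
function unpackAscii(packed) {
  var s = '';
  for (var i = 0; i < packed.length; i++) {
    var c = packed.charCodeAt(i);
    s += String.fromCharCode(c >> 8, c & 255);
  }
  return s;
}

console.log(unpackAscii(packAscii('alert(1);'))); // alert(1); (plus a space)
```

One caveat, as far as I know: JavaScript engines that predate ES2019 treat U+2028 and U+2029 as line terminators inside string literals, so a careful packer may want to avoid the character pairs " (" and " )" (a space followed by a parenthesis) falling on a pair boundary.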

Conclusions for the liblzg approach

So, at this point, we’ve taken the liblzg compression pretty much as far as we can for making a self-extracting JavaScript program. In summary:

A self-extracting JavaScript module (about 600 bytes).

Fairly well-packed binary data (only about 3% larger than the original binary size).

Pure ECMAScript implementation (very high cross-browser compatibility).

Decent compression ratio (though not quite as good as DEFLATE, Bzip2, LZMA, etc).

While we have achieved great results, we can do even better!



Doing even better – PNG

Wouldn’t it be great if we could access some browser API that can decompress DEFLATE encoded data from JavaScript? Most browsers use zlib internally for many different things, but still there is no direct support for zlib from JavaScript…

However, if we are willing to sacrifice compatibility with legacy browsers (e.g. IE 6, 7 and 8), there is a cool trick that relies heavily on the <canvas> element: store our JavaScript program in a PNG image file, where each character of the source code becomes one pixel with a grayscale intensity in the range 0-255.

So what good would that do – and how can we use it?



The PNG idea

OK, so a little background on the PNG image file format:

PNG uses the DEFLATE algorithm for compression (the same algorithm that is used in ZIP and gzip, for instance).

The file format is lossless (no pixel values are distorted).

All modern browsers can decode PNG images when loaded into an Image object.

We can draw the PNG image to a canvas element and read back pixels using the getImageData() method!

In other words, with the aid of the canvas element, we can decompress a PNG image and extract its contents into a JavaScript String object, which we can then execute (e.g. with eval).

Again, we can use the UTF-16 “trick” for storing the raw PNG image data in a JavaScript string without experiencing too much data growth.



The decompression logic

As with the liblzg solution, we need some decompression code. Only this time it’s not the actual decompression algorithm, but a small program that does roughly the following:

Decode the UTF-16 string to a binary “string” (containing the raw PNG data).

Encode the binary string as Base64, using the btoa() method.

Generate a data URI by prepending a small header: “data:image/png;base64,”

Load the data URI into an Image object.

In the onload handler of the image: draw the image to a canvas element, read back the pixel values and put them into a new string, and execute the content of the string.
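The steps above can be sketched roughly as follows. This is an unminified illustration under some assumptions: the function names and the charCount parameter (the number of source characters, in case the image contains padding pixels) are hypothetical, the binary string is assumed to be already decoded from UTF-16, and runPackedPng() only works in a browser.

```javascript
// Convert RGBA pixel data (as found in getImageData().data) to a string,
// taking one character per pixel from the red channel of a grayscale image.
function pixelsToString(rgba, charCount) {
  var s = '';
  for (var i = 0; i < charCount; i++) {
    s += String.fromCharCode(rgba[i * 4]);
  }
  return s;
}

// Browser only: `pngBinary` is a binary string holding the raw PNG file,
// one byte per character.
function runPackedPng(pngBinary, charCount) {
  var img = new Image();
  img.onload = function () {
    // Draw the decoded image to an off-screen canvas of the same size.
    var canvas = document.createElement('canvas');
    canvas.width = img.width;
    canvas.height = img.height;
    var ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0);
    // Read back the pixels and execute the recovered source code.
    var data = ctx.getImageData(0, 0, img.width, img.height).data;
    eval(pixelsToString(data, charCount));
  };
  // Base64 is only used at run time here, so it costs no file size.
  img.src = 'data:image/png;base64,' + btoa(pngBinary);
}
```

Note that the program runs asynchronously, once the image has finished loading, rather than at the point where the script is included.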

The size of this decompression program is roughly the same as the liblzg decoder, but on the other hand the DEFLATE compression method gives us better compression ratios than liblzg (the main reason is that while liblzg only uses LZ77 compression, DEFLATE uses a combination of LZ77 and Huffman compression).



Conclusions for the PNG approach

At this point, and with the given goal (making a fairly small JavaScript source file even smaller), the PNG method described above seems to be as far as you can get with current browser technology when it comes to compression ratio (when including the decompression routine as part of the compressed deliverable).

The key benefits with this method are:

Reuse the DEFLATE decompression routine in the browser. It is commonly available (in every browser that supports PNG), and it gives quite good compression ratios (JavaScript code compresses well).

Encode the raw PNG data in a valid UTF-16 string. This gives a very low overhead for encoding binary data in a string (only 3-4%, compared to 33% for Base64), and there is very little chance of the data being misinterpreted by the browser.



Wrapping it up – CrunchMe

I hope that you found the compression methods described here useful and/or interesting.

If you want to compress your own JavaScript programs, there is an open source command line tool available called CrunchMe.