Patent 5,533,051 on compression of random data

Patent 5,533,051, "Method for Data Compression". Filed Mar. 12, 1993; granted Jul. 2, 1996. Inventor: David C. James. Assignee: The James Group.

The first line of the patent abstract is:

Methods for compressing data including methods for compressing highly randomized data are disclosed.

A second aspect of the present invention which further enhances its ability to achieve high compression percentages, is its ability to be applied to data recursively. Specifically, the methods of the present invention are able to make multiple passes over a file, each time further compressing the file. Thus, a series of recursions are repeated until the desired compression level is achieved.

the direct bit encode method of the present invention is effective for reducing an input string by one bit regardless of the bit pattern of the input string.

Assume that the program can compress without loss all files of size N bits or more. Compress with this program all the 2^N files which have exactly N bits. All compressed files have at most N-1 bits, so there are at most (2^N)-1 different compressed files [2^(N-1) files of size N-1, 2^(N-2) of size N-2, and so on, down to 1 file of size 0]. So at least two different input files must compress to the same output file. Hence the compression program cannot be lossless.
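This counting can be checked mechanically for any N: the number of distinct bit strings strictly shorter than N bits is 2^0 + 2^1 + ... + 2^(N-1) = 2^N - 1, always one fewer than the 2^N possible inputs. A minimal sketch in Python:

```python
# Pigeonhole check: there are strictly fewer possible outputs shorter
# than N bits than there are N-bit inputs, so a lossless compressor
# that shrinks every N-bit file would map two inputs to one output.
def count_shorter(n):
    # strings of length 0, 1, ..., n-1
    return sum(2 ** k for k in range(n))

for n in range(1, 20):
    assert count_shorter(n) == 2 ** n - 1  # always one output short

print(count_shorter(8), 2 ** 8)  # 255 possible outputs for 256 inputs
```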

If the method were indeed able to shrink any file by at least one bit, applying it recursively would shrink gigabytes down to a few bits.
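The absurdity is easy to quantify. An illustrative computation, assuming (as the patent claims) exactly one bit saved per pass:

```python
# If every pass removed exactly one bit, a one-gigabyte file
# (8 * 2^30 bits) would reach a single bit after 8 * 2^30 - 1 passes.
# That final bit would then have to stand for any of the astronomically
# many distinct gigabyte-sized originals, which no decompressor could
# possibly undo.
bits = 8 * 2 ** 30   # one gigabyte, in bits
passes = bits - 1    # passes needed to reach a single bit
print(passes)        # 8589934591
```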

The patent contains evasive arguments to justify the impossible claims:

Page 12, line 22:

Of course, this does not take into account any overhead registers or other "house-keeping" type information which must be tracked. However such overhead tends to be negligible when processing the large quantities of data typically encountered in data compression applications.

Thus, one skilled in the art can see that by keeping the appropriate counters, the direct bit encode method of the present invention is effective for reducing an input string by one bit regardless of the bit pattern of the input string. Although a certain amount of "loss" is necessary in keeping and maintaining various counters and registers, for files which are sufficiently large, this overhead is insignificant compared to the savings obtained by the direct bit encode method.
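The "house-keeping" information is exactly where the argument fails: the counters must be available to the decompressor, so they are part of the output, and once they are counted the saving vanishes. A toy illustration of that accounting (not the patent's actual method, just the bookkeeping it hand-waves away):

```python
# A toy "drop one bit, keep a register" scheme: the body is one bit
# shorter, but the dropped bit lives on in a side register that must
# be stored or transmitted too, so the total size never shrinks.
def encode(data_bits):
    register = data_bits[0]   # "house-keeping" state the decoder needs
    body = data_bits[1:]      # one bit apparently "saved"
    return body, register

def decode(body, register):
    return register + body

msg = "10110100"
body, reg = encode(msg)
assert decode(body, reg) == msg
print(len(body) + len(reg), len(msg))  # 8 8 -- no net saving
```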

The patent contains even more evasive arguments:

Page 22, line 31:

It is commonly stated that perfectly entropic data streams cannot be compressed. This misbelief is in part based on the sobering fact that for a large set of entropic data, calculating the number of possible bit pattern combinations is unfathomable. For example, if 100 ones and 100 zeros are randomly distributed in a block 200 bits long, there are C(200,100) = 9.055 × 10^58 combinations possible. The numbers are clearly unmanageable and hence the inception that perfectly entropic data streams cannot be compressed. The key to the present compression method under discussion is that it makes no attempt to deal with such large amounts of data and simply operates on smaller portions.
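The patent's own arithmetic undercuts it. C(200,100) ≈ 9.055 × 10^58 is not evidence that such data is compressible; it is the count of equally likely balanced blocks, and naming one of them takes about log2 of that number, roughly 196 bits, leaving at most about 4 bits of redundancy in 200. An illustrative check:

```python
import math

# Number of 200-bit blocks containing exactly 100 ones, as in the patent.
combos = math.comb(200, 100)
# Bits of information needed to say which one occurred.
bits_needed = math.log2(combos)

print(f"{combos:.3e}")      # about 9.055e+58, matching the patent's figure
print(round(bits_needed))   # ~196 of the 200 bits are irreducible
```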

Claims 6 and 7 of the patent read:

6. A method of compressing a stream of binary data, comprising the steps of:

A. parsing n-bits from said stream of binary data;

B. determining the value of said parsed n-bits;

C. based on the results of step B, coding said values of said n-bits in at least one of a first, second, and third target string, wherein coding said value includes generating a plurality of code strings and correlating said value with one of said code strings and dividing said correlated code string variable length codes and dividing at least some of said into at least first and second segments, and assigning at least one of said correlated code string segments to at least one of said first, second, and third target strings, wherein at least one of said plurality of codes is not greater than n-1 bits long.

7. The method of compressing a stream of binary data of claim 6, wherein n=2.
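Claim 7 fixes n=2, i.e. coding the stream two bits at a time. However the four possible values are split across "target strings", any uniquely decodable code for them must satisfy Kraft's inequality, and for a uniform (random) source the average codeword length is then at least 2 bits, so nothing is saved. A brute-force sketch over short codeword-length assignments:

```python
from itertools import product

# Kraft's inequality: a uniquely decodable code with codeword lengths
# l1..lk must satisfy sum(2^-li) <= 1.
def kraft_ok(lengths):
    return sum(2.0 ** -l for l in lengths) <= 1.0

# Four equally likely 2-bit values; search all length assignments up to
# 5 bits per codeword for the smallest average length that Kraft allows.
best = min(sum(ls) / 4 for ls in product(range(1, 6), repeat=4)
           if kraft_ok(ls))
print(best)   # 2.0 average bits per value: no saving on random data
```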

It took the patent office three years to ascertain the validity of this patent, yet a person with basic knowledge of mathematics and data compression can spot the flaws on a first reading.

See also the analysis of patent 5,488,364 on the same subject.

Jean-loup Gailly
