One of the most troublesome aspects of copyright law as applied to technology is how the latter makes it possible - and even encourages - doing things that expose the intellectual incoherence of the former. Copyright is merely an ad hoc set of rules and customs which evolved under bygone economic conditions to accomplish certain socially desirable ends (and which can be criticized or abolished for its failures); if we cannot get the ontology of copyright right, then the discussion is foredoomed. Many people suffer from the delusion that it is something more than that: that copyright is somehow objective, or even some sort of actual moral human right (consider the French “droit d’auteur”, one of the “moral rights”) with the same properties as other rights, such as being perpetual. This is quite wrong. Information has a history, but it carries with it no intrinsic copyright.

This has been articulated in ways both serious and humorous, but we can approach it in an interesting way from the direction of compression theory.

One of the more elegant results in computer science is the proof that lossless compression cannot compress all files. That is, while an algorithm like ZIP will compress a great many files - perhaps to tiny fractions of their original size - it will necessarily fail to compress many other files, and indeed, for every file it shrinks, it must expand some other file. The general principle here is TANSTAAFL:

“There ain’t no such thing as a free lunch.”

There is no free lunch in compression. The usual proof invokes the Pigeonhole Principle: each file must map onto a single unique shorter file, and that shorter file must uniquely map back to the longer file (if it did not, you would have devised a singularly useless compression algorithm - one that did not admit of decompression).
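To make the counting behind the Pigeonhole Principle concrete, here is a minimal Python sketch (the function names are my own, purely illustrative): there are 2^n bit-strings of length n, but only 2^n - 1 strings strictly shorter than that, so no one-to-one "compress everything" mapping can exist.

```python
# Counting sketch for the pigeonhole argument: a lossless compressor must map
# inputs to outputs one-to-one, but there are more n-bit strings than there
# are strings strictly shorter than n bits.

def count_exact(n: int) -> int:
    """Distinct bit-strings of exactly length n."""
    return 2 ** n

def count_shorter(n: int) -> int:
    """Distinct bit-strings of length 0 .. n-1 (always one fewer than 2**n)."""
    return sum(2 ** k for k in range(n))

for n in (1, 8, 20):
    print(f"length {n}: {count_exact(n)} strings, "
          f"only {count_shorter(n)} shorter strings available")
```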

But the problem is that a long string simply has more ‘room’ (more possibilities) than a shorter string. Consider a simple case: we have a number between 0 and 1000, and we wish to compress it. Our compressed output is between 0 and 10 - shorter, yes? But suppose we compress 1000 into 10. Which numbers do 900-999 get compressed to? Do they all go to 9? But then, given a 9, we have absolutely no idea what it is supposed to expand into. Perhaps 999 goes to 9, 998 to 8, 997 to 7, and so on - but just a few numbers later we run out of single-digit numbers, and we face the problem again.
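A quick sketch of that toy example, under the arbitrary assumption that our ‘compressor’ simply keeps the last digit: every single-digit code ends up shared by roughly a hundred different inputs, so no decompressor can tell them apart.

```python
from collections import defaultdict

def compress(n: int) -> int:
    """A toy 'compressor' for 0..1000: keep only the last digit."""
    return n % 10

codes = defaultdict(list)
for n in range(1001):
    codes[compress(n)].append(n)

# Each single-digit output stands in for ~100 different inputs, so
# decompression cannot possibly recover which one was meant.
for digit, originals in sorted(codes.items()):
    print(digit, "<-", len(originals), "different numbers, e.g.", originals[:3])
```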

Fundamentally, you cannot pack 10kg of stuff into a 5kg bag. TANSTAAFL.

You keep using that word…

The foregoing may seem to have proven lossless compression impossible, but we know that we do it routinely; so how does that work? Well, we have proven that losslessly mapping a set of long strings onto a set of shorter strings is impossible. The answer is to relax the ‘shorter’ requirement: we allow our algorithm to ‘compress’ some strings into longer ones. Now the Pigeonhole Principle works for us - there is plenty of space in the longer strings for all our to-be-compressed strings. And as it happens, one can devise ‘pathological’ inputs to some compression algorithms in which a short input decompresses into a much larger output - such files are available and empirically demonstrate the possibility. What makes lossless compression more than a mathematical curiosity is that we can choose which sets of strings will wind up usefully smaller, and which sets will be relegated to the outer darkness of obesity.
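The expansion half of this bargain is easy to check empirically with, for instance, Python’s standard zlib module (an illustration only, not any particular ‘pathological’ file): a highly repetitive input shrinks dramatically, while random, already high-entropy input comes back slightly larger than it went in.

```python
import os
import zlib

# A repetitive 100 kB input shrinks to a few hundred bytes...
repetitive = b"A" * 100_000
print("repetitive:", len(repetitive), "->", len(zlib.compress(repetitive)))

# ...while 100 kB of random bytes cannot be compressed, and the output is
# slightly *longer* than the input because of header and bookkeeping bytes.
random_data = os.urandom(100_000)
print("random:    ", len(random_data), "->", len(zlib.compress(random_data)))
```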

Compress Them All—The User Will Know His Own

TANSTAAFL is king, but the universe will often accept payment in trash. We humans do not actually want to compress all possible strings, only the ones we actually make. This is analogous to static typing in programming languages: type checking may result in rejecting many correct programs, but we do not really care, as those are not programs we actually want to run. Or, high-level programming languages insulate us from the machine and make it impossible to do various tricks one could do if one were programming in assembler; but most of us do not actually want to do those tricks, and are happy to sell that ability for conveniences like portability. Or we happily barter away manual memory management (with all the power it confers) to gain convenience and correctness.

This is a powerful concept which applies to many tradeoffs in computer science and engineering, but we can view the matter in a different way - one which casts doubt on the simplistic views of knowledge and creativity we see in copyright law. For starters, we can look at the space-time tradeoff: a fast algorithm simply treats the input data (eg. a WAV file) as essentially the output, while a smaller input with basic redundancy eliminated will take additional processing and require more time to run. This is a common phenomenon, with some extreme examples, but we can look at a more meaningful tradeoff.

Cast your mind back to a lossless algorithm. It is somehow choosing to compress ‘interesting’ strings and letting uninteresting strings blow up. How is it doing so? It can’t be doing it at random, and doing easy things like cutting down on repetition certainly won’t let you write a FLAC algorithm that can cut 30 megabyte WAV files down to 5 megabytes.
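As a small illustration of what that choosing looks like at its crudest (a sketch only - real codecs like FLAC do far more than this), a run-length encoder shrinks exactly the strings it was built around, the repetitive ones, and makes everything else longer.

```python
from itertools import groupby

def rle_encode(s: str) -> str:
    """Minimal run-length encoding: 'aaaab' -> '4a1b'."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(s))

print(rle_encode("a" * 30 + "b" * 10))  # '30a10b'            - 40 chars down to 6
print(rle_encode("abcdefgh"))           # '1a1b1c1d1e1f1g1h'  - 8 chars up to 16
```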