It is not possible to losslessly compress all files of size $n$ with a single algorithm, because there are more files of size $n$ (there are $2^n$ of them) than there are files of size $p < n$ (there are $2^n - 1$ of those in total). By the pigeonhole principle, if we tried to compress every file of size $n$ with a single algorithm, there would be at least one file it could not make smaller.
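To make the counting concrete, here is a hypothetical brute-force check (the function names are mine, not from the text): for small $n$, compare the number of bitstrings of length exactly $n$ with the number of strictly shorter ones.

```python
# Compare the number of bitstrings of length exactly n with the number
# of strictly shorter ones (illustrative sketch; names are assumptions).
def count_exact(n):
    return 2 ** n

def count_shorter(n):
    # 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1, counting the empty string
    return sum(2 ** p for p in range(n))

for n in range(1, 6):
    # There is always exactly one more file of length n than there are
    # shorter files, so no injective (lossless) map into them can exist.
    assert count_shorter(n) == count_exact(n) - 1
```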

If we wanted to be able to compress files of several differing lengths $n_k$, the number of files of each length $n_k$ we can compress becomes even smaller.

Today, while reading a story about a file that was several gigabytes compressed but uncompressed to one gigabyte, I had an idea for a universal compression algorithm.

Let $a_i$ be a compression algorithm.

Let $g_j$ be a file.

$|g_j|$ denotes the length of $g_j$.

Let $f(a_i, g_j)$ be a function that returns $|g_j| - |a_i(g_j)|$.

Let $S_N = \{g_j : |g_j| \le N\}$.

Let $A = \{a_i\}$ be a set of algorithms such that $\forall \, g_j \in S_N \; \exists \, a_i \in A : f(a_i, g_j) > 0$.

Let $m$ be the length, in bits, of the label of the chosen compression algorithm. The first $m$ bits of every compressed file denote which algorithm was used.

$m = \lceil \log_2(\#A) \rceil$.

Then you can compress every $g_j \in S_N$ by iterating through $A$ until you find an $a_i$ with $f(a_i, g_j) - m > 0$.
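A minimal sketch of this scheme, assuming Python's standard-library codecs stand in for the family $A$ and using a one-byte label (so $m$ is one byte rather than $\lceil \log_2(\#A) \rceil$ bits); all names here are illustrative, not part of the text above:

```python
import bz2
import lzma
import zlib

# The family A of candidate algorithms, each identified by its index (label).
ALGORITHMS = [zlib.compress, bz2.compress, lzma.compress]
DECOMPRESSORS = [zlib.decompress, bz2.decompress, lzma.decompress]
M = 1  # label length in bytes; one byte suffices for up to 256 algorithms

def compress(data: bytes) -> bytes:
    # Iterate through A until some a_i saves more than the m-byte label costs,
    # i.e. f(a_i, g_j) - m > 0.
    for label, a_i in enumerate(ALGORITHMS):
        out = a_i(data)
        if len(data) - len(out) > M:
            return bytes([label]) + out
    raise ValueError("no algorithm in A compresses this file")

def decompress(blob: bytes) -> bytes:
    # The first m bytes name the algorithm that was used.
    return DECOMPRESSORS[blob[0]](blob[1:])
```

As the opening argument shows, some inputs (already-compressed or random data) will defeat every $a_i$, hence the error branch.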

Even better.

For each $g_j$, let $a_j$ be the corresponding compression algorithm.

Let $h(a_i, g_j) = f(a_i, g_j) - m$.

$$\forall \, g_j \in S_N, \quad a_j = \underset{a_i \in A}{\operatorname{argmax}} \, h(a_i, g_j)$$
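The argmax rule replaces the first-fit search with a best-fit search; a sketch under the same illustrative assumptions as before (a one-byte label and standard-library codecs standing in for $A$):

```python
import bz2
import lzma
import zlib

ALGORITHMS = [zlib.compress, bz2.compress, lzma.compress]
M = 1  # label length in bytes

def compress_best(data: bytes) -> bytes:
    # a_j = argmax over a_i in A of h(a_i, g_j) = f(a_i, g_j) - m.
    # Since m is the same for every a_i, this is simply the a_i whose
    # output is shortest.
    outputs = [a_i(data) for a_i in ALGORITHMS]
    label = min(range(len(outputs)), key=lambda i: len(outputs[i]))
    return bytes([label]) + outputs[label]
```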

Is there a reason why the above is not done?

While the above is an algorithm, and one could argue that the pigeonhole principle therefore applies, this does not imply what it may at first seem to imply. The above algorithm, call it $a^v$, is a little different.

Let $a_i : S_N \to Y_N^i$ denote that algorithm $a_i$ maps the family of files $S_N = \{g_j : |g_j| \le N\}$ to another family of files $Y_N^i = \{y_j : y_j = a_i(g_j)\}$.

$\forall \, a_i \in A, \; a_i : S_N \to Y_N^i$.

However, $a^v : S_{N+m} \to Y_{N+m}^v$.

So $a^v$ compresses a different family of files from each $a_i \in A$.

The pigeonhole principle merely states that $a^v$ cannot compress all files of length $N+m$; this is irrelevant, since $a^v$ only intends to compress a small subset of the files of length $N+m$ (those whose first $m$ bits are the label of some $a_i \in A$).