The BREACH attack on SSL was recently disclosed at Black Hat. It depends on the attacker being able to detect a small difference in compressed response size between two very similar requests.

Accordingly, several of the proposed mitigations attempt to disguise the length of the response. Some have argued that this is mathematically infeasible: there is no way to disguise the length sufficiently without degrading compression so badly that you might as well switch it off altogether. I am not qualified to assess that claim, but I did have an idea for adding arbitrary length to the response in a transparent way.

My suggestion to vary the length is to add some non-determinism to gzip on the server side. This has the significant advantage of being transparent both to the application being served and to the client – it requires no changes to either. This makes it much easier to deploy than mechanisms which pad the content itself, or add data which needs to be explicitly ignored on the client. A quick review of the algorithm suggests that there are several places where you could introduce non-determinism.

The gzip algorithm either emits a string of bytes literally or, if the string has occurred earlier in the data, replaces it with a pointer–length pair referring back to that earlier occurrence. I suggest that the way to add the most variability to the output length would be to have gzip, on some small percentage of the occasions when it would otherwise compress a string fully, either not shorten the string at all or shorten it by less than the maximum possible amount. The question is whether one can add enough length non-determinism to make the leaky length difference undetectable without many repetitions and statistical analysis (which would drastically slow the attack down), without destroying the compression performance.
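To make the idea concrete, here is a toy LZ77-style sketch of the mechanism. This is not real DEFLATE — the token format, window size, and the `p_degrade` parameter are all invented for illustration — but it shows the key property: occasionally emitting a slightly shorter match changes the compressed length while the decompressed output stays byte-identical.

```python
import random

def compress(data, window=1024, min_match=3, p_degrade=0.05, rng=None):
    """Toy LZ77-style compressor with non-deterministic match shortening."""
    rng = rng or random.Random()
    out = []  # tokens: ('lit', byte) or ('ref', distance, length)
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Naive search for the longest earlier match (capped at j+k < i
        # to keep the sketch simple: no overlapping matches).
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and j + k < i and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_match:
            # The non-deterministic step: occasionally emit a slightly
            # shorter match, so one byte spills out as a literal later.
            if best_len > min_match and rng.random() < p_degrade:
                best_len -= 1
            out.append(('ref', best_dist, best_len))
            i += best_len
        else:
            out.append(('lit', data[i]))
            i += 1
    return out

def decompress(tokens):
    """Decompression is unchanged: it neither knows nor cares that the
    compressor made sub-optimal choices."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == 'lit':
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)
```

Note that the client-transparency claim falls out directly: a shortened match is still a perfectly legal token, so any standard decompressor reconstructs the original bytes.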

This scheme can be used to add arbitrary amounts of additional length (up to the original size of the file) in arbitrary places and patterns, dependent on static or variable factors.
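As one illustration of a "static" factor (my example, not part of the proposal): the randomness could be keyed to a server secret plus the resource identity, so that a given resource always receives the same length noise and repeated requests cannot simply be averaged to recover the true length. The names `secret_key` and `resource_id` are hypothetical parameters.

```python
import hashlib
import random

def padding_rng(secret_key: bytes, resource_id: bytes) -> random.Random:
    """Derive a per-resource RNG from a server secret ("static" factor).
    Seeding from os.urandom() per request would give a "variable"
    factor instead, with fresh noise on every response."""
    seed = hashlib.sha256(secret_key + b"|" + resource_id).digest()
    return random.Random(seed)
```

Feeding such an RNG into the compressor's degradation decisions is what makes the padding pattern depend on static rather than per-request factors.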

As determinism may be an expected feature of gzip in some contexts, it would probably need to be made an option (which mod_gzip and similar libraries would invoke).

As an opening (and probably terrible) proposal for a static random length increase, I suggest that for each replaced sequence in the file, gzip decides either to replace it fully with a pointer+length pair, or to replace all the chars bar the last one (emitting that char as a literal). It should choose the shortened replacement with probability 1/(N+1) for the Nth replacement in the file. This front-loads the non-determinism, with the probability of the sub-optimal choice falling off roughly as 1/N, meaning that the deleterious effect on large files is not overly large but small files still get a significant length variation.
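The 1/(N+1) rule is cheap to state in code, and its expected cost is easy to bound: the expected number of extra literal bytes for a file with M replacements is the partial harmonic sum 1/2 + 1/3 + … + 1/(M+1), which grows only like ln(M) — so large files pay little while small files still vary noticeably. The function names here are mine; only the probability rule comes from the proposal.

```python
import random

def emit_short(n, rng=random):
    """True if the Nth replacement (1-indexed) should be emitted one
    char short: probability 1/2 for the first, 1/3 for the second..."""
    return rng.random() < 1.0 / (n + 1)

def expected_extra_literals(num_replacements):
    """Expected number of extra literal bytes under the 1/(N+1) rule:
    sum of 1/(n+1) for n = 1..M, i.e. roughly ln(M)."""
    return sum(1.0 / (n + 1) for n in range(1, num_replacements + 1))
```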

This method might also work as a mitigation for CRIME.