There are two common mitigation schemes for this attack. The first is to send a unique CSRF token every time the page is loaded: by removing the consistent element from the page, the threat is removed. This approach requires the server to track which tokens are valid and whether they have been used, and it can only protect page tokens, not user-readable data. The second is to XOR all secrets in a response with a per-request random number and transmit the number alongside the response. Once received, a piece of JavaScript can recover the original secret by XORing the data again; alternatively, the server can be modified to expect the XORed variant and the random number rather than the original secret. This approach can protect all secrets, but it requires client-side post-processing. Both approaches also require extensive per-page modification, which makes mitigation incredibly cumbersome in practice. At present, the only way to fully mitigate such an attack is to disable compression entirely on vulnerable websites, an impractical solution for most websites and content delivery networks.
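The XOR-masking scheme is simple enough to sketch in a few lines. The following Python illustration stands in for both the server side (masking) and the client-side JavaScript (unmasking); the function names are ours, not part of any existing library:

```python
import os

def mask_secret(secret: bytes) -> tuple[bytes, bytes]:
    """XOR a secret with a fresh per-request random pad.

    The masked value changes on every response, so a compression
    side channel never observes the same secret bytes twice.
    """
    pad = os.urandom(len(secret))
    masked = bytes(s ^ p for s, p in zip(secret, pad))
    return masked, pad

def unmask_secret(masked: bytes, pad: bytes) -> bytes:
    """Recover the original secret (done client-side, or the server
    can be changed to accept the masked value plus the pad)."""
    return bytes(m ^ p for m, p in zip(masked, pad))
```

A response would then carry `masked` and `pad` in place of the secret, at the cost of the post-processing step described above.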

Our Solution

We decided to use selective compression, compressing only the non-secret parts of a page, to stop the extraction of secret information. We found that in most cases a secret within a web page can be described by a classical regular expression. These descriptions allow us to identify secrets online, as a response is streamed. Once identified, the secrets are flagged so that a modified compression library can ensure they are never added to the compression dictionary. The primary advantage of this approach is that protection is offered transparently by the web server: the application does not need to be modified, as long as a regular expression can clearly express which portions of a response are secret. In addition, we do not need to maintain per-user state or rely on client-side JavaScript to render the page.
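Conceptually, the identification step splits a response body into flagged segments. The sketch below uses Python's `re` module in place of sregex and assumes the whole body is buffered, whereas a real streaming implementation must also handle matches that straddle buffer boundaries; the CSRF pattern is a hypothetical example:

```python
import re
from typing import Iterator, Tuple

# Hypothetical pattern for this sketch: a 32-hex-digit CSRF token
# inside a known form field.
CSRF_RE = re.compile(rb'(?<=name="csrf" value=")[0-9a-f]{32}(?=")')

def flag_secrets(body: bytes, pattern: re.Pattern) -> Iterator[Tuple[bytes, bool]]:
    """Split a response body into (chunk, is_secret) segments.

    Non-matching spans are yielded with is_secret=False; each regex
    match is yielded as its own segment with is_secret=True.
    """
    pos = 0
    for m in pattern.finditer(body):
        if m.start() > pos:
            yield body[pos:m.start()], False
        yield m.group(0), True
        pos = m.end()
    if pos < len(body):
        yield body[pos:], False
```

The compression layer can then consume these segments and treat the flagged ones specially.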

The proof-of-concept is implemented as a plugin for NGINX and requires a small patch to the gzip module. The plugin uses sregex to identify secrets within a page. The modified gzip module functions as normal; however, when a secret is processed, compression is disabled. This ensures secrets are never added to the compression dictionary, removing any effect they have on the response size.
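The actual patch works inside gzip's deflate loop; as a self-contained stand-in, the sketch below frames each segment separately, deflate-compressing non-secret chunks and storing secret chunks verbatim so they never enter a compression dictionary. The frame format (1-byte flag, 4-byte length, payload) is purely illustrative and is not how the NGINX patch encodes its output; the real patch also keeps a single deflate stream, so it does not lose cross-chunk redundancy the way this per-chunk sketch does:

```python
import struct
import zlib

def selective_compress(segments) -> bytes:
    """Encode (chunk, is_secret) segments into a simple framed stream.

    Non-secret chunks are deflate-compressed; secret chunks are stored
    verbatim, so their content cannot influence the compressed size of
    the surrounding data.
    """
    out = bytearray()
    for chunk, is_secret in segments:
        payload = chunk if is_secret else zlib.compress(chunk)
        out += struct.pack(">BI", int(is_secret), len(payload))
        out += payload
    return bytes(out)

def selective_decompress(stream: bytes) -> bytes:
    """Reassemble the original body from the framed stream."""
    out, pos = bytearray(), 0
    while pos < len(stream):
        flag, length = struct.unpack_from(">BI", stream, pos)
        pos += 5
        payload = stream[pos:pos + length]
        pos += length
        out += payload if flag else zlib.decompress(payload)
    return bytes(out)
```

The key property is that the secret bytes appear only as stored literals, never as dictionary matches, so equal-length secrets always produce equal-length output.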

The regular expression matching engine we use in this proof-of-concept is not guaranteed to run in constant time. As such, matching a string against some regular expressions could introduce a timing-based side channel. This issue is compounded by the complexity of modern regular expressions, whose matching time can often be non-intuitive. Whilst in many cases the risk posed by such an attack is minimal, a limited matcher with constant runtime and restrictions on unbounded loops should be developed if our mitigation is adopted.
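Input-dependent matching time is easy to demonstrate with a backtracking engine. The example below uses Python's `re` (sregex behaves differently internally, but the general point, that matching time depends on the input, still applies); the nested-quantifier pattern is a classic pathological case:

```python
import re
import time

def time_match(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds taken by one search attempt."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start

# Nested quantifiers: a failing match forces a backtracking engine to
# try exponentially many ways of partitioning the run of 'a's.
evil = re.compile(r"(a+)+b")

fast = time_match(evil, "a" * 18 + "b")  # succeeds almost immediately
slow = time_match(evil, "a" * 18 + "c")  # fails only after heavy backtracking
```

An attacker who can influence matched input and observe response timing could, in principle, exploit such differences, which motivates the constant-runtime matcher suggested above.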

The Challenge Site

We have set up the challenge website compression.website with protection enabled, and a clone of the site, compression.website/unsafe, without it. The page is a simple form with a per-client CSRF token designed to emulate common CSRF protection. Using the example attack presented with the library, we have shown that we are able to extract the token from response sizes on the unprotected variant, but we have not been able to extract it from the protected site. We welcome attempts to extract the token without access to the unencrypted response.
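The length side channel that the example attack exploits can be reproduced in miniature with zlib standing in for the HTTPS response pipeline. In the sketch below the token value and page layout are hypothetical; when attacker-controlled reflected input duplicates the secret, the duplicate compresses into a back-reference and the response shrinks:

```python
import zlib

SECRET = "csrf=deadbeefcafe1234"  # hypothetical per-client token

def response_size(reflected: str) -> int:
    """Compressed size of a page containing both the secret and
    attacker-controlled reflected input (e.g. a search query)."""
    body = (
        "<html><form>" + SECRET + "</form>"
        "<p>You searched for: " + reflected + "</p></html>"
    )
    return len(zlib.compress(body.encode()))

# A guess that matches the secret compresses better than an
# equal-length guess that does not, leaking one bit per probe.
correct = response_size("csrf=deadbeefcafe1234")
wrong = response_size("csrf=0123456789abcdef")
```

Repeating this comparison character by character is what allows the unprotected variant's token to be recovered; on the protected site the secret never enters the dictionary, so the signal disappears.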