A new exciting vulnerability (yes sorry, we easily get excited about these things 😜) has been disclosed in Ruby. CVE-2018-8778 is a buffer under-read that is triggered by String#unpack. Kudos to Eyal Itkin for discovering this vulnerability! In this article, we will do a deep dive into the vulnerability, show how to exploit it and how to mitigate it.

What’s a buffer under-read?

Nearly all meaningful computing we do is done on data structures (objects) that are stored in memory.

Each object has a defined size and a layout of fields in memory. So someone looking at the memory can see our object flattened as binary data of a given length (zeros & ones). This is what we call a buffer (a zone of memory). As a developer, when operating on our object, we should work within a given buffer and shouldn’t read/write before/after our object.

Ruby allocates data on what is called the heap. That’s a memory space (another one is the stack). Nearly every Ruby object will land there.

So a buffer under-read is a vulnerability that allows attackers to access memory locations before the targeted buffer. This typically occurs when the pointer or its index is decremented to a position before the buffer.

It can be a critical vulnerability, but the actual severity depends on the data that your application handles. It may result in the exposure of sensitive information or possibly a crash. You could potentially leak data like tokens, database credentials, session cookies or even a transiting credit card number.

So what the heck is CVE-2018-8778?

Here is the announcement of the vulnerability. To better understand it let’s first dig into the Ruby source code of the fixing commit here (yeah OK it’s an SVN revision, Ruby history predates Git!).

The vulnerability is located inside the String#unpack method. This method decodes str according to the provided format string, returning an array of each value extracted (you can read more about it on RubyDoc). The format string consists of a sequence of single-character directives, each optionally followed by a count (a number or “*”) or a modifier (“_” or “!”). It can also specify the position of the data being parsed with the @ specifier. That’s where the issue lies.

From the non-regression test, we can see that one just needs to use a specific format to trigger this. The format string is actually like a mini program. The string “@42C10” decodes as: skip 42 bytes, then decode 10 8-bit integers.
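To make the “mini program” idea concrete, here is a small sketch with a harmless offset. The sample string and the offset of 2 are chosen purely for illustration; they are not taken from the Ruby test suite.

```ruby
# A 10-byte sample buffer (illustrative data only).
data = "abcdefghij"

# "@2C3" reads as: jump to byte offset 2, then decode three
# 8-bit unsigned integers starting from there.
values = data.unpack("@2C3")

p values              # → [99, 100, 101]
p values.pack("C*")   # → "cde"
```

With a small, valid offset like this one, unpack stays inside the string. The vulnerability only appears when the offset is so large that it can no longer be represented as a positive signed number.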

The issue here is that the offset is poorly validated. If a huge number is passed with @, the number is treated as a negative value, and unpack skips a negative number of bytes. This is where the out-of-buffer read occurs. So an attacker could use this to read sensitive data on the heap.

The poorly validated offset is a classic mistake, a form of integer overflow: when a huge unsigned integer value is stored in a signed integer, it comes out as a negative number. Here it gives us a way to have negative offsets. On a related note, the first Ariane 5 crash was triggered by a similar integer conversion error… Source.

How does this integer overflow happen here?

String#unpack is actually defined in the Ruby core source code, in the C programming language. As we can see in the remediation commit, the offset that is expressed as a string (a char * in C) has to be translated to an integer value. For this, Ruby uses a macro called STRTOUL which in turn calls ruby_strtoul (they are defined in ruby.h). As the name suggests, this will output an unsigned long integer.

unsigned long ruby_strtoul(const char *str, char **endptr, int base);

So far, no issue: the string “18446744073709551416” is correctly decoded to the unsigned long integer 18446744073709551416. Yet this value is stored in len, which is declared as a long, a signed number. Doing this converts the unsigned number to a signed one, and on a platform where long is 64 bits wide, 18446744073709551416 (that is, 2**64 - 200) becomes -200.
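The conversion can be reproduced in plain Ruby arithmetic. This is only a sketch of what the C assignment does, assuming a 64-bit long (as on typical 64-bit Linux and macOS builds); the helper name as_signed_long is made up for this illustration and is not part of Ruby.

```ruby
# Assumption: `long` is 64 bits wide on the target platform.
LONG_BITS = 64

# Hypothetical helper: reinterpret an unsigned value as a signed
# two's-complement integer of the given width.
def as_signed_long(value, bits = LONG_BITS)
  value >= 2**(bits - 1) ? value - 2**bits : value
end

parsed = "18446744073709551416".to_i  # what the unsigned parse yields
p as_signed_long(parsed)              # → -200
p as_signed_long(42)                  # → 42 (small values are unchanged)
```

Any value at or above 2**63 wraps around to a negative number, which is exactly the range an attacker wants to reach.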

Here are the offending pieces together:

View the code on Gist.

How do you exploit this vulnerability?

The first step in exploiting the vulnerability in a live application is having a small Proof of Concept (PoC). Let’s first try to read memory from the irb interactive shell. We will use the nice hexdump gem to display something easier to read for us humans. Without further ado here is a one-liner that will do this.

View the code on Gist.

So what are we doing here? We have a parameter for the size of the leak (leak), and we calculate the huge number that will then be decoded as a negative integer. We then create a small buffer and unpack it with a format string built from these two values. We’re essentially saying: skip to this huge offset (which ends up as -leak bytes) and read leak + 4 bytes.

This returns 204 bytes as requested.

By passing a large integer as offset, we have gone back 200 bytes before the start of BUFF in memory and then read 204 bytes (the generated format string used: @18446744073709551416C204 ).
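The arithmetic behind that format string can be sketched as follows. This only rebuilds the string; it deliberately does not call unpack, and it is not the original one-liner. The method name leak_format is hypothetical, and the 64-bit width of the offset is an assumption about the platform.

```ruby
# Assumption: the offset is parsed into a 64-bit unsigned long.
UNSIGNED_LONG_RANGE = 2**64

# Hypothetical helper: build a format string that would rewind `leak`
# bytes before the buffer and then read the leak plus the buffer itself.
def leak_format(leak, buffer_size = 4)
  offset = UNSIGNED_LONG_RANGE - leak  # wraps to -leak once stored as signed
  "@#{offset}C#{leak + buffer_size}"
end

puts leak_format(200)  # → @18446744073709551416C204
```

The default buffer_size of 4 matches the four bytes of BUFF used in the PoC, which is why 204 bytes are read for a 200-byte leak.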

As a sanity check, we can see the content of the buffer we should be working in at the end of the ASCII dump part (BUFF). If Ruby were not vulnerable, it would never have jumped to read memory before the start of BUFF.

How do we go from the PoC to actual exploitation? We would first need to find an application running on a vulnerable Ruby version (before 2.5.1) that we can attack. A Ruby on Rails application would be nice, as we could attack it remotely and such applications often contain juicy secrets. This Rails application would need to have a String#unpack call where the format parameter is under attacker control. String#unpack calls are more common than you might think. They are often used to decode data coming from elsewhere (e.g. database drivers are frequent users of it). So to know if you are affected, you might also want to take a look at the source code of all your dependencies…

If we had such an application, simply sending our generated malicious format string from above enables us to extract as much data as we want from the application. This allows us to read & possibly extract all the secrets that are stored in memory (database credentials, tokens) and maybe also data that would only be transiting through the application (customer credit card number or user session on a concurrent request).

Building a remediation

The easiest fix is, of course, to simply update Ruby on your machine. In the real world, this is unfortunately not always quickly doable. This reality pushed us to craft a solution that would protect all Sqreen users from CVE-2018-8778 even if they couldn’t update their Ruby version just yet.

Two primary requirements drive the development of new protections at Sqreen.

First, we can’t break our users’ applications with false positives (i.e. block legit requests).

Second, the impact on performance should be nearly invisible.

After looking at a few options, we decided that the best solution is to “simply” hook the String#unpack method and check that a format string containing @ doesn’t include a large offset. The key here is to run this check only when the format string comes from the current request parameters, so that format strings written by the application itself are never blocked.

So the rule we implemented looks a bit like:

    TWO_GIGABYTES = 2**31
    return false unless format_string.include?('@')
    return false unless user_parameters.include?(format_string)
    offset = parse(format_string)
    return offset > TWO_GIGABYTES

Now let’s look at a few examples:

A format string `C10` stops processing at the first check (no @) ⇒ no attack detected

A format string `@10c12` that is not in user parameters stops at the second check ⇒ no attack detected. If it does come from user parameters (the code would probably be vulnerable), we check the offset, find that 10 is small, and stop processing ⇒ no attack detected

With `@18446744073709551416C204` as a format string, the offset `18446744073709551416` is bigger than 2**31 ⇒ an attack is detected and will be blocked!
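The three cases above can be checked with a runnable sketch of the rule. This is an illustration under stated assumptions, not Sqreen’s actual implementation: the method names suspicious_unpack? and max_offset are hypothetical, user_parameters is assumed to be a plain array of strings, and the offset parser is a simple regular expression standing in for the parse step.

```ruby
TWO_GIGABYTES = 2**31

# Hypothetical stand-in for `parse`: the largest number following
# an "@" in the format string, or 0 when there is none.
def max_offset(format_string)
  format_string.scan(/@(\d+)/).flatten.map(&:to_i).max || 0
end

# Returns true only when a user-supplied format string carries
# an offset larger than two gigabytes.
def suspicious_unpack?(format_string, user_parameters)
  return false unless format_string.include?('@')
  return false unless user_parameters.include?(format_string)
  max_offset(format_string) > TWO_GIGABYTES
end

attack = "@18446744073709551416C204"
p suspicious_unpack?("C10", ["C10"])        # → false (no @)
p suspicious_unpack?("@10c12", [])          # → false (not from the user)
p suspicious_unpack?("@10c12", ["@10c12"])  # → false (small offset)
p suspicious_unpack?(attack, [attack])      # → true
```

Ordering the checks from cheapest to most expensive keeps the common case fast: most format strings contain no @ at all, so the parser rarely runs.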

And that’s it. After extensive tests, we deployed this rule to our users. They are now all protected against this buffer under-read vulnerability and can update their Ruby version when the time is right. All of this was achieved in less than 21 hours between the disclosure of the vulnerability and the full protection of our clients.