Constant-Time Encoding: Boring Cryptography, RFC 4648, and You

Earlier this year, we set out to write a fully constant-time implementation of all the RFC 4648 character encoding functions (and their respective decoding functions). Fortunately, a lot of the groundwork was already laid by Steve "Sc00bz" Thomas (of DecryptoCat fame).

Our implementation is available on GitHub as paragonie/constant_time_encoding, under the same license as Steve's code (MIT). Feel free to use it to enhance the security of your PHP projects, especially if you need Base32 encoding (which PHP doesn't provide but, for example, Google Authenticator requires).

It's also available via Composer:

composer require paragonie/constant_time_encoding

This post discusses a situation where encoding (which, despite widespread confusion, is a distinct concept from encryption) intersects with designing safe cryptography systems.

What is RFC 4648 Encoding?

Most PHP programmers are more familiar with these functions:

bin2hex() and hex2bin()

base64_encode() and base64_decode()

They are but a subset of all the character encoding schemes defined in RFC 4648. The total list includes:

Base16 (hexadecimal)

Base32

Base32 with an extended Hex alphabet

Base64

URL-safe Base64

While other numerical bases are common (Bitcoin uses Base58 with a checksum), the cool thing about the character encoding schemes defined in RFC 4648 is that their bases are all powers of 2, so each output character carries a whole number of bits.

$2^{4} = 16$

$2^{5} = 32$

$2^{6} = 64$
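Because each character carries a whole number of bits, the padded output length follows from simple arithmetic. The sketch below is only an illustration; the function names are hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: greatest common divisor (Euclid's algorithm).
function gcd_sketch(int $a, int $b): int
{
    while ($b !== 0) {
        [$a, $b] = [$b, $a % $b];
    }
    return $a;
}

// Hypothetical helper: length of the padded RFC 4648 encoding of $numBytes
// bytes, given the bits carried per character (4, 5, or 6).
function padded_length_sketch(int $numBytes, int $bitsPerChar): int
{
    // Smallest number of bits that is a whole number of bytes AND characters.
    $groupBits  = intdiv(8 * $bitsPerChar, gcd_sketch(8, $bitsPerChar));
    $groupBytes = intdiv($groupBits, 8);            // 1, 5, or 3 bytes
    $groupChars = intdiv($groupBits, $bitsPerChar); // 2, 8, or 4 characters
    // Round the input up to a whole number of groups.
    $groups = intdiv($numBytes + $groupBytes - 1, $groupBytes);
    return $groups * $groupChars;
}
```

For a 32-byte key this gives 64 characters in Base16, 56 in Base32, and 44 in Base64.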

The RFC 4648 encoding schemes are advantageous when working with compressed or raw binary data and converting it into an ASCII-only form for transport. Among other things, they are useful for storing encryption keys in a JSON configuration file.

What Does Constant-Time Mean?

When a function is constant-time, it means that the time it takes to perform a calculation does not depend on the contents of its inputs, only on their size.

For example, this comparison can return FALSE as soon as it reaches a character in $password other than a:

var_dump($password === str_repeat("a", 1024));

Conversely, this will always take the same time (assuming $password is 1024 characters long):

var_dump(hash_equals($password, str_repeat("a", 1024)));

This might not seem like a big deal, but consider comparing a Message Authentication Code (MAC) for an encrypted message. If comparing the MAC you calculated for the message with the MAC that was sent doesn't always take the same amount of time, an attacker can slowly deduce a valid MAC for a forged message. This is called a timing attack.
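To make the idea concrete, here is a minimal sketch of how a comparison can avoid an early exit. It is not how hash_equals() is implemented internally, and the function name is hypothetical; in real code, simply call hash_equals().

```php
<?php
// Hypothetical illustration of a comparison with no data-dependent early exit.
function ct_equals_sketch(string $known, string $user): bool
{
    // Only the lengths are allowed to influence control flow.
    if (strlen($known) !== strlen($user)) {
        return false;
    }
    // unpack('C*', ...) yields the bytes as integers, indexed from 1.
    $a = unpack('C*', $known);
    $b = unpack('C*', $user);
    $diff = 0;
    foreach ($a as $i => $byte) {
        // Accumulate every difference; never stop at the first mismatch.
        $diff |= $byte ^ $b[$i];
    }
    return $diff === 0;
}
```

The loop always visits every byte, so the position of the first mismatch does not change how much work is done.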

String comparisons (which outside of cryptography are considered a benign operation) aren't the only thing that can leak timing information. Micro-architecture side channels, such as cache-timing attacks, are far more pernicious. Even software implementations of AES are vulnerable to cache-timing attacks (PDF). Cryptographers have been able to deduce the AES key based on cache-timing information in 65 milliseconds.

Why the World Needs Constant-Time RFC 4648 Encoding

If you work with cryptography, you probably generate and store secret information by encoding it. If you're using the standard character encoding functions that ship with most programming languages, you might be opening the door to cache-timing attacks.

This is still an open research question, but we do know that the encoding functions in most programming languages use table look-ups indexed by their input. When that input is a cryptographic secret, this is an easy way to introduce cache-timing vulnerabilities into a cryptosystem.
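As an illustration of the alternative, a hex digit can be computed with arithmetic and bit masks instead of being fetched from a table. The following is a sketch in that spirit rather than a copy of any library's code, and the function name is hypothetical.

```php
<?php
// Hypothetical sketch: hex-encode without a lookup table indexed by the data.
function hex_encode_sketch(string $binary): string
{
    $hex = '';
    foreach (unpack('C*', $binary) as $byte) {
        $hi = $byte >> 4;   // upper nibble, 0..15
        $lo = $byte & 0x0f; // lower nibble, 0..15
        // (($n - 10) >> 8) is -1 (all bits set) for 0..9 and 0 for 10..15.
        // ANDing with ~38 (that is, -39) adds -39 only for 0..9:
        //   0..9   -> 87 + n - 39 = 48 + n  ('0'..'9')
        //   10..15 -> 87 + n                ('a'..'f')
        $hex .= pack(
            'CC',
            87 + $hi + ((($hi - 10) >> 8) & ~38),
            87 + $lo + ((($lo - 10) >> 8) & ~38)
        );
    }
    return $hex;
}
```

The output matches bin2hex(), but the character for each nibble comes from the same sequence of arithmetic operations regardless of its value.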

Rather than wait for a practical exploit to be developed, you can use a solution that is available today (for PHP, anyway). If you're writing software in another language, libsodium ships with hexadecimal encoding/decoding functions that are cache-timing-safe.

How We Designed our Constant-time Encoding Library

We started with a simple fork of Steve Thomas's ConstTimeEncoding to modernize the code (PSR-4, integration with Composer, etc.). We ended up going a lot further:

Instead of ord() and chr(), which amplify cache-timing leaks due to PHP's optimizations, we opted to use pack() and unpack().

To better handle function overloading, we wrote a Binary class that reliably delivers the expected results of strlen() and substr() when working with raw binary data.

We implemented Hex, Base32, Base32Hex, and Base64UrlSafe for complete RFC 4648 coverage.

We built a unit test suite (via PHPUnit).
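The function-overloading concern can be sketched as follows. When the mbstring extension is configured to overload string functions, strlen() and substr() may count characters rather than bytes, so byte-oriented code can ask mbstring for the '8bit' encoding explicitly. The function names below are hypothetical and are not the library's actual Binary API.

```php
<?php
// Hypothetical helper: byte length of a raw binary string.
function binary_strlen_sketch(string $str): int
{
    if (function_exists('mb_strlen')) {
        // '8bit' forces byte semantics regardless of the internal encoding.
        return mb_strlen($str, '8bit');
    }
    return strlen($str);
}

// Hypothetical helper: byte-oriented substring of a raw binary string.
function binary_substr_sketch(string $str, int $start, ?int $length = null): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($str, $start, $length, '8bit');
    }
    if ($length === null) {
        // Older PHP versions may return false here; normalize to a string.
        return (string) substr($str, $start);
    }
    return (string) substr($str, $start, $length);
}
```

Both helpers treat a two-byte UTF-8 sequence as two separate bytes, which is what encoding and decoding routines need.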

How to Use the Library

The code delta between using our library and using PHP's built-in functions was kept reasonably small.

Old:

<?php
var_dump(strtr(base64_encode(random_bytes(32)), '+/', '-_'));

New:

<?php
use \ParagonIE\ConstantTime\Base64UrlSafe;

var_dump(Base64UrlSafe::encode(random_bytes(32)));

For Hex and Base32 encoding, there is a separate encodeUpper() method that returns capital letters instead of lowercase.

What Projects Use this Library?

On Boring Cryptography

"Boring cryptography" refers to cryptography designs and implementations that are obviously secure. This means having at least a 128-bit security level, i.e. roughly $2^{128}$ operations to break (Ed25519), instead of 1024-bit RSA (which is estimated to offer approximately $2^{80}$). Boring cryptography means being obviously constant-time. When cryptography is boring, there's far less room for implementers to make cataclysmic mistakes (such as repeating an ECDSA nonce).

Cryptographers are working hard to bring boring cryptography to the masses. Paragon Initiative Enterprises is similarly working hard to bring boring levels of security to PHP. That is why we're building Airship: The PHP community deserves a CMS/blogging platform that is obviously secure, written from an understanding of how PHP applications are attacked in the real world.

Remember, "Attacks only get better; they never get worse."