Securely Implementing (De)Serialization in PHP

A frequent problem that developers encounter when building web applications in PHP is, "How should I represent this data structure as a string?" Two common examples include:

Caching a complex data structure (to reduce database load)

Communicating API requests and responses between HTTP-aware applications

This seems like the sort of problem for which every major programming language should already offer straightforward, built-in solutions that don't carry significant security risk. Sadly, this isn't the case.

Popular Serialization Strategies (and their Respective Vulnerabilities)

Let's look at the most common strategies and the danger involved with each:

serialize() and unserialize()

json_encode() and json_decode()

SimpleXML (or any other XML parser)

PHP Object Injection with unserialize()

Before PHP 7, if you ever passed user input to unserialize(), your application was at high risk of PHP Object Injection, which can often be escalated into remote code execution. This made a lot of developers (rightly) gun-shy about using this function for any reason in a PHP 5 project.

(Although we're focusing on PHP, Java deserialization is vulnerable to the same class of attack.)

PHP 7 added a second, optional $options parameter to unserialize(). Its allowed_classes key lets you specify a whitelist of classes that may be instantiated; false (no classes at all) is an acceptable whitelist if you're only serializing scalars and arrays.

$data = serialize($foo);

// PROBABLY SAFE, restrictive:
$object = unserialize($data, ['allowed_classes' => false]);

// PROBABLY SAFE, unless an attacker can control the whitelist:
$whitelist = ['MyProject\\OtherNamespace\\ObjectAllowed'];
$object = unserialize($data, ['allowed_classes' => $whitelist]);

// DEFINITELY UNSAFE:
$hackMe = unserialize($data, ['allowed_classes' => true]);
$hackMe = unserialize($data);
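To see what the allowed_classes option actually does, here is a minimal sketch, assuming PHP 7.0 or later. The classes AllowedThing and OtherThing are hypothetical and exist only for the demo. An object whose class isn't permitted comes back as __PHP_Incomplete_Class. That class has no methods, so none of the original class's magic methods, such as __wakeup() or __destruct(), can run:

```php
<?php
declare(strict_types=1);

// Hypothetical classes, defined only for this demonstration.
final class AllowedThing
{
    public $label = 'allowed';
}

final class OtherThing
{
    public $label = 'other';
}

$data = serialize([new AllowedThing(), new OtherThing(), 'plain string', 42]);

// allowed_classes => false: every object becomes __PHP_Incomplete_Class.
$none = unserialize($data, ['allowed_classes' => false]);

// allowed_classes => [...]: only whitelisted classes are instantiated.
$some = unserialize($data, ['allowed_classes' => [AllowedThing::class]]);

// Print the class (or scalar type) of each element under both settings.
foreach (['false' => $none, 'whitelist' => $some] as $mode => $result) {
    foreach ($result as $i => $value) {
        echo $mode, '[', $i, ']: ', is_object($value) ? get_class($value) : gettype($value), PHP_EOL;
    }
}
```

Scalars and arrays pass through unchanged under either setting. This is why false is a sensible default when you only store plain data.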

You might think that, with PHP 7, one of the PROBABLY SAFE configurations is good enough, but beware: many exploits affecting PHP in the past few years were the result of bugs in unserialize() itself, and a class whitelist does nothing to stop those.

Recommendations:

Avoid ever passing user data to unserialize()

If you must unserialize user data in a PHP 7 project, make sure you don't allow arbitrary classes (set allowed_classes to false or to a strict whitelist)

See below.

The standard recommendation made by experienced PHP developers (which also appears in the PHP manual entry for unserialize()) is to use JSON encoding instead when handling user input.

Hash-Table Collision Denial of Service with json_decode()

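While JSON decoding doesn't carry the risk of remote code execution, it does leave you vulnerable to hash-table collision denial-of-service (hash-DoS) attacks. An attacker sends a JSON object whose keys all land in the same hash-table bucket, so every insertion degrades into a linear scan and decoding the payload takes quadratic time.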

This is the same class of vulnerability as CVE-2011-4885, which affected form-parameter parsing, and there's currently no real fix for it. The PHP internals team has been trying very hard to fix it without creating backwards compatibility issues or degrading performance.

The blog post that introduced the JSON hash-DoS alluded to the fact that XML parsers don't suffer from this exact vulnerability. However, XML has other problems.

Recommendations:

See below.

XML Parsing

XML parsers in general are a juicy target for attackers, because they sit on the obvious path from "successful phishing email with attachment" to "compromised system." Word processors and other office software routinely open documents that are XML files or zipped archives of XML files.

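The most obvious threat you have to be prepared for when processing XML is an XML External Entity (XXE) attack. These can be mitigated by making sure the following function is called before any XML is processed. (As of PHP 8.0 this function is deprecated, because libxml2 2.9.0 and later already disable external entity loading by default.)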

libxml_disable_entity_loader(true);
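One way to go further is to refuse documents that declare a DTD at all, because XXE payloads need a DOCTYPE to declare their entities. The following is a sketch under stated assumptions, not a complete defense. parseUntrustedXml() is a hypothetical helper. The sketch assumes the dom and simplexml extensions are loaded, and it calls libxml_disable_entity_loader() only on PHP versions older than 8.0:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parse untrusted XML while refusing any DOCTYPE.
 * Without a DTD there are no entity declarations for an XXE payload to use.
 */
function parseUntrustedXml(string $xml): SimpleXMLElement
{
    if ($xml === '') {
        throw new InvalidArgumentException('Empty XML document');
    }
    if (PHP_VERSION_ID < 80000) {
        // Deprecated in PHP 8.0, where libxml2 >= 2.9 already disables external entity loading.
        libxml_disable_entity_loader(true);
    }

    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // LIBXML_NONET forbids network access; LIBXML_NOENT and LIBXML_DTDLOAD are deliberately not passed.
    $loaded = $dom->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Malformed XML');
    }
    if ($dom->doctype !== null) {
        throw new InvalidArgumentException('DOCTYPE declarations are not allowed');
    }
    return simplexml_import_dom($dom);
}

$doc = parseUntrustedXml('<order><id>7</id></order>');
echo (string) $doc->id, PHP_EOL; // prints 7
```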

Even with this protection in place, libxml has an established history of security vulnerabilities. You can't really get away from libxml2 in PHP, and even if you could, most XML parsers have historically mishandled CDATA sections in often-exploitable ways.

Recommendations:

See below.

Everything is Broken; How Do We Protect Ourselves?

So far, we've only looked at known vulnerabilities in each of the three deserialization strategies built into the PHP language, and the situation looks terrible. The prospect of as-yet-undiscovered vulnerabilities only makes it grimmer. However, these vulnerabilities aren't unavoidable.

Recommendation: Only Accept GET and POST Fields

When designing your APIs, use the tools provided by the HTTP standard, GET and POST fields, instead of accepting a JSON/XML blob (or serialize() output) in the request body as many APIs expect. PHP has mitigated hash-table collision denial-of-service attacks against these fields since PHP 5.3.9, which introduced the max_input_vars setting (1000 variables by default).

// Sending:

// Don't do this:
curl_setopt($ch, CURLOPT_POSTFIELDS, '{"data":{"does_php_do_the_right_thing_here?":"no"}}');

// Do this instead. A nested array can't be passed to CURLOPT_POSTFIELDS directly,
// so http_build_query() encodes it as data[key]=value:
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
    'data' => [
        'does_php_do_the_right_thing_here?' => 'yes, yes it does'
    ]
]));

// Receiving:
$data = $_POST['data'];

If you're building API requests by hand, http_build_query() will come in handy.
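The following minimal sketch assumes PHP 7.0+. It shows what http_build_query() produces for nested fields like the ones above. It then uses parse_str(), which follows the same bracket convention PHP uses to populate $_POST, to decode the body back into the original array structure:

```php
<?php
declare(strict_types=1);

$fields = [
    'data' => [
        'does_php_do_the_right_thing_here?' => 'yes, yes it does',
    ],
];

// Encode as application/x-www-form-urlencoded; nested arrays become data[key]=value.
$body = http_build_query($fields);
echo $body, PHP_EOL;
// → data%5Bdoes_php_do_the_right_thing_here%3F%5D=yes%2C+yes+it+does

// Decode it again with the same bracket convention used for $_POST.
parse_str($body, $parsed);
var_dump($parsed === $fields); // bool(true)
```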

Recommendation: Authenticate the Messages you Send to Yourself

While the previous recommendation dealt with receiving arbitrary data from end users, this one is more suitable for two situations. The first is two servers talking to each other (e.g. an internal API for communicating with a microservice). The second is storing data on the client that you don't want tampered with (e.g. an encrypted cookie).

This is also a good idea in situations where you're, for example, storing data in a memcached cluster and want to reduce the lateral attack surface if one of the other servers gets compromised.

This recommendation takes a page from the JSON Web Tokens approach:

Serialize then authenticate.

Verify then deserialize.

In most cases a simple HMAC (with constant-time validation) will suffice; in others, you'll need to use digital signatures. Either way, ask a security expert to review your decision and implementation. (If you don't know any: We consult.)

For example, using Halite:

<?php
use \ParagonIE\Halite\Symmetric\Crypto as Symmetric;
use \ParagonIE\Halite\KeyFactory;
use \ParagonIE\Halite\Util;

$authKey = KeyFactory::loadAuthenticationKey('/outside/project/path/auth.key');

// Serialization:
$serialized = json_encode($yourData);
$storeMe = Symmetric::authenticate($serialized, $authKey) . $serialized;

// Deserialization:
$mac = Util::safeSubstr($storeMe, 0, 2 * \Sodium\CRYPTO_AUTH_BYTES);
$message = Util::safeSubstr($storeMe, 2 * \Sodium\CRYPTO_AUTH_BYTES);
if (Symmetric::verify($message, $authKey, $mac)) {
    $object = json_decode($message);
}
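If you'd rather not add a dependency, the same serialize-then-authenticate, verify-then-deserialize pattern can be sketched with PHP's built-in hash_hmac() and hash_equals(). This is not Halite's format. The helpers sealData() and openData() are hypothetical, and the MAC is raw binary HMAC-SHA256 rather than hex. The code also assumes PHP 7.3+ (for JSON_THROW_ON_ERROR) with mbstring.func_overload disabled, so that strlen() and substr() count bytes:

```php
<?php
declare(strict_types=1);

const MAC_BYTES = 32; // raw HMAC-SHA256 output length in bytes

/** Hypothetical helper: serialize, then authenticate (stored as MAC || message). */
function sealData(array $data, string $key): string
{
    $serialized = json_encode($data, JSON_THROW_ON_ERROR);
    return hash_hmac('sha256', $serialized, $key, true) . $serialized;
}

/** Hypothetical helper: verify in constant time, then deserialize; null if tampered. */
function openData(string $stored, string $key): ?array
{
    if (strlen($stored) <= MAC_BYTES) {
        return null;
    }
    $mac = substr($stored, 0, MAC_BYTES);
    $serialized = substr($stored, MAC_BYTES);
    $expected = hash_hmac('sha256', $serialized, $key, true);
    if (!hash_equals($expected, $mac)) {
        return null; // unauthenticated input never reaches json_decode()
    }
    $decoded = json_decode($serialized, true, 512, JSON_THROW_ON_ERROR);
    return is_array($decoded) ? $decoded : null;
}

// For the demo only: in production, load a long-term key from outside the web root.
$key = random_bytes(32);
$storeMe = sealData(['user_id' => 42], $key);
$yourData = openData($storeMe, $key);
```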

Another example, using public key cryptography (digital signatures):

<?php
use \ParagonIE\Halite\Asymmetric\Crypto as Asymmetric;
use \ParagonIE\Halite\KeyFactory;
use \ParagonIE\Halite\Util;

$keyPair = KeyFactory::loadSignatureKeyPair('/outside/project/path/signing.secretkey');
$secretKey = $keyPair->getSecretKey();

// Serialization:
$serialized = serialize($yourData);
$storeMe = Asymmetric::sign($serialized, $secretKey) . $serialized;

// Deserialization:
$publicKey = KeyFactory::loadSignaturePublicKey('/outside/project/path/signing.publickey');
// Or $publicKey = $keyPair->getPublicKey();
$signature = Util::safeSubstr($storeMe, 0, 2 * \Sodium\CRYPTO_SIGN_BYTES);
$message = Util::safeSubstr($storeMe, 2 * \Sodium\CRYPTO_SIGN_BYTES);
if (Asymmetric::verify($message, $publicKey, $signature)) {
    $object = unserialize($message, ['allowed_classes' => false]);
}
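The signature variant can likewise be sketched with PHP's bundled sodium extension (PHP 7.2+), which exposes Ed25519 detached signatures directly. signSerialized() and verifySerialized() are hypothetical helpers, not Halite's API. The key pair is generated on the fly only to keep the example self-contained:

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: prepend a detached signature to the serialized data. */
function signSerialized(string $serialized, string $secretKey): string
{
    return sodium_crypto_sign_detached($serialized, $secretKey) . $serialized;
}

/** Hypothetical helper: return the message only if its signature verifies, else null. */
function verifySerialized(string $stored, string $publicKey): ?string
{
    if (strlen($stored) < SODIUM_CRYPTO_SIGN_BYTES) {
        return null;
    }
    $signature = substr($stored, 0, SODIUM_CRYPTO_SIGN_BYTES);
    $message = substr($stored, SODIUM_CRYPTO_SIGN_BYTES);
    return sodium_crypto_sign_verify_detached($signature, $message, $publicKey) ? $message : null;
}

// For the demo only: a real key pair would be generated once and stored outside the web root.
$keyPair = sodium_crypto_sign_keypair();
$secretKey = sodium_crypto_sign_secretkey($keyPair);
$publicKey = sodium_crypto_sign_publickey($keyPair);

// Serialize, then sign.
$storeMe = signSerialized(serialize(['user_id' => 42]), $secretKey);

// Verify, then deserialize (still refusing arbitrary classes).
$message = verifySerialized($storeMe, $publicKey);
$object = $message === null ? null : unserialize($message, ['allowed_classes' => false]);
```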

In the future, we hope to see PHP adopt a keyed, randomized hash function for its internal hash tables (e.g. something similar to SipHash with a randomly generated key) to make data deserialization safer. Until then, your best bet is to either avoid these features in the first place or use strong cryptography as a mitigation.