CipherSweet: Searchable Encryption Doesn't Have to be Bitter

Back in 2017, we outlined the fundamentals of searchable encryption with PHP and SQL. Shortly after, we implemented this design in a library we call CipherSweet.

Our initial design constraints were as follows:

- Only use the cryptography tools that are already widely available to developers.
- Only use encryption modes that are secure against chosen-ciphertext attacks.
- Treat usability as a security property.
- Remain as loosely schema-agnostic as possible, so that it's possible to use our design in NoSQL contexts or wildly different SQL database layouts.
- Be extensible, so that it may be integrated with many other products and services.

Today, we'd like to talk about some of the challenges we've encountered, as well as some of the features that have landed in CipherSweet since its inception, and how we believe they are beneficial for the adoption of usable cryptography at scale.

If you're not familiar with cryptography terms, you may find this page useful.

Challenges in Searchable Encryption

As of the time of this writing, it's difficult to declare a "state of the art" design for searchable encryption, for two reasons:

- Different threat models and operational requirements.
- Ongoing academic research into different designs and attacks.

Cryptographers interested in encrypted search engines are likely invested in the ongoing research into fully homomorphic encryption (FHE), which allows the database server to perform calculations on the ciphertext and return an encrypted result to the application to decrypt.

Some projects (e.g. the encrypted camera app Pixek and much of the other work of Seny Kamara, et al.) use a technique called structured encryption to accomplish encrypted search with a different threat model and set of operational requirements. Namely, the queries and tags are encrypted client-side and the server just acts as a data mule with no additional power to perform computations.

In either case, there are a few challenges that any proposed design must help its users overcome if it is to be used in the real world.

Active Cryptanalytic Attacks

The most significant real-world deterrents to adopting fully homomorphic encryption today are:

- Performance.
- Cryptography implementation availability.

However, savvy companies will also list a third deterrent: adaptive chosen-ciphertext attacks.

This can be a controversial point to raise, because its significance depends on your application's threat model. Some application developers really trust their database server to not lie to the application.

More generally, all forms of active attacks from a privileged but not omnipotent user (e.g. root access to the database server, but not root access on the client application software) should be considered when designing any kind of encrypted search feature.

Small Input Domains

Let's say you're designing software for a hospital computer network and need to store protected health information with very few possible inputs (e.g. HIV status).

Even if you can encrypt this data securely (i.e. using AEAD and without message length oracles), any system that allows you to quickly search the database for a specific value (e.g. HIV Positive) introduces the risk of leaking information through side-channels.
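To see why, consider a naive design that stores a deterministic keyed hash of each value as a search token. The following plain-PHP sketch (not CipherSweet; the key and status labels are hypothetical) shows that even without the key, anyone who can read the token column can sort the rows into exactly two groups. Learning the status of a single patient then reveals the status of everyone who shares that token.

```php
<?php
// Illustration only: a deterministic keyed hash over a two-value domain.
// The key and the status labels are hypothetical example values.
$key = random_bytes(32);
$rows = ['positive', 'negative', 'negative', 'positive', 'negative'];

// Compute a deterministic search token per row, as a naive "searchable"
// column would.
$tokens = array_map(
    fn (string $status): string => hash_hmac('sha256', $status, $key),
    $rows
);

// An observer who never sees the key can still count rows per token.
$groups = array_count_values($tokens);
echo count($groups), PHP_EOL; // 2
```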

Information Leakage

Search operations are ripe for oracles.

In particular, order-revealing encryption techniques leak your plaintext, much like block ciphers in ECB mode.

Any proposal for searchable encryption must be able to account for its information leakage and provide users a simple way of understanding and managing that risk.

CipherSweet: A High-Level Overview

This is a brief introduction to CipherSweet and a high-level overview. For more depth, please refer to the official documentation on GitHub.

Where to Get CipherSweet

CipherSweet is available on GitHub, and can be installed via Composer with the following command:

composer require paragonie/ciphersweet

Using CipherSweet

First, you need a backend, which handles all of the cryptographic heavy lifting.

We give you two to choose from, but there's also a BackendInterface if anyone ever needs to define their own:

- FIPSCrypto only uses the algorithms approved for use by FIPS 140-2. Note that using this backend doesn't automatically make your application FIPS 140-2 certified.
- ModernCrypto uses libsodium, and is generally recommended in most situations.

Once you've chosen a backend, you're done thinking about cryptography algorithms. You don't need to specify a cipher mode, or a hash function, or anything else. Instead, the next step is to decide how you want to manage your keys.

In addition to a few generic options, CipherSweet provides a KeyProviderInterface to allow developers to integrate with their own custom key management solutions.

Finally, you just need to pass the backend and key provider to the engine. From this point on, the engine is the only object you need to work with directly.

All together, it looks like this:

<?php
use ParagonIE\CipherSweet\Backend\ModernCrypto;
use ParagonIE\CipherSweet\KeyProvider\StringProvider;
use ParagonIE\CipherSweet\CipherSweet;
// First, choose your backend:
$backend = new ModernCrypto();
// Next, your key provider:
$provider = new StringProvider(
    // The key provider stores the BackendInterface for internal use:
    $backend,
    // Example key, chosen randomly, hex-encoded:
    '4e1c44f87b4cdf21808762970b356891db180a9dd9850e7baf2a79ff3ab8a2fc'
);
// From this point forward, you only need your Engine:
$engine = new CipherSweet($provider);
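The key above is a hard-coded example: 64 hex characters, i.e. 32 random bytes. For your own deployment you'd generate a fresh key and load it from somewhere other than your source code. A minimal sketch of generating one in the same format with PHP's built-in CSPRNG:

```php
<?php
// Generate 32 random bytes (256 bits) and hex-encode them, producing a
// 64-character string in the same format as the example key above.
$hexKey = bin2hex(random_bytes(32));
echo $hexKey, PHP_EOL;
```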

Once you have a working CipherSweet engine, you have a lot of flexibility in how you use it. In each of the following classes, you'll mostly use the following methods:

- prepareForStorage() on INSERT and UPDATE queries.
- getAllBlindIndexes() / getBlindIndex() for SELECT queries.
- decryptValue() / decryptRow() / decryptManyRows() for decrypting after the SELECT query.

The encrypt/decrypt APIs were named more verbosely than simply encrypt() / decrypt() to ensure that the intent is communicated whenever a developer works with them.

EncryptedField: Searchable Encryption for a Single Column

EncryptedField is a minimalistic interface for encrypting a single column of a database table.

EncryptedField is designed for projects that only ever need to encrypt a single field, but still want to be able to search on the values of this field.

<?php
use ParagonIE\CipherSweet\BlindIndex;
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\EncryptedField;
use ParagonIE\CipherSweet\Transformation\LastFourDigits;
/** @var CipherSweet $engine */
$ssn = (new EncryptedField($engine, 'contacts', 'ssn'))
    // Blind index over the full SSN (name, transformations, output size in bits):
    ->addBlindIndex(
        new BlindIndex('contact_ssn_full', [], 8)
    )
    // Blind index over only the last four digits:
    ->addBlindIndex(
        new BlindIndex('contact_ssn_last_four', [new LastFourDigits()], 4)
    );
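Transformations like LastFourDigits normalize the plaintext before it's hashed into a blind index, so differently formatted inputs land on the same index value. The helper below is a hypothetical plain-PHP approximation written for illustration; it is not CipherSweet's LastFourDigits class, whose handling of short or unusual inputs may differ.

```php
<?php
/**
 * Hypothetical approximation of a "last four digits" transformation,
 * for illustration only (not CipherSweet's LastFourDigits class).
 */
function lastFourDigitsSketch(string $input): string
{
    // Discard everything that isn't a decimal digit.
    $digits = preg_replace('/[^0-9]/', '', $input);
    // Keep the last four digits, left-padding with zeroes if fewer remain.
    return str_pad(substr($digits, -4), 4, '0', STR_PAD_LEFT);
}

// Differently formatted inputs normalize to the same value, so they would
// feed the same input into the blind index.
var_dump(lastFourDigitsSketch('123-45-6789')); // string(4) "6789"
var_dump(lastFourDigitsSketch('123 45 6789')); // string(4) "6789"
```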

EncryptedRow: Searchable Encryption for Many Columns in One Table

EncryptedRow is a more powerful API that operates on rows of data at a time.

EncryptedRow is designed for projects that encrypt multiple fields and/or wish to create compound blind indexes.

It also has built-in handling for integers, floating-point numbers, and (nullable) boolean values, none of which leak the size of the stored value through the ciphertext length:

<?php
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\EncryptedRow;
/** @var CipherSweet $engine */
$row = (new EncryptedRow($engine, 'contacts'))
    ->addTextField('first_name')
    ->addTextField('last_name')
    ->addTextField('ssn')
    ->addBooleanField('hivstatus')
    ->addFloatField('latitude')
    ->addFloatField('longitude')
    ->addIntegerField('birth_year');
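Why does this matter? If an integer were encrypted as its decimal string, 7 and 1234567890 would produce ciphertexts of different lengths, revealing roughly how large each value is. Encoding every value of a given type into a fixed number of bytes before encryption avoids that. This sketch uses PHP's pack() with 64-bit big-endian formats and a hypothetical one-byte encoding for nullable booleans; it illustrates the idea and is not CipherSweet's actual serialization (it also assumes a 64-bit PHP build).

```php
<?php
// Illustrative fixed-width encodings (not CipherSweet's actual format).
// Every value of a given type encodes to the same number of bytes.
function encodeIntSketch(int $value): string
{
    return pack('J', $value); // 64-bit integer, big-endian
}

function encodeFloatSketch(float $value): string
{
    return pack('E', $value); // IEEE 754 double, big-endian
}

function encodeNullableBoolSketch(?bool $value): string
{
    // Hypothetical one-byte encoding for null / false / true.
    if ($value === null) {
        return "\x00";
    }
    return $value ? "\x02" : "\x01";
}

foreach ([7, 1234567890] as $n) {
    echo strlen(encodeIntSketch($n)), PHP_EOL; // prints 8 both times
}
```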

EncryptedRow expects an array that maps column names to values, like so:

<?php
$input = [
    'contactid' => 12345,
    'first_name' => 'Jane',
    'last_name' => 'Doe',
    'ssn' => '123-45-6789',
    'hivstatus' => false,
    'latitude' => 52.52,
    'longitude' => -33.106,
    'birth_year' => 1988,
    'extraneous' => true
];

EncryptedMultiRows: Searchable Encryption for Many Tables

EncryptedMultiRows is a multi-row abstraction designed to make it easier to work on heavily-normalized databases and integrate CipherSweet with ORMs (e.g. Eloquent).

Under the hood, it maintains an internal array of EncryptedRow objects (one for each table), so the features that EncryptedRow provides are also usable from EncryptedMultiRows.

Anyone familiar with EncryptedRow should find the EncryptedMultiRows API easy to pick up.

<?php
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\EncryptedMultiRows;
/** @var CipherSweet $engine */
$rowSet = (new EncryptedMultiRows($engine))
    ->addTextField('contacts', 'first_name')
    ->addTextField('contacts', 'last_name')
    ->addTextField('contacts', 'ssn')
    ->addBooleanField('contacts', 'hivstatus')
    ->addFloatField('contacts', 'latitude')
    ->addFloatField('contacts', 'longitude')
    ->addIntegerField('contacts', 'birth_year')
    ->addTextField('foobar', 'test');

EncryptedMultiRows expects an array of table names mapped to an array that in turn maps columns to values, like so:

<?php
$input = [
    'contacts' => [
        'contactid' => 12345,
        'first_name' => 'Jane',
        'last_name' => 'Doe',
        'ssn' => '123-45-6789',
        'hivstatus' => null, // unknown
        'latitude' => 52.52,
        'longitude' => -33.106,
        'birth_year' => 1988,
        'extraneous' => true
    ],
    'foobar' => [
        'foobarid' => 23,
        'contactid' => 12345,
        'test' => 'paragonie'
    ]
];

CipherSweet's Usable Cryptography Wins

In addition to being designed in accordance with cryptographically secure PHP best practices, CipherSweet was also carefully constructed to be a user-friendly cryptographic API.

Here are some of the design decisions and features that help it meet its usable security goals.

Blind Index Planning

If you're not familiar with blind indexes, please read the blog post detailing the fundamentals of our design.

Our blind indexing technique has a relatively straightforward information leakage profile, since the building block we use is a keyed hash function (e.g. HMAC-SHA384 or BLAKE2b) or key derivation function (e.g. PBKDF2-SHA384 or Argon2id), which is then truncated and used as a Bloom filter.
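As a rough illustration of that building block, here's a plain-PHP sketch that computes HMAC-SHA384 over an (already transformed) plaintext with a per-index key and keeps only the first $bits bits. The function name is hypothetical, and CipherSweet's own construction (key derivation, output encoding, choice of hash) differs in its details.

```php
<?php
/**
 * Hypothetical blind-index sketch: HMAC-SHA384 the plaintext under a
 * per-index key, then truncate the output to the first $bits bits.
 * Returns the truncated value hex-encoded. Illustration only.
 */
function blindIndexSketch(string $plaintext, string $indexKey, int $bits): string
{
    if ($bits < 1 || $bits > 384) {
        throw new InvalidArgumentException('bits must be between 1 and 384');
    }
    $hash = hash_hmac('sha384', $plaintext, $indexKey, true);
    $fullBytes = intdiv($bits, 8);
    $remainder = $bits % 8;

    $out = substr($hash, 0, $fullBytes);
    if ($remainder > 0) {
        // Keep only the top $remainder bits of the next byte; zero the rest.
        $mask = (0xFF << (8 - $remainder)) & 0xFF;
        $out .= chr(ord($hash[$fullBytes]) & $mask);
    }
    return bin2hex($out);
}

$indexKey = random_bytes(32); // hypothetical per-index key
echo blindIndexSketch('6789', $indexKey, 16), PHP_EOL; // 4 hex characters
```

Shorter outputs mean more unrelated plaintexts share an index value, which is the trade-off discussed next.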

If you make your index outputs too small, you'll incur a performance penalty from false positives that makes the blind index almost pointless.

If you make your index outputs too large, you introduce the risk of creating unique fingerprints of the plaintext. The existence of reliable fingerprints introduces the risk of known- and chosen-plaintext attacks.

However, calculating a safe output size for each blind index involves a bit of math:

Generally, for a given population P, you want there to be between 2 and sqrt(P) hash prefix collisions (which we call "coincidences") in the blind index output.

To save developers from doing this math with pencil and paper, we created Planner classes, which calculate how many bits long you can safely make your blind index outputs.
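The rule of thumb above can be approximated in a few lines, assuming the index output is uniformly distributed: with L bits and P rows, about P / 2^L rows share each index value. The hypothetical helper below returns the range of L that keeps that figure between 2 and sqrt(P). It is a back-of-the-envelope model for a single index, not the Planner classes' implementation; it ignores, for instance, any other blind indexes on the same field.

```php
<?php
/**
 * Hypothetical sizing helper. Assumes ~P / 2^L rows share each index
 * value for an L-bit index over P rows, and returns [min, max] L such
 * that this count lies between 2 and sqrt(P), or null if none does.
 */
function safeBitRangeSketch(int $population): ?array
{
    $min = null;
    $max = null;
    for ($bits = 1; $bits <= 64; $bits++) {
        $coincidences = $population / (2 ** $bits);
        if ($coincidences >= 2 && $coincidences <= sqrt($population)) {
            $min ??= $bits;
            $max = $bits;
        }
    }
    return $min === null ? null : [$min, $max];
}

[$lo, $hi] = safeBitRangeSketch(1000000);
echo "$lo to $hi bits", PHP_EOL; // 10 to 18 bits
```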

Compound Blind Indexes

A compound blind index is simply a blind index that was created from multiple fields at once. This is extremely useful if you want to filter your encrypted search results based on a boolean field without leaking the boolean value directly in the index value.

More broadly, compound blind indexes give you a flexible way to index common search criteria to make lookups fast.

For example, using EncryptedRow:

<?php
use ParagonIE\CipherSweet\EncryptedRow;
use ParagonIE\CipherSweet\Transformation\AlphaCharactersOnly;
use ParagonIE\CipherSweet\Transformation\FirstCharacter;
use ParagonIE\CipherSweet\Transformation\Lowercase;
/** @var EncryptedRow $row */
$row->addCompoundIndex(
    $row->createCompoundIndex(
        'contact_first_init_last_name',
        ['first_name', 'last_name'],
        64, // 64 bits = 8 bytes
        true // fast hash: use the keyed hash function rather than the slower KDF
    )
        ->addTransform('first_name', new AlphaCharactersOnly())
        ->addTransform('first_name', new Lowercase())
        ->addTransform('first_name', new FirstCharacter())
        ->addTransform('last_name', new AlphaCharactersOnly())
        ->addTransform('last_name', new Lowercase())
);

This gives you a case-insensitive index of first initial + last name.
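To make the normalization concrete, here's a plain-PHP sketch of what those transformations do to two spellings of the same name before they're hashed together. The helper names are hypothetical, the regex only handles ASCII letters, and packing the two fields with json_encode() is an assumption made for illustration rather than CipherSweet's actual encoding.

```php
<?php
// Hypothetical helpers mirroring AlphaCharactersOnly, Lowercase and
// FirstCharacter; CipherSweet's own classes may differ (e.g. for non-ASCII).
function alphaOnlySketch(string $s): string
{
    return preg_replace('/[^A-Za-z]/', '', $s);
}

function compoundInputSketch(string $firstName, string $lastName): string
{
    $first = substr(strtolower(alphaOnlySketch($firstName)), 0, 1);
    $last = strtolower(alphaOnlySketch($lastName));
    // Assumption: fields are packed with json_encode() so that field
    // boundaries stay unambiguous before hashing.
    return json_encode([$first, $last], JSON_THROW_ON_ERROR);
}

$key = random_bytes(32); // hypothetical per-index key
$a = hash_hmac('sha384', compoundInputSketch('Jane', "O'Brien"), $key);
$b = hash_hmac('sha384', compoundInputSketch('jane', 'OBRIEN'), $key);
var_dump($a === $b); // bool(true)
```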

Built-In Key Separation

Information leakage is especially harmful if you're using the same key everywhere.

To mitigate this, CipherSweet automatically derives distinct subkeys for each table and column, and then for each blind index, using a process called the key hierarchy.

The short of it is: Your KeyProvider defines a master key, from which the actual key used for encrypting each field is derived. We use HKDF and carefully-chosen domain separation constants to ensure cross-protocol attacks are not possible.
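Here's a minimal sketch of that kind of hierarchy using PHP's built-in hash_hkdf() (PHP 7.1.2+): one derivation per table and column, then another per blind index. The "example-" domain separation prefixes and the salt/info layout are hypothetical placeholders, not CipherSweet's real constants.

```php
<?php
// Sketch of a key hierarchy with HKDF-SHA384. The domain separation
// prefixes and salt/info layout are hypothetical, not CipherSweet's.
function deriveFieldKeySketch(string $masterKey, string $table, string $column): string
{
    // info carries a domain separation prefix plus the column; salt is the table.
    return hash_hkdf('sha384', $masterKey, 32, 'example-field-encryption:' . $column, $table);
}

function deriveIndexKeySketch(string $fieldKey, string $indexName): string
{
    return hash_hkdf('sha384', $fieldKey, 32, 'example-blind-index:' . $indexName);
}

$master = random_bytes(32);
$ssnKey = deriveFieldKeySketch($master, 'contacts', 'ssn');
$nameKey = deriveFieldKeySketch($master, 'contacts', 'last_name');
$lastFourKey = deriveIndexKeySketch($ssnKey, 'contact_ssn_last_four');
```

Because each derived key depends on its table, column, and index name, a leak or misuse in one place doesn't hand an attacker the key used anywhere else.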

Key Rotation

If you ever need to switch CipherSweet backends or rotate your keys, we created a special-purpose suite of PHP classes to facilitate less-painful data migrations and reduce the amount of boilerplate code needed.

<?php
use ParagonIE\CipherSweet\CipherSweet;
use ParagonIE\CipherSweet\EncryptedField;
use ParagonIE\CipherSweet\KeyRotation\FieldRotator;
// 1. Set up
/**
 * @var string $ciphertext
 * @var CipherSweet $old
 * @var CipherSweet $new
 */
$oldField = new EncryptedField($old, 'contacts', 'ssn');
$newField = new EncryptedField($new, 'contacts', 'ssn');
$rotator = new FieldRotator($oldField, $newField);
// 2. Using the rotator, re-encrypt the value if it still uses the old engine:
if ($rotator->needsReEncrypt($ciphertext)) {
    list($ciphertext, $indices) = $rotator->prepareForUpdate($ciphertext);
    // Then update this row in the database.
}

You can learn more about the various migration features here.

Upcoming Developments in CipherSweet

One of the items on our roadmap for PHP security in 2019 is to bring CipherSweet to your favorite framework, with as little friction as possible. To this end, we will be releasing ORM integrations throughout Q1 2019, starting with Eloquent and Doctrine.

Additionally, we plan on shipping KeyProvider implementations to integrate with cloud KMS solutions and common HSM solutions (e.g. YubiHSM). These will be standalone packages that extend the core functionality of CipherSweet to allow businesses and government offices to meet their stringent security compliance requirements without polluting the main library with code to tolerate oddly specific requirements.

When both of these developments have been completed, adopting searchable encryption in your PHP software should be as painless as possible.

Finally, we want to develop CipherSweet beyond the PHP language. We want to provide compatible implementations for Java, C#, and Node.js developers in our initial run, although we're happy to assist the open source community in developing and auditing compatible libraries in other languages.

Honorable mention: Ryan Littlefield has already started on an early Python implementation of CipherSweet.

Support the Development of CipherSweet

If you'd like to support our development efforts, please consider purchasing an enterprise support contract from our company.