The problem(s)

Picture the following (nightmare) scenario: despite your best efforts to secure your server, somebody has broken into it and stolen a copy of your entire database, full of private, sensitive data about your users.

However, if you store the database fields encrypted (the sensitive data, at the very least), all the attackers will have is undecipherable byte strings - your users' privacy is safe.

But wait: in order for your server to be able to store and retrieve the data, it would need to know the encryption key! And you cannot store that key in the database itself - or anywhere on the server's hard drive, for that matter - or the attackers will also acquire the key when they break into the server.

And there is another problem: storing fields in an encrypted state would make it impossible to perform searches - we can't index a database to search for fields containing, say, the word 'mortgage', if the fields are encrypted.

And no, creating an index based on the original, unencrypted data won't work - if the attacker sees that a given encrypted field has index entries pointing to it, indicating that it contains the words 'mortgage', 'payment', 'January' and '2015', then the contents of the field become rather obvious - even without decrypting it.

Fortunately, there are ways around these two problems.

Generating the key without storing it: key derivation functions.

The solution to the first problem is to generate the key from a password (that is not stored anywhere).

To achieve this, we need a method to generate a cryptographic key that is completely deterministic - entering the same password a million times will output the same key a million times - and, simultaneously, completely unpredictable - an attacker should be unable to guess the key without knowing the password it was generated from.

Such a method is called a "key derivation function" - or KDF, for short.

Fortunately, such a function, called PBKDF2, is widely available - it is included in OpenSSL, and thus usable from platforms like Node.js, PHP, etc., out of the box.

PBKDF2 works by "stretching" strings - it takes the password (or any string) you give it, and performs thousands of complex transformations on it, until it produces a sequence of bytes, of any length you require, with no apparent relationship with the original string.
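To make the "stretching" idea concrete, here is a minimal sketch using Node's built-in `crypto.pbkdf2Sync`. The salt value, the iteration count and the input strings are illustrative assumptions, not recommendations.

```javascript
const crypto = require('crypto');

// Illustrative parameters (assumptions made for this sketch)
const SALT = 'example-salt';
const ITERATIONS = 100000;

// Stretches a string into `length` pseudo-random bytes
function stretch(text, length) {
  return crypto.pbkdf2Sync(text, SALT, ITERATIONS, length, 'sha512');
}

const first = stretch('correct horse', 32);
const second = stretch('correct horse', 32);
const other = stretch('correct horsf', 32);

console.log(first.equals(second)); // true: same input, same output
console.log(first.equals(other));  // false: a one-character change gives unrelated bytes
console.log(stretch('correct horse', 100).length); // 100: any output length can be requested
```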

You would need to provide the password to your web server every time you start it (since it isn't stored anywhere); it will use the password to generate the encryption key, and use that key from then on, without storing it.
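One way to picture "derive once at startup, then keep the key in memory" is the sketch below. `createKeyProvider` is a hypothetical helper, and the password is hard-coded only so the snippet runs on its own; in practice it would be typed in by whoever starts the server.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derives the key once and keeps it only in memory.
function createKeyProvider(password) {
  // 'salt' and the iteration count are placeholder values for this sketch
  const key = crypto.pbkdf2Sync(password, 'salt', 100000, 24, 'sha512');
  return () => key;
}

// Stand-in for a password supplied by the operator at startup
const getKey = createKeyProvider('password typed at startup');

console.log(getKey().length);       // 24
console.log(getKey() === getKey()); // true: derived once, reused afterwards
```

Since PBKDF2 is deliberately slow, deriving the key a single time and reusing it keeps the cost out of the request path.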

Indexing encrypted fields without revealing their contents: blind indexing.

The solution to the second problem is to generate blind indices.

The idea is to compute a hash from the search terms, and then use the hash for indexing.

For example: let's suppose we want to find which of the (encrypted) fields contain the word 'mortgage'.

First, we compute a hash from the word: say, 'mortgage' -> 14231297424532579.

Then, we create an index that lists all the fields that contain the word whose hash is the number 14231297424532579 - without storing the actual word 'mortgage'. That is: 'The word whose hash is 14231297424532579 is contained in the rows 34, 156, 1240, ...'

This is called a 'blind index' - since it does not actually store the word 'mortgage', an attacker who acquires the database will be unable to figure out, from that index, which words are contained in which fields.

Finally, whenever we need to search for rows containing the word 'mortgage', we compute its hash again (that is, the number 14231297424532579), then look it up in the blind index.
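The steps above can be sketched as a small in-memory index. This is only an illustration: `hashWord`, `buildBlindIndex` and `search` are hypothetical names, the salt is a placeholder, and a real system would keep the index in the database rather than in a `Map`.

```javascript
const crypto = require('crypto');

// Placeholder hash for this sketch: PBKDF2 with an assumed salt, hex-encoded
function hashWord(word) {
  return crypto
    .pbkdf2Sync(word.toLowerCase(), 'example-salt', 100000, 8, 'sha512')
    .toString('hex');
}

// Builds a map from word hash to the set of row ids containing that word.
// The rows are in plaintext here because indexing happens before encryption.
function buildBlindIndex(rows) {
  const index = new Map();
  for (const { id, text } of rows) {
    const words = new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
    for (const word of words) {
      const h = hashWord(word);
      if (!index.has(h)) index.set(h, new Set());
      index.get(h).add(id);
    }
  }
  return index;
}

// Looks a word up by recomputing its hash; the word itself is never stored
function search(index, word) {
  const ids = index.get(hashWord(word));
  return ids ? [...ids].sort((a, b) => a - b) : [];
}

const index = buildBlindIndex([
  { id: 34, text: 'Mortgage payment' },
  { id: 156, text: 'Gym membership' },
  { id: 1240, text: 'mortgage renewal' },
]);

console.log(search(index, 'mortgage')); // [ 34, 1240 ]
console.log(search(index, 'holiday'));  // []
```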

In order for this to work, we need a hashing function - for this, we can use the same string stretching function (PBKDF2) we used to generate the key from the password - we simply stretch the word into an 8-byte string, then interpret said string as a 64-bit integer, which we can use as the hash.
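As a rough sketch of that conversion in Node.js: the 8 bytes can be read with `Buffer.readBigUInt64LE`, which returns a `BigInt`, since a regular JavaScript number is only exact up to 2^53 - 1. The function name `hash64` and the salt are assumptions made for this example.

```javascript
const crypto = require('crypto');

// Stretches a word into 8 bytes and reads them as an unsigned 64-bit integer.
// The salt is a placeholder assumption for this sketch.
function hash64(word) {
  const bytes = crypto.pbkdf2Sync(word, 'example-salt', 100000, 8, 'sha512');
  return bytes.readBigUInt64LE(0); // BigInt: all 64 bits are preserved exactly
}

const h = hash64('mortgage');

console.log(typeof h);                 // bigint
console.log(h === hash64('mortgage')); // true: the hash is reproducible
console.log(h.toString());             // decimal string form, handy for storage
```

If the index column uses a signed 64-bit type, `readBigInt64LE` may be the better fit; `JSON.stringify` does not accept `BigInt` values, so converting with `toString()` first is one option.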

We can add an extra layer of security by using a secret salt for the hash generation - and generate that salt from the same password we used to generate the key. This way, the attacker will not be able to figure out which word is hashed to 14231297424532579 with a dictionary attack - not without knowing the password.
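The effect of the secret salt can be illustrated with a short sketch. The helper names (`hashingSaltFrom`, `blindHash`) and both passwords are made up for the example; the outer salt `'salt'` is a placeholder.

```javascript
const crypto = require('crypto');

function stretch(text, salt, length) {
  return crypto.pbkdf2Sync(text, salt, 100000, length, 'sha512');
}

// Derives a secret 48-byte hashing salt from a password
function hashingSaltFrom(password) {
  return stretch(password, 'salt', 48);
}

// Hashes a word under the given secret salt
function blindHash(word, hashingSalt) {
  return stretch(word, hashingSalt, 8).readBigUInt64LE(0);
}

const realSalt = hashingSaltFrom('Our password');
const guessedSalt = hashingSaltFrom('attacker guess');

const stored = blindHash('mortgage', realSalt);

// true: the server, knowing the password, can recompute the hash
console.log(stored === blindHash('mortgage', realSalt));

// false: a dictionary hashed under the wrong salt does not match the index
console.log(stored === blindHash('mortgage', guessedSalt));
```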


Example implementation

Here is an example (in Node.js - but it can be ported to any language):

First, we import the crypto library (which Node provides out of the box):

    const crypto = require('crypto');

Then, we write a convenience function to stretch a string; I use the sha512 algorithm, with a hundred thousand iterations - this gives very good results without using too much CPU.

    function stretchString(s, salt, outputLength){
        return crypto.pbkdf2Sync(s, salt, 100000, outputLength, 'sha512');
    }

This way, all we need to stretch a string is (besides the string itself) the salt to use and the number of bytes we want for the output.

Then, using our ```stretchString``` function, we generate both our cryptographic key, and a very good salt to use for our blind indices, all from nothing but the password.

    function keyFromPassword(password){
        // We need 24 bytes for the key, and another 48 bytes for the salt
        const keyPlusHashingSalt = stretchString(password, 'salt', 24 + 48);
        return {
            cipherKey: keyPlusHashingSalt.slice(0,24),
            hashingSalt: keyPlusHashingSalt.slice(24)
        };
    }

Now we can use the generated key to encrypt any data:

    function encrypt(key, sourceData){
        // Initialization vector (fixed to all zeros to keep the example simple;
        // this makes identical inputs encrypt to identical outputs)
        const iv = Buffer.alloc(16, 0);
        const cipher = crypto.createCipheriv('aes-192-cbc', key.cipherKey, iv);
        let encrypted = cipher.update(sourceData, 'binary', 'binary');
        encrypted += cipher.final('binary');
        return encrypted;
    }

And then use the same (symmetric) key to decrypt it back:

    function decrypt(key, encryptedData){
        const iv = Buffer.alloc(16, 0); // Initialization vector
        const decipher = crypto.createDecipheriv('aes-192-cbc', key.cipherKey, iv);
        let decrypted = decipher.update(encryptedData, 'binary', 'binary');
        decrypted += decipher.final('binary');
        return decrypted;
    }
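The functions above use an all-zero initialization vector, so encrypting the same value twice yields the same ciphertext. A common variation, sketched here under my own naming (`encryptWithRandomIv` and `decryptWithRandomIv` are hypothetical), generates a random IV per value and stores it in front of the ciphertext. It works on Buffers instead of 'binary' strings, and is a possible alternative rather than part of the original example.

```javascript
const crypto = require('crypto');

// Encrypts a UTF-8 string; returns a Buffer laid out as [16-byte IV | ciphertext]
function encryptWithRandomIv(cipherKey, plaintext) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv('aes-192-cbc', cipherKey, iv);
  return Buffer.concat([iv, cipher.update(plaintext, 'utf8'), cipher.final()]);
}

// Splits the IV off the front of the payload, then decrypts the rest
function decryptWithRandomIv(cipherKey, payload) {
  const iv = payload.subarray(0, 16);
  const decipher = crypto.createDecipheriv('aes-192-cbc', cipherKey, iv);
  const plain = Buffer.concat([
    decipher.update(payload.subarray(16)),
    decipher.final(),
  ]);
  return plain.toString('utf8');
}

// 24-byte key for AES-192, derived with placeholder parameters
const cipherKey = crypto.pbkdf2Sync('Our password', 'salt', 100000, 24, 'sha512');

const a = encryptWithRandomIv(cipherKey, 'This is a test');
const b = encryptWithRandomIv(cipherKey, 'This is a test');

console.log(a.equals(b));                       // false: a fresh IV each time
console.log(decryptWithRandomIv(cipherKey, a)); // This is a test
```

The IV does not need to be secret, which is why it can be stored alongside the encrypted value.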

Now all we need is a function to compute 64-bit hashes, to use for blind indexing:

    function hash(key, sourceData){
        const hashBuffer = stretchString(sourceData, key.hashingSalt, 8);
        // readUIntLE() only supports up to 6 bytes, so read all 8 bytes as a BigInt
        return hashBuffer.readBigUInt64LE(0);
    }

That's it! Here is the full code:

    const crypto = require('crypto');

    // Uses the PBKDF2 algorithm to stretch the string 's' to an arbitrary size,
    // in a way that is completely deterministic yet impossible to guess without
    // knowing the original string
    function stretchString(s, salt, outputLength){
        return crypto.pbkdf2Sync(s, salt, 100000, outputLength, 'sha512');
    }

    // Stretches the password in order to generate a key (for encrypting)
    // and a large salt (for hashing)
    function keyFromPassword(password){
        // We need 24 bytes for the key, and another 48 bytes for the salt
        const keyPlusHashingSalt = stretchString(password, 'salt', 24 + 48);
        return {
            cipherKey: keyPlusHashingSalt.slice(0,24),
            hashingSalt: keyPlusHashingSalt.slice(24)
        };
    }

    // Encrypts data using the key generated using the 'keyFromPassword' function
    function encrypt(key, sourceData){
        // Initialization vector (fixed to all zeros to keep the example simple;
        // this makes identical inputs encrypt to identical outputs)
        const iv = Buffer.alloc(16, 0);
        const cipher = crypto.createCipheriv('aes-192-cbc', key.cipherKey, iv);
        let encrypted = cipher.update(sourceData, 'binary', 'binary');
        encrypted += cipher.final('binary');
        return encrypted;
    }

    // Decrypts data using the key generated using the 'keyFromPassword' function
    function decrypt(key, encryptedData){
        const iv = Buffer.alloc(16, 0); // Initialization vector
        const decipher = crypto.createDecipheriv('aes-192-cbc', key.cipherKey, iv);
        let decrypted = decipher.update(encryptedData, 'binary', 'binary');
        decrypted += decipher.final('binary');
        return decrypted;
    }

    // Computes a 64-bit integer hash (as a BigInt) from the given data, using
    // the salt we generated from the password (using 'keyFromPassword')
    function hash(key, sourceData){
        const hashBuffer = stretchString(sourceData, key.hashingSalt, 8);
        // readUIntLE() only supports up to 6 bytes, so read all 8 bytes as a BigInt
        return hashBuffer.readBigUInt64LE(0);
    }

    const key = keyFromPassword('Our password');
    const encryptedTest = encrypt(key, 'This is a test');

    // Prints 'This is a test', after encrypting it and decrypting it again
    console.log( decrypt(key, encryptedTest) );

    // Prints the 64-bit hash generated from 'This is another test' as a BigInt
    // (approximately 14682136302485094000)
    console.log( hash(key, 'This is another test') );
