This article was revisited and updated in August 2018.

In modern client-server applications, most of the sensitive data is stored (and consequently leaked) on the backend. At Cossack Labs, we’re working on novel techniques that help protect the data within modern infrastructures, and we talk to engineers across industries about these techniques quite a lot. However, it is still not uncommon to see infrastructures that lack even the basic classic database defence patterns.

In the next few posts, we’ll go through the classic and modern ideas of defensive database design.

Why do we protect the data on backend systems?

Web services and mobile applications provide convenient front-end mechanisms to access and manipulate the data stored in backend systems. Among that data are sensitive data assets (such as customers' personal information, identity, or access credentials), which typically constitute the greatest value for potential attackers. So, unsurprisingly, front-end applications become a target for adversaries seeking a way to gain access to back-end systems and the databases they contain.

It doesn't matter if you're using a fancy NoSQL database with a Node.js front-end, old-school LAMP, or corporate Oracle with Java. This article talks about design patterns and security decisions. Most modern client-server applications (web, mobile, or any user-facing apps) can be presented as a similar architecture, where the front-end app could be an API server for a mobile app or Perl code rendering a web page.

Possible vectors of attack

For an attacker, there are 4 common ways to gain access to the data:

1. Altering the front-end behaviour in such a way that the data can be extracted via the front-end application itself. This can be done via:

- SQL injections (in combination with the third attack vector, see below);

- Gaining control of the app's execution flow;

- Enumerating records in requests.

2. Sniffing traffic between the database and the front-end app to:

- Collect the data requested by legitimate users;

- Steal the credentials and access the database while pretending to be a legitimate application.

This is typically done by getting into the internal network infrastructure and/or "rooting" one of the two hosts and silently listening in on the traffic if it’s not encrypted properly.

3. Altering the behaviour of the database to bypass the access control in some way:

- Pretending to be a legitimate application/user;

- Forcing the system to change access privileges;

- Sending malformed requests from the app (in combination with method 1 described above).

4. Stealing the assets by accessing the database host directly, getting the database files at rest, and extracting the meaningful data from them.
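The SQL injection vector mentioned above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module and a hypothetical `users` table (the schema, names, and payload are assumptions made for this example only) to contrast a query built by string concatenation with a parameterised one:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input is pasted into the SQL text and can rewrite the query.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised: the driver passes the input as data, never as SQL.
    rows = conn.execute("SELECT secret FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print(lookup_unsafe(conn, payload))  # leaks every secret in the table
    print(lookup_safe(conn, payload))    # matches no user, returns nothing
```

The same input that dumps the whole table through the concatenated query is treated as an (unknown) user name by the parameterised one.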

Classic tools for risk mitigation

There are four typical types of defence against the described threats:

Use a firewall to restrict the access to the database server.

Use authentication to restrict the access to data and compartmentalise databases within the DBMS to minimise the risk of lost credentials impacting every database.

Encrypt the critical columns/rows with a unified symmetric key.

Encrypt the whole partition containing the database files with a unique symmetric key.

Each of the defence approaches is strong against some of the attacks and each also has a number of problems:

Firewall:

Pros: A firewall helps to limit the proliferation of access within a network: it makes sure that only the trusted addresses gain access to the database ports.

Cons: However, the path to system compromise often lies through front-facing code that has legitimate rights to access the database. If the attackers are able to alter the behaviour of a legitimate app host (either by forcing it to execute something malicious or by gaining shell access with sufficient privileges), they gain access to the database anyway.

Login/password authentication:

Pros: Authentication helps to protect against unauthorised access from parties that don’t have the proper credentials. Authentication also allows enforcing a certain access granularity: ensuring that only specific users can access a particular database.

Cons: Frequently, the credentials for accessing the database are stored somewhere in the web app’s/middleware configuration files, so they themselves can become a target for an attacker.

Selective row/column encryption:

Pros: The data is protected, but the encryption keys must be stored either in the backend or in the frontend.

Cons: The keys become a target. If an attacker gains access to the front-end host and the keys, mounting an attack from there is straightforward.

Partition encryption:

Pros: If the storage devices are unmounted, there is no way to read the data. This covers, for example, stolen drives/servers and unauthorised access to the database server that involves a system restart.

Cons: Supplying credentials when mounting the device can add maintenance/system administration complexity, and this clearly does not provide any additional defence against attacks on the device once it is mounted.

These techniques all provide relevant defence methods against particular types of attacks. But, as we’ve seen, they also open up the risk of different types of attack. Let's consider the types of security instruments that might address these risks.

If we accept that rows/cells/records containing sensitive (or any other kind of) data should be encrypted, the challenge turns into how to generate and securely manage the associated encryption keys. There are classic ways of doing that, including HSMs and dedicated trust nodes with keys, and there are also some novel techniques that we will talk about in greater depth in the next article.

Looking again at the attack vectors for classic defence strategies, we see:

| Infrastructure component | Attack | Classic defences |
|---|---|---|
| Front-end application | Alters the app's behavior to extract the data from the database. | Database compartmentation: isolates the scope of the visible data down to the minimal amount required by functionality; uses authentication to minimise the leakage scope. |
| Front-end application | Alters the app's behavior to run the code. | Keeps keys away from the app's code. |
| Front-end app host | Steals DB credentials and executes code on the consumer's app host to access the main database. | Database compartmentation; database authentication with fine-grained access rights. |
| Database host | Physical access to the filesystem at rest. | Encrypts partition. |
| Database host | Physical access to DB files. | Encrypts cells with an external key stored elsewhere. |
| Database software | SQL injection. | Encrypts cells with an external key, accessible via compartmented functionality with strong input sanitisation. |
| Target network | Unauthorised access to the database daemon. | Firewalls; passwords. |

Most of the tools used are parts of the core database, application, and the OS infrastructure. By carefully creating the defence systems from these tools, you can eliminate some of the most common risks.

But what can go wrong?

As we’ve already noted, deploying these defences prevents many attacks, but they are hardly a problem for a sophisticated attacker with just a bit of luck. For example:

| Available defence | Attack trajectory |
|---|---|
| Cell encryption with key on DB host | Compromises DB host; seizes DB files + key store. |
| Cell encryption with key supplied in SQL | Compromises DB host; seizes the key from the network traffic (fake proxy listener) and DB files from disk. |
| Cell encryption with key supplied in SQL | Compromises app host; seizes the key from the network traffic or config; downloads the encrypted data from DB and decrypts, OR utilises the legitimate decryption code on the app's side. |
| Partition encryption | Just ignores the physical server. |
| Login/password authentication | Compromises the app's host or alters the app behavior to seize the authentication data; uses it from the app's host, bypasses any firewalls on the way. |
| Firewall | Just attacks the database through hosts that have legitimate rights to access the database. |

Classic methods: the hardcore edition.

While this repertoire of classic defence techniques leaves open a range of theoretical attacks, it nonetheless limits the opportunities for an attacker. This is even more true if we harden our defences through:

Compartmentalising the data via database isolation and fine-grained rights

First and foremost, we need data isolation: limiting table/database access through a fine-grained distribution of rights on databases and table spaces. But sometimes that’s not the answer: database-level isolation hurts auto-sharding, DB management automation, and other modern scalability demands.
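As a rough illustration of limiting what a given connection may read, here is a sketch based on Python's built-in `sqlite3` module. SQLite has no users or `GRANT` statements, so this example uses its authorizer callback (`Connection.set_authorizer`) as a stand-in for the role-based rights of a server DBMS. The `customers` table and `card_number` column are hypothetical names chosen for the example:

```python
import sqlite3

# Hypothetical (table, column) pairs that this connection must never read.
SENSITIVE = {("customers", "card_number")}


def restrict_reads(action, arg1, arg2, db_name, source):
    """Authorizer callback: refuse reads of sensitive columns, allow the rest."""
    # For SQLITE_READ, arg1 is the table name and arg2 is the column name.
    if action == sqlite3.SQLITE_READ and (arg1, arg2) in SENSITIVE:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def make_reporting_connection() -> sqlite3.Connection:
    """Create demo data, then lock the connection down before handing it out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, card_number TEXT)")
    conn.execute("INSERT INTO customers VALUES ('alice', '4111-0000-0000-0000')")
    conn.set_authorizer(restrict_reads)
    return conn


if __name__ == "__main__":
    conn = make_reporting_connection()
    print(conn.execute("SELECT name FROM customers").fetchall())
    try:
        conn.execute("SELECT card_number FROM customers")
    except sqlite3.DatabaseError as exc:
        print("denied:", exc)
```

In a server DBMS, the equivalent would normally be a dedicated role per application that is granted access only to the tables and columns the application actually needs.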

Fine-tuning access rights: more tips

Limiting privileged database access (DBA roles) to addresses unrelated to the production servers and involving a separate authentication mechanism (port-knocking is a good choice for some).

Controlling per-application access to open outbound connections to certain addresses.

Adding an additional verification step on the DB driver/pooler.

Adding IDS, HIDS, and monitoring

Monitor filesystem changes and suspiciously fast-growing files, and analyse logs for activity that looks like it is producing a DB dump. Monitor outgoing traffic for obvious signs of DB dumps. This will help to detect an ongoing leak if the attacker is lazy enough to create a dump with commodity tools and tries to download it as-is.
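A minimal sketch of the "fast-growing files" check might look like the following. It compares two size snapshots of a directory and flags files that appeared or grew by more than a threshold. The function names and the threshold are assumptions made for illustration; a real HIDS does far more than this:

```python
import os


def snapshot_sizes(directory: str) -> dict:
    """Map each regular file under `directory` to its current size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                continue  # the file vanished between listing and stat
    return sizes


def fast_growing(before: dict, after: dict, threshold_bytes: int) -> list:
    """Return files that are new or grew by at least `threshold_bytes`."""
    return sorted(
        path
        for path, size in after.items()
        if size - before.get(path, 0) >= threshold_bytes
    )
```

Taking a snapshot on a schedule and comparing it with the previous one is enough to notice a multi-gigabyte dump file appearing next to the database, which is the lazy-attacker scenario described above.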

Encrypting the data

But, above all, data encryption is still possibly the most important instrument for preventing database leaks. Even if everything else is broken and compromised, as long as the keys are kept safe, the stolen data will be of no use to the attackers.

Storage encryption

There are a number of tools that enable database file protection, either on the file or the filesystem level. They will prevent attackers from getting the data from outside the DBMS. Each approach certainly has its own drawbacks, like performance penalties or maintenance inefficiency, but they help.

There’s also plenty of instrumentation within the corporate database sector, with classical trust isolation in some key management node or HSM, and there are interesting novel tools we’ll talk about in the next article, too.

In-database record encryption

Every database has its own means of encrypting sensitive data, either by including a statement in SQL or by pointing at sensitive fields somewhere in the configuration.

App-level records encryption

We believe that in most cases database encryption should occur on the application level, encrypting the data before sending it to the database and decrypting after receiving it. There are some reasons for that:

Minimisation of attack surface: minus one place with unencrypted data;

Separation of keys and data: storing keys together with the data gives attackers less work to do;

Frequently, deriving trust from user-known secrets is a good pattern: the password the user enters on your website is a nice secret for protecting the user’s data, because without the user’s password your backend/front-end won’t be able to decrypt it at all;

App-level encryption gives you more flexibility in the choice of cipher and the ability to pick strong and efficient ciphers/cipher modes.
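The "user-known secret" pattern from the list above can be sketched with the standard library alone. The example derives a 32-byte key from the user's password with PBKDF2-HMAC-SHA256. The function names and the iteration count are assumptions for this sketch, and the encryption step itself is left to whatever authenticated cipher your cryptographic library provides:

```python
import hashlib
import secrets

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 600_000


def new_salt() -> bytes:
    """Random per-user salt; stored next to the user record, it is not secret."""
    return secrets.token_bytes(16)


def derive_user_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte data key from the user's password."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )
```

Only the salt is stored. The key is re-derived when the user logs in and is used to encrypt and decrypt that user's records, so a stolen database dump does not contain the key material.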

Apart from “just encrypting” the data, there are some other important techniques and considerations:

Context-aware encryption

One of the easiest ways to complicate the decryption of stolen records is to bind their secret material to the context, i.e. to data chunks that are easily derivable from the environment but are hard to reproduce once the data is stolen. For example, row numbers, automatically assigned keys, or other kinds of data used as encryption context could theoretically be recovered from the database dump, but figuring them out would take much more time and effort. Binding to the context also sometimes seriously complicates attacks that rely on altering the app's behaviour.
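One possible way to bind secret material to the context is to derive a separate key for every record from a master key plus the record's table name and row id. The sketch below does this with HMAC-SHA256 from the standard library. It is an illustration under these assumptions, not a description of how any particular library implements context binding, and `record_key` is a hypothetical helper:

```python
import hashlib
import hmac


def record_key(master_key: bytes, table: str, row_id: int) -> bytes:
    """Derive a per-record key bound to the record's table name and row id."""
    table_bytes = table.encode("utf-8")
    # Length-prefix the table name so that different (table, row_id) pairs
    # can never produce the same context bytes.
    context = (
        len(table_bytes).to_bytes(2, "big")
        + table_bytes
        + row_id.to_bytes(8, "big")
    )
    return hmac.new(master_key, context, hashlib.sha256).digest()
```

A record moved to another row or table no longer decrypts with the key derived for its new location, and an attacker holding a dump has to reconstruct the exact context of every record in addition to obtaining the master key.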

Split auth token scheme

Sometimes, encrypting the data doesn’t fit into the existing database schema. Quite frequently the field length is the biggest problem: either the length is predetermined and there is a lot of code to fix, or the data hits the maximum field length and there is a lot of database layout to change.

The authentication tag (control information that ensures the record was not tampered with) adds extra length to the encrypted string, thus creating a problem. It might make sense to store the tag separately from the ciphertext, and the cryptographic design must allow that.
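The storage side of such a split scheme can be sketched as follows. The example assumes the payload has already been encrypted by a length-preserving cipher mode (the standard library has no cipher, so the ciphertext is treated as opaque bytes) and keeps an HMAC-SHA256 tag in a separate table, leaving the original column width untouched. Table names, function names, and the use of a dedicated MAC key are assumptions of this sketch:

```python
import hashlib
import hmac
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical layout: payloads in one table, tags in another."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload BLOB)")
    conn.execute("CREATE TABLE record_tags (id INTEGER PRIMARY KEY, tag BLOB)")
    return conn


def make_tag(mac_key: bytes, row_id: int, ciphertext: bytes) -> bytes:
    """Authentication tag over the row id and the ciphertext."""
    message = row_id.to_bytes(8, "big") + ciphertext
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def store(conn: sqlite3.Connection, mac_key: bytes, row_id: int, ciphertext: bytes) -> None:
    """Write the ciphertext and its tag to their separate tables."""
    tag = make_tag(mac_key, row_id, ciphertext)
    conn.execute("INSERT INTO records (id, payload) VALUES (?, ?)", (row_id, ciphertext))
    conn.execute("INSERT INTO record_tags (id, tag) VALUES (?, ?)", (row_id, tag))


def load_verified(conn: sqlite3.Connection, mac_key: bytes, row_id: int) -> bytes:
    """Return the ciphertext only if its separately stored tag still matches."""
    payload = conn.execute(
        "SELECT payload FROM records WHERE id = ?", (row_id,)
    ).fetchone()[0]
    tag = conn.execute(
        "SELECT tag FROM record_tags WHERE id = ?", (row_id,)
    ).fetchone()[0]
    if not hmac.compare_digest(tag, make_tag(mac_key, row_id, payload)):
        raise ValueError("record failed authentication")
    return payload
```

The caller decrypts the payload only after `load_verified` succeeds, so a record that was modified in the database is rejected before any decryption takes place.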

Not incidentally, Themis' Secure Cell allows implementing these techniques easily.

It is useful to remember that the goal of security is to make an attacker give up on the attack, not to 'achieve theoretical security in all possible cases' (https://twitter.com/mubix/status/745403991475904513).

Summary

Backend protection is very important, and it is crucial for your sensitive data. Through the use of classic techniques, we can prevent most typical risks and start building a really solid security foundation for your product. To implement the security measures well, you need to:

Understand the risks and map them to your architecture;

Understand which control mechanisms you've got and how reliable they are;

Configure them to provide the best security guarantees for your data;

Implement additional mechanisms if what's available out of the box is insufficient for you to sleep well.

By the way, we can help you with understanding what security measures you need and how to implement them best, just drop us a message or check out our Customer Success Program.

Coming next:

Managing secrets

Encryption and authentication both rely on secrets: passwords, keys, access tokens. Even if the system is implemented properly, it is only as good as its key protection scheme. In the next article in the series, we'll talk in depth about various strategies for managing your secrets and using them.

Modern techniques

Today, more and more database protection tools are emerging, based both on enhanced math and on insights into applied cryptography. Further along in this series, we will talk about modern approaches and a few technologies we’re developing at Cossack Labs.