Let me start off with a quick trivia question: do you know that guy?

If the answer is “yes”, chances are you’re familiar with this quote:

With great power comes great responsibility — Uncle Ben

Note: I’m aware that this quote possibly goes back to Voltaire, but hey, he’s not as catchy as Spider-Man!

If you’re reading this, I guess you’re already familiar with Datomic and in love with its time-travel functionality. It’s just so powerful! But, as pointed out above, that power comes at a cost. For instance, it makes GDPR compliance a bit complex.

Yeah, I know, you’re reading about GDPR yet again. We’ve all had enough of it already, but there’s no rest for the wicked. Let’s remind ourselves of some of the obligations it puts on our backs:

Protect PII (Personally Identifiable Information)

Permanently delete PII if their owner asks us to (right to be forgotten)

Audit PII accesses

There are, of course, more regulations. However, they are beyond the scope of this blog post, so I’ll leave them out.

Conflict of interest

So the problem we’re tackling here is pretty apparent: on one hand, we have the obligation to delete PII whenever its owner asks us to. On the other hand, we have the power to travel back in time to the moment before that request.

Here, let me show you using a quick example:

On Monday, John Doe enters our web app and registers:

    @(d/transact
       conn
       [{:user/id    111
         :user/name  "John Doe"
         :user/phone "123456789"}])
    =>
    {:db-before datomic.db.Db@5193c0be,
     :db-after datomic.db.Db@43e943fe,
     :tx-data [#datom[13194139534313 50 #inst"2018-06-06T13:07:58.664-00:00" 13194139534313 true]
               #datom[17592186045418 63 111 13194139534313 true]
               #datom[17592186045418 64 "John Doe" 13194139534313 true]
               #datom[17592186045418 65 "123456789" 13194139534313 true]],
     :tempids {-9223301668109598066 17592186045418}}

Let’s take a note of the transaction that pushed the new user to our database:

The entity id of that transaction is 13194139534313

That transaction happened at #inst"2018-06-06T13:07:58.664-00:00"

He plays with the app for a couple of days. On Friday, however, he decides he’s not happy with the service and asks us to delete his data. And so we comply:

    @(d/transact
       conn
       [[:db.fn/retractEntity [:user/id 111]]])
    =>
    {:db-before datomic.db.Db@43e943fe,
     :db-after datomic.db.Db@37505082,
     :tx-data [#datom[13194139534315 50 #inst"2018-06-06T13:15:23.436-00:00" 13194139534315 true]
               #datom[17592186045418 63 111 13194139534315 false]
               #datom[17592186045418 64 "John Doe" 13194139534315 false]
               #datom[17592186045418 65 "123456789" 13194139534315 false]],
     :tempids {}}

We just make a final check to be sure everything is alright:

    (-> (d/db conn)
        (d/entity [:user/id 111]))
    => nil

Yay! John Doe is gone, so we can get back to business, right? Well, no. Do you remember the notes we took about the transaction that created John Doe’s account? Let’s use them to travel back in time:

    (-> (d/db conn)
        (d/as-of 13194139534313)
        (d/entity [:user/id 111]))
    => #:db{:id 17592186045418}

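John Doe is still there: a database value from before the retraction still resolves his entity, and with it his name and phone number. But hey, we’ve got your back! We’ve figured it out and we’ll be happy to share it with you.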

If Plato was right…

Imagine Atlantis really existed. Also imagine it had an enormous obelisk right in the center of its main square, and that the obelisk had your very own phone number and favourite frosting from Dunkin’ Donuts engraved on it. How GDPR-noncompliant is that? The answer is: it’s completely fine. The city disappeared into thin air (or water), and thus your secret craving for the Cookie Monster donut is still safe between you and me (shhh!).

Just lock them up

OK, jokes aside. The algorithm for our problem is dead simple: we’ll leverage crypto-shredding.

Create a key you will use to encrypt PII. You have to store that key in a way that will allow you to reliably delete it when needed. Use the key for encryption of the data on the way in and decryption on the way out. When the time comes, throw the key away.

The red block you see above depicts the structure when using HashiCorp Vault. It may, of course, vary depending on the tool you choose.
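To make the four steps of the algorithm a bit more tangible, here is a minimal sketch using nothing but the JDK’s `javax.crypto` classes. Everything named here (`key-store`, `create-key!`, `encrypt`, `decrypt`, `shred-key!`) is made up for illustration, and the in-memory atom only stands in for whatever key storage you end up picking. Treat it as a sketch under those assumptions, not as our production code.

```clojure
(import '(java.security SecureRandom)
        '(java.util Arrays)
        '(javax.crypto Cipher KeyGenerator)
        '(javax.crypto.spec GCMParameterSpec))

;; Hypothetical in-memory key store (user-id -> AES key).
;; In production the keys would live outside the PII database.
(defonce key-store (atom {}))

(defn create-key!
  "Step 1: generate a fresh AES-256 key for the user and store it."
  [user-id]
  (let [k (.generateKey (doto (KeyGenerator/getInstance "AES") (.init 256)))]
    (swap! key-store assoc user-id k)
    k))

(defn encrypt
  "Step 2, on the way in: returns a byte array of IV (12 bytes) + ciphertext."
  [user-id ^String plaintext]
  (let [iv     (byte-array 12)
        _      (.nextBytes (SecureRandom.) iv)
        cipher (doto (Cipher/getInstance "AES/GCM/NoPadding")
                 (.init Cipher/ENCRYPT_MODE
                        (get @key-store user-id)
                        (GCMParameterSpec. 128 iv)))]
    (byte-array (concat iv (.doFinal cipher (.getBytes plaintext "UTF-8"))))))

(defn decrypt
  "Step 3, on the way out: returns the plaintext, or nil once the key is gone."
  [user-id ^bytes payload]
  (when-let [k (get @key-store user-id)]
    (let [iv     (Arrays/copyOfRange payload 0 12)
          body   (Arrays/copyOfRange payload 12 (alength payload))
          cipher (doto (Cipher/getInstance "AES/GCM/NoPadding")
                   (.init Cipher/DECRYPT_MODE k (GCMParameterSpec. 128 iv)))]
      (String. ^bytes (.doFinal cipher body) "UTF-8"))))

(defn shred-key!
  "Step 4: throw the key away; existing ciphertext becomes unreadable."
  [user-id]
  (swap! key-store dissoc user-id)
  nil)

;; Usage
(create-key! 111)
(def stored-phone (encrypt 111 "123456789")) ; this is what would go into Datomic
(decrypt 111 stored-phone) ;; => "123456789"
(shred-key! 111)
(decrypt 111 stored-phone) ;; => nil
```

The point of the exercise: the ciphertext may stay in Datomic’s history forever, because without the key it is just noise.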

So how do you throw a key away after all?

Now here you have many options to choose from. As always, there’s no one-size-fits-all solution, but let me give you some ideas:

:db/excise: yes, you could just force Datomic to completely forget about that particular crypt key. But if you want your data properly encrypted, you can’t keep the key in the same place where you store the PII.

Separate database with delete capabilities: just two columns, user-id and crypt-key. Simple as that. But that puts a lot of work on your plate: you have to maintain that DB and monitor its health, and you’re still obliged to audit the accesses.

HashiCorp Vault: a very powerful tool that suits our problem perfectly. It has deeply configurable access policies, auditing, partitioning of secrets, a high availability mode, etc. However, getting all those gears spinning takes a lot of configuration and resources. The official tutorial for high availability mode on AWS mentions 8 instances! So if the scale of your problem justifies investing the effort into Vault, go for it!

AWS solutions: at least three products could be used here, namely Key Management Service, Secrets Manager and Systems Manager Parameter Store.

We decided to go for Parameter Store because of its pricing (it’s hard to beat free). It has drawn some criticism for its request limits, so sooner or later we’ll need to migrate to something more performant. That’s why it might be a good idea to put it behind a boundary, which should make the migration relatively easy.
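One way to picture such a boundary is a small protocol that the rest of the app talks to, with the concrete storage hidden behind it. The sketch below is an assumption about how it could look: the `KeyStore` protocol, its method names and the in-memory record are all hypothetical, and a Parameter Store backed implementation would simply be another record implementing the same protocol.

```clojure
;; Hypothetical boundary: application code only ever sees this protocol.
(defprotocol KeyStore
  (fetch-key [store user-id] "Returns the user's key, or nil if there is none.")
  (store-key! [store user-id secret] "Saves the key for the user.")
  (delete-key! [store user-id] "Removes the user's key for good."))

;; In-memory implementation, handy for tests and local development.
(defrecord InMemoryKeyStore [state]
  KeyStore
  (fetch-key [_ user-id] (get @state user-id))
  (store-key! [_ user-id secret] (swap! state assoc user-id secret) nil)
  (delete-key! [_ user-id] (swap! state dissoc user-id) nil))

(defn in-memory-key-store []
  (->InMemoryKeyStore (atom {})))

;; Usage
(def store (in-memory-key-store))
(store-key! store 111 "not-a-real-key")
(fetch-key store 111)   ;; => "not-a-real-key"
(delete-key! store 111)
(fetch-key store 111)   ;; => nil
```

Migrating to another product then means writing one more implementation and swapping the constructor, while the call sites stay the same.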

But what if…

What if I get a hunch that my data is in danger?

Then you want to rotate your encryption keys. Depending on the solution you have chosen, key rotation might be offered out of the box. Otherwise, you decrypt your data, create a new key, encrypt the data again with the new key, store the result in Datomic and delete the old key.

What if the volume of PII is too big to effectively rotate keys?

If a lot of the data you gather is considered PII (medical data, for one), then decrypting everything, keeping it in memory and re-encrypting it might be a problem. In that case, consider introducing an extra step to the encryption: instead of one crypt key you will have a master key and an ephemeral key. You use the former to encrypt the data, but you never store it in unencrypted form: you encrypt the master key with the ephemeral key. Then, whenever you want to rotate your ephemeral key, you repeat the process from the point above, but decrypt and re-encrypt only the master key.
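Here is a rough sketch of that two-key scheme, again using only the JDK. The helper names (`new-key`, `seal`, `unseal`, `wrap-master-key`, `unwrap-master-key`, `rotate`) are invented for this example, and AES-GCM is simply the cipher I picked for the illustration. Note that rotation touches only the wrapped master key, never the PII itself.

```clojure
(import '(java.security SecureRandom)
        '(java.util Arrays)
        '(javax.crypto Cipher KeyGenerator)
        '(javax.crypto.spec GCMParameterSpec SecretKeySpec))

(defn new-key
  "Generates a fresh AES-256 key."
  []
  (.generateKey (doto (KeyGenerator/getInstance "AES") (.init 256))))

(defn seal
  "AES-GCM encrypts bytes under key k; returns IV (12 bytes) + ciphertext."
  [k ^bytes data]
  (let [iv     (byte-array 12)
        _      (.nextBytes (SecureRandom.) iv)
        cipher (doto (Cipher/getInstance "AES/GCM/NoPadding")
                 (.init Cipher/ENCRYPT_MODE k (GCMParameterSpec. 128 iv)))]
    (byte-array (concat iv (.doFinal cipher data)))))

(defn unseal
  "Reverses seal; throws if the key is wrong or the payload was altered."
  [k ^bytes payload]
  (let [iv     (Arrays/copyOfRange payload 0 12)
        body   (Arrays/copyOfRange payload 12 (alength payload))
        cipher (doto (Cipher/getInstance "AES/GCM/NoPadding")
                 (.init Cipher/DECRYPT_MODE k (GCMParameterSpec. 128 iv)))]
    (.doFinal cipher body)))

(defn wrap-master-key
  "The master key is only ever stored encrypted under the ephemeral key."
  [ephemeral-key master-key]
  (seal ephemeral-key (.getEncoded master-key)))

(defn unwrap-master-key [ephemeral-key wrapped]
  (SecretKeySpec. ^bytes (unseal ephemeral-key wrapped) "AES"))

(defn rotate
  "Re-wraps the master key under a new ephemeral key; PII stays untouched."
  [old-ephemeral new-ephemeral wrapped]
  (wrap-master-key new-ephemeral (unwrap-master-key old-ephemeral wrapped)))

;; Usage
(def master-key  (new-key))
(def ephemeral-1 (new-key))
(def wrapped-1   (wrap-master-key ephemeral-1 master-key))
(def phone       (seal master-key (.getBytes "123456789" "UTF-8")))

(def ephemeral-2 (new-key))
(def wrapped-2   (rotate ephemeral-1 ephemeral-2 wrapped-1))

(String. ^bytes (unseal (unwrap-master-key ephemeral-2 wrapped-2) phone) "UTF-8")
;; => "123456789"
```

After rotation you would delete `ephemeral-1` and `wrapped-1`; the encrypted phone number was never decrypted along the way.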

What if I have to delete some part of PII now and some later?

Imagine you’re a bookkeeper. If your client asks you to delete their PII, you have to comply and delete it ASAP. But part of that data will be necessary in case of a tax investigation. In such a case you may want to split your PII into multiple levels, each with a separate crypt key. The rest works as in the points above.
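A small sketch of the bookkeeping side of this idea: one key per user and level, so that each level can be shredded on its own. The level names, the `attr->level` classification and the helper functions are all hypothetical, and the atom once more stands in for a real key store.

```clojure
(import '(javax.crypto KeyGenerator))

;; Hypothetical classification of attributes into retention levels.
(def attr->level
  {:user/phone :delete-on-request
   :user/name  :keep-for-tax})

;; Hypothetical in-memory store: [user-id level] -> AES key.
(defonce level-keys (atom {}))

(defn new-key []
  (.generateKey (doto (KeyGenerator/getInstance "AES") (.init 256))))

(defn create-keys!
  "Creates one key per level for the given user."
  [user-id]
  (doseq [level (set (vals attr->level))]
    (swap! level-keys assoc [user-id level] (new-key))))

(defn key-for
  "Looks up the key that protects the given attribute of the given user."
  [user-id attr]
  (get @level-keys [user-id (attr->level attr)]))

(defn shred-level!
  "Throws away the key of a single level, leaving the other levels readable."
  [user-id level]
  (swap! level-keys dissoc [user-id level])
  nil)

;; Usage
(create-keys! 111)
(shred-level! 111 :delete-on-request)
(key-for 111 :user/phone) ;; => nil
(some? (key-for 111 :user/name)) ;; => true
```

Once the retention period of the remaining level ends, you shred that key as well.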

And that sums it up…

There’s one thing you need to keep in mind though: products never are and never will be GDPR-compliant. It’s the companies that need to be respectful of PII. Products can only make that habit easier or harder.

If you spot any flaw in our thinking, please let us know. If you have any other comments, also please let us know.

Up next: I’ll write a post with a more in-depth example of using Parameter Store in our web apps. Stay tuned!