The End of Safe Harbor and a Scary Path Forward

On the Austrian internets, the news about the end of the Safe Harbor agreement seems to have been universally welcomed, especially by non-technical folks who see it as a big win for their privacy. Surprisingly, many technical people also welcomed the ruling. And hey, if Snowden says it's a good ruling, who is going to argue against it?

I'm very torn about this issue because, from a purely technical point of view, it is very tricky to follow the ruling while keeping our data center environments in their current state, especially in light of some other rulings.

I'm as disappointed as everybody else that government agencies are operating beyond what seems reasonable from a privacy point of view, but we should be careful about how this field develops. Fundamentally, sharing information on the internet and the right to privacy stand in conflict with each other, and the topic is a lot more complex than just demanding more privacy without considering what this means on a technical level.

What Was Safe Harbor?

The US-EU Safe Harbor framework declared US soil a safe location for user data for the purposes of the European Data Protection Directive. In a nutshell: this was the only reason any modern internet service could keep its primary user data in the United States on services like Amazon EC2 or Heroku. In essence, Safe Harbor was a self-assessment that an American company could sign to make itself subject to the European Data Protection Directive. At least in principle. In practice very few US companies cared about privacy, which is probably a big reason why we ended up in this situation. The second reason is the NSA surveillance, which I want to cover separately a bit later.

What Changed?

Maximilian Schrems, an Austrian citizen, started an investigation into Facebook and its data deletion policies a while ago and has been engaging with the Irish authorities on the matter ever since. The Irish authorities rejected the complaint by referring to the Safe Harbor agreement. What changed now is that the European Court of Justice ruled the following:

"In today’s judgment, the Court of Justice holds that the existence of a Commission decision finding that a third country ensures an adequate level of protection of the personal data transferred cannot eliminate or even reduce the powers available to the national supervisory authorities under the Charter of Fundamental Rights of the European Union and the directive. […] For all those reasons, the Court declares the Safe Harbour Decision invalid. This judgment has the consequence that the Irish supervisory authority is required to examine Mr Schrems’ complaint with all due diligence and, at the conclusion of its investigation, is to decide whether, pursuant to the directive, transfer of the data of Facebook’s European subscribers to the United States should be suspended on the ground that that country does not afford an adequate level of protection of personal data."

The detailed ramifications of this are a bit unclear, but if you have been relying on Safe Harbor so far, you probably have to move servers now.

Why Was Safe Harbor Useful?

So if you take the internet of three years ago (before the Ukrainian situation happened), the most common way of legally running an international internet platform as a smallish startup was to put the servers somewhere in the US and fill out the Safe Harbor self-assessment every 12 months.

To understand why that was a common setup, you need to consider why it was chosen in the first place. The European Data Protection Directive came into effect quite a long time ago. It dates from the end of 1995 and requires user data to be stored either in EFTA states or, optionally, in another country if it can be ensured that the same laws are upheld there. This is what Safe Harbor did. In the absence of this, all data from European citizens must be stored on European soil.

After the Ukrainian uprising and after Crimea fell to the Russian Federation, a few things changed. International sanctions were imposed on Russia, and Russia decided to adopt the same provision as the European Union: Russian citizens' data has to be stored on Russian servers. This time, however, without any option to get exceptions to the rule.

It's true that the US does not yet have a provision that requires US citizens' data to be stored in the States, but this has been discussed in the past and is already a requirement for working with the government. However, with both Russia and Europe we now have two large international players that set the precedent, and it can only get worse from here.

Privacy vs Data Control

The core of the issue currently is that data is considered power and privacy is a secondary concern. While upholding privacy is an important and necessary goal, we need to be careful not to forget that the European countries are not any better. While it's nice to blame the NSA for worldwide surveillance programs, we Europeans have our own governmental agencies that act with very little supervision and, especially in the UK, operate with the same invasiveness as in the US.

A European cloud provider will have to comply with local law enforcement just as much as an American cloud provider will have to comply with US federal law enforcement. The main difference is just the institutions involved.

The motivation of the Russian government is most likely related more to law enforcement than to privacy. I'm almost sure they care more about keeping a certain power over companies doing business in Russia, to protect themselves against international sanctions, than about their citizens' privacy.

Data Locality and Personal Data

So what exactly is the problem with storing European citizens' data in Europe, the data of Americans in the States and the data of Russians somewhere in the Russian Federation? Unsurprisingly, this is a very hard problem to solve if you want to allow people from those different countries to interact with each other.

Let's take a hypothetical startup that wants to build some sort of Facebook for climbers. They have a very niche audience but they attract users from all over the world. Users of the platform can make international friendships, upload their climbing trips, exchange messages with each other and also purchase subscriptions for "pro" features like extra storage.

So let's say we want to identify Russians, Americans and Europeans to keep the data local to each of their jurisdictions. The easy part is to set up some servers in all of those countries and make them talk to each other. The harder part is to figure out which user belongs to which jurisdiction. One way would be to make users upload their passport upon account creation and determine their main data center by their citizenship. This obviously would not cover dual citizens. A Russian-American might fall into two shards on a legal basis but would only opt into one of them. So let's ignore those outliers. Let's also ignore what happens if the citizenship of a user changes, because that process is quite involved, usually takes a few years and does not happen all that commonly.

Now that we know where users are supposed to be stored, the question is how users are supposed to interact with each other. While distributed databases exist, they are not magic. Sending information from country to country takes a lot of time, so operations that affect two users from different regions will involve quite a bit of delay. It also requires that the data temporarily crosses into another region.
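
The jurisdiction assignment described above can be sketched in a few lines of Python. This is only an illustration for the hypothetical climbing platform: the region names, the (deliberately incomplete) country list, the fallback region and all function names are assumptions of mine, not a statement about how any real service does it.

```python
from dataclasses import dataclass

# Hypothetical mapping from citizenship (ISO 3166 alpha-2 code) to a data
# region. The region names and the deliberately incomplete country list are
# assumptions made for this sketch.
EU_COUNTRIES = {"AT", "DE", "FR", "IE"}


def region_for_country(country: str) -> str:
    """Map one citizenship to the region that would have to hold the data."""
    if country == "RU":
        return "ru"
    if country == "US":
        return "us"
    if country in EU_COUNTRIES:
        return "eu"
    return "other"  # assumption: everyone else shares a fallback region


@dataclass(frozen=True)
class User:
    name: str
    citizenships: tuple  # first entry is the one the user opted into


def legal_regions(user: User) -> set:
    """All regions that could claim this user's data on a legal basis."""
    return {region_for_country(c) for c in user.citizenships}


def home_region(user: User) -> str:
    """The single shard the account is stored in (dual citizens pick one)."""
    return region_for_country(user.citizenships[0])


def regions_touched(sender: User, recipient: User) -> set:
    """Regions the data passes through when two users interact."""
    return {home_region(sender), home_region(recipient)}


alice = User("alice", ("US",))
boris = User("boris", ("RU", "US"))  # dual citizen, opted into the Russian shard
clara = User("clara", ("AT",))

print(legal_regions(boris))           # two regions have a legal claim
print(home_region(boris))             # but only one shard stores the account
print(regions_touched(alice, clara))  # a single message spans two jurisdictions
```

Even in this toy version the awkward cases show up immediately: the dual citizen has two legal regions but one shard, and any interaction between users from different shards touches more than one jurisdiction.
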
So if an American user sends data to a Russian user, that information will have to be processed somewhere. The problem, however, is information that is not just temporarily in flux. For instance, a message sent from Russia to America could be seen as a duplicated message that is intended for both the American and the Russian jurisdiction.

It gets trickier with information that cannot be directly attributed to a single user. For instance, who your friends are. Social relationships can only be modelled efficiently if the data is sufficiently local. We do not have magic in computing and we are bound by the laws of physics. If your friends are on the other side of the world (which nowadays they most likely are), it becomes impossible to handle.

Credit card processing also falls into this category. Just because you are British does not mean your credit card is. Many people live in other countries and have many different bank accounts. The data inherently flows from system to system to clear the transaction.

Our world is very connected nowadays, and the concept of legal data locality is very much at odds with that reality.

The big cloud services are out, because they are predominantly located in the US. Like it or not, Silicon Valley is many, many years ahead of what European companies can do. While there are some tiny cloud service providers in Europe, they barely go further than providing you with elastically priced hardware. For European startups, no longer being able to use American servers is a significant disadvantage compared to their American counterparts.
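
To put a rough number on the laws-of-physics point about non-local social data above, here is a back-of-the-envelope sketch. The 9,000 km distance is an assumed, illustrative figure for two data centers on different continents, and the factor of two thirds for the speed of light in optical fiber is a common approximation. Real networks add routing, queueing and processing time on top, so treat the result as a lower bound only.

```python
# Rough lower bound on round trips between two distant data centers.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumption: light in fiber is ~2/3 as fast


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound for one request/response over the distance."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def sequential_lookups_ms(distance_km: float, lookups: int) -> float:
    """Lower bound when each lookup has to wait for the previous one."""
    return lookups * min_round_trip_ms(distance_km)


DISTANCE_KM = 9_000  # hypothetical distance between two regional shards

print(f"one round trip: at least {min_round_trip_ms(DISTANCE_KM):.0f} ms")
print(f"50 dependent lookups: at least "
      f"{sequential_lookups_ms(DISTANCE_KM, 50) / 1000:.1f} s")
```

Under these assumptions a single round trip costs roughly 90 ms, and a naive friend list that needs 50 dependent lookups in a remote shard would take about 4.5 seconds before any actual work is done. Batching helps, but the data still has to live somewhere, which is exactly the legal problem.
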

Privacy not Data Locality

The case has been made that this discussion is not supposed to be about data locality but about privacy. That is certainly correct, but unfortunately data centers fall under the jurisdiction of the place where they are located. Unless we come up with a rule where data centers are placed on international soil and the computers within them are out of any government's reach, a lot of this privacy discussion is dishonest.

What if the bad players are the corporations and not the governments? Well, in that case, that was the whole point of Safe Harbor to begin with: to enforce stricter privacy standards on foreign corporations for European citizens.

How to Comply?

Now the question is how to comply with where this is heading. These new rules are perfectly implementable for Facebook-sized corporations, but incredibly hard to implement for small startups. It's also not quite clear what can and cannot be done with data now. The point at which data is considered personal differs from country to country and is in some situations not even entirely clear. For instance, according to the UK DPA, user relationships are personal information if they have "biographical significance".

A Disconnected World

What worries me is that we are taking a huge step back from an interconnected world where people can share information with each other, to more and more incompatible decentralization. Computer games have traditionally already enforced shards where people from different countries could not play together for legal reasons. For instance, many of my Russian friends could never play a computer game with me, because they are forced to play in their own little online world.

Solutions will be found, and this ruling will probably have no significance for the average user. Most likely companies will ignore the ruling entirely anyway, because nobody is going to prosecute anyone unless they are Facebook-sized. However, it is problematic that decisions of this magnitude are made without considering their technical feasibility.