uPort makes it easy for Ethereum developers to offer improved privacy to their users.

The same properties that make blockchains attractive for identity also make them dangerous for it. The market is becoming flooded with “simple” identity options that undermine the promise of user privacy on Ethereum.

Now more than ever, with privacy-violating events dominating headlines around the world and GDPR coming into effect this month, developers should be conscious of how they handle users’ data.

Developers should choose an identity provider that is simple but that also values user privacy and actively works to preserve it. Privacy-preserving systems like this take years of design and testing. uPort is proud to share its approach to user privacy on Ethereum here.

Warning! The Blockchain is Forever

The blockchain is public and permanent. While these are two generally desirable properties of blockchains, they provide a significant challenge to building a privacy-preserving user identity management system. We must ensure our system is not only secure and private today, but also decades into the future. Users cannot reasonably be expected to know best practices for how to manage their data, or what it means to store their information on the blockchain, so it’s up to us to protect them.

Modern decentralized identity systems use a credentials-based model of identity, which expresses identity attributes as a collection of individual data credentials. Credentials can cryptographically attest to things such as name, birthdate, membership, reputation, and even proof of being human. In other words, any decentralized identity provider is fundamentally in the business of handling the world’s most personal data.
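To make the credentials-based model concrete, here is a minimal sketch of an identity as a collection of individually issued, tamper-evident credentials. Everything in it is an illustrative assumption rather than uPort's format: the `did:example:` identifiers, the payload fields, and especially the use of HMAC. A real issuer would sign with a private key so any third party can verify with the public key; HMAC is used here only to keep the sketch standard-library-only.

```python
import hashlib
import hmac
import json

# Hypothetical issuer secret. HMAC stands in for a real public-key signature
# purely to keep this sketch stdlib-only.
ISSUER_KEY = b"hypothetical-issuer-secret"


def _canonical(payload: dict) -> bytes:
    # Canonical JSON so the same payload always serializes to the same bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def issue_credential(issuer: str, subject: str, claim: dict) -> dict:
    """Wrap a single identity attribute in a tamper-evident credential."""
    payload = {"iss": issuer, "sub": subject, "claim": claim}
    tag = hmac.new(ISSUER_KEY, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": tag}


def verify_credential(cred: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(ISSUER_KEY, _canonical(cred["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cred["sig"])


# An identity is a collection of individual credentials, one per attribute.
identity = [
    issue_credential("did:example:dmv", "did:example:alice", {"birthdate": "1990-01-01"}),
    issue_credential("did:example:club", "did:example:alice", {"member": True}),
]
```

Because each attribute is a separate credential, a user can present only the ones a given dapp needs instead of handing over a whole profile.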

When choosing between open source identity standards on Ethereum, developers should count user privacy among their non-negotiable requirements. We consider it the most important facet of a self-sovereign identity system, and it is a topic we have been thinking about for some time.

Deciding between identity solutions? Ask two privacy questions.

Does it offer off-chain, in addition to on-chain, user data storage options? Does it minimize correlation risk for my users?

Ethereum Risk #1: On-Chain Data is a Permanent Target

You have to assume that blockchain identity systems that store users’ personal identity credentials on-chain will be targeted by malicious actors, both today and in the future. Yes, this is still a concern even if you encrypt the data before storing it on-chain. Because on-chain data can never be deleted, ciphertext written today remains available to attackers indefinitely, waiting for the day its encryption can be broken. We think it is reasonable to expect that computing power will eventually grow enough to crack today’s popular cryptography, and to do so while providers can still be held liable. If that happens, every piece of user data ever stored on-chain will be publicly exposed for the world to see and act on, which could be disastrous for the businesses and applications that promoted this negligent pattern.
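One way to picture the off-chain alternative is the sketch below, which keeps the personal data in deletable, user-controlled storage and writes at most a salted hash to the permanent ledger. This is a common general pattern, not necessarily uPort's design; the in-memory `ledger` and `off_chain_store` are hypothetical stand-ins for a real chain and real storage.

```python
import hashlib
import secrets

ledger: list[str] = []                 # hypothetical append-only public chain
off_chain_store: dict[str, dict] = {}  # hypothetical user-controlled storage


def commit(value: str, salt: bytes) -> str:
    # A random salt stops attackers from guessing low-entropy values
    # (like a birthdate) by hashing every possible candidate.
    return hashlib.sha256(salt + value.encode()).hexdigest()


def store_credential(cred_id: str, value: str) -> None:
    salt = secrets.token_bytes(32)
    off_chain_store[cred_id] = {"value": value, "salt": salt}  # can be deleted later
    ledger.append(commit(value, salt))  # permanent, but not the data itself


def verify(cred_id: str) -> bool:
    # Anyone shown the value and salt can check them against the ledger.
    record = off_chain_store[cred_id]
    return commit(record["value"], record["salt"]) in ledger


def forget(cred_id: str) -> None:
    # The PII and salt are deleted; the digest left on-chain can no longer
    # be opened without the salt.
    del off_chain_store[cred_id]
```

Note that even this writes something permanent to the chain; writing nothing on-chain at all remains the most conservative choice.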

[Image: Malicious actors with supercomputers will strip-mine the blockchain for users’ PII.]

Ethereum Risk #2: On-Chain Actions are Correlatable

Public blockchain ledgers are available for anyone to read and analyze, which makes a blockchain an ideal database for static analysis. Advances in machine learning and computing power have made it easy to draw robust conclusions about who an individual is by correlating a few simple pieces of data that can be attributed to a common identity. Malicious actors can readily trace your public data and public actions back to that common identity.

Correlation in identity systems works by tracking the actions or events a single identity takes across the network, by collecting the publicly available data about that identity, and by drawing strong links between this identity and other identities on the network. Combined, these techniques can estimate who is behind an identity with high probability.
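The "drawing strong links between identities" step can be illustrated with a toy address-clustering sketch. It assumes a single, simplistic heuristic (addresses that transfer funds to each other share an owner) applied to made-up ledger data; real chain analysis combines many more signals, but even this is enough to merge separate accounts and roll up which dapps the cluster has used.

```python
from collections import defaultdict

# Hypothetical public ledger data: value transfers and dapp interactions.
transfers = [("0xA1", "0xB2"), ("0xB2", "0xC3")]
interactions = [("0xA1", "prediction-market"), ("0xC3", "election"), ("0xD4", "game")]

parent: dict[str, str] = {}


def find(x: str) -> str:
    """Return the cluster representative for an address (union-find)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps lookups fast
        x = parent[x]
    return x


def union(a: str, b: str) -> None:
    parent[find(a)] = find(b)


# Heuristic: addresses that move funds between each other likely share an owner.
for sender, receiver in transfers:
    union(sender, receiver)

# Attribute every dapp interaction to the cluster that performed it.
profile: dict[str, set[str]] = defaultdict(set)
for address, dapp in interactions:
    profile[find(address)].add(dapp)
```

Here 0xA1 and 0xC3 never interacted directly, yet the transfer chain through 0xB2 places their prediction-market and election activity in one profile.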

It is important to understand that correlation extends beyond the user data itself. We should also look for ways to minimize the correlation of a user’s on-chain smart contract interactions across different dapps, since activity is one of the easiest ways to link an account to an individual.

To combat this very difficult problem inherent in public ledger-based decentralized systems, we need to design identity systems that reduce the number of data points that can be connected to each other through deep analysis of the public ledger.

To highlight this problem, let’s look at a simple example. Alice creates a MetaMask account and funds it with ETH. She logs into a prediction market and places a bet. Later, she uses a government dapp to vote in a local political election with her blockchain identity: she logs in with the same MetaMask account and casts her vote. This creates an immediate problem for Alice. Unbeknownst to her, prediction markets are illegal in her country. Unaware of the risks of using prediction markets, and lacking a proper understanding of blockchain technology, Alice has unknowingly exposed herself to the authorities. The election dapp ties her account to her real-world identity, and her bet is permanently recorded on the public ledger under that same account. Because she bet on an illegal prediction market and then used the same identity in her local election, her actions are extremely simple to correlate across the two dapps.
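One illustrative way to avoid Alice's problem is to give each dapp a different identifier derived from a secret only the user holds, so the ledger never shows the same account in both places. The sketch below is an assumption for illustration, not a description of uPort's mechanism: it uses HMAC-SHA256 as the derivation function, and the `id-` format and dapp names are hypothetical (a real system would derive a keypair and use its address).

```python
import hashlib
import hmac
import secrets


def derive_dapp_id(master_secret: bytes, dapp: str) -> str:
    """Derive a stable, per-dapp pseudonym from a secret only the user holds."""
    digest = hmac.new(master_secret, dapp.encode(), hashlib.sha256).hexdigest()
    # Hypothetical identifier format, truncated to 40 hex characters.
    return "id-" + digest[:40]


alice_secret = secrets.token_bytes(32)  # never leaves Alice's device
market_id = derive_dapp_id(alice_secret, "prediction-market")
election_id = derive_dapp_id(alice_secret, "election")
```

Without the secret, the two identifiers look unrelated, yet Alice can always re-derive the same one for each dapp. This only helps if nothing else links them: if both derived accounts were funded from the same address, the transfer itself would reconnect them.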

How easily these actions can be correlated makes it extremely difficult to design blockchain-based systems that give users simple control over their identity while also protecting their privacy and respecting their right to be forgotten. Identity systems should strive to preserve user privacy and, by extension, combat correlation. Oh, and they need to be simple.