It isn’t easy to see through the tangled mess of identity, although the search for it and the politics surrounding it are at the centre of so much of the modern world, from #BlackLivesMatter through to Edward Snowden. The ongoing search for identity systems that feel like who we feel we are pervades social networks (Facebook’s Real Names, Twitter’s Verified), but nobody seems to know how to define us in ways that work with how we define ourselves. Is it possible to do better and give us an identity that works?

Part I. Are you sure you exist? We are!

Part II. The Art of the Possible: What can we describe?

Part III. Identity we can (all) live with

Part I. Are you sure you exist? We are!

More than a couple of times a year, I meet teams working on identity.

I always make myself immediately unpopular by asking them “what is identity?” and, of course, everybody acts as if it is completely obvious from our shared context, and I should already know. Usually when people use the word “identity” their implicit definition falls into one of the following buckets.

Identity is your physical body, as recorded in a database using a biometric record or a biometric hash.

Identity is a government record which gives you a unique identifier like a taxpayer ID number or a social security number.

Identity is a set of attributes attested to by a community, attached to a self-chosen name. This is often used interchangeably with the word “reputation” and may assume that a single individual has many identities, frequently called personas or nyms.

Identity is a username/password combination, where your ability to remember some fact records that you are the same person now as you were when you opened the account — essentially this is proof of memory.

Identity is a cryptographic key pair, where one is identified by the public key, and this identity is proven using the private key (usually by signing things.) SSH keys are a good example of this in its rawest form: you log in once and add a key, and all the system cares about is that you use the same key to get back in — not who you are!

Identity is a persistent sense of self or selfhood, or a semi-continuous narrative identity we call “I.”

Identity is some kind of metaphysical construct, a “true self” or soul, which shines out through the personality and into all of our actions.
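Of the buckets above, the key-pair definition is the easiest to make concrete. What follows is only a minimal sketch, assuming a Lamport one-time signature built from SHA-256 as a teaching stand-in; real systems such as SSH use other algorithms, and the function names here are made up for illustration. The point it tries to show is that the verifier learns nothing about who you are, only that you hold the private half of a key it has already seen.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    # Private key: 256 pairs of random secrets, one pair per bit of a message digest.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # Public key: the hash of every secret. This list is the "identity".
    public = [(sha256(zero), sha256(one)) for zero, one in private]
    return private, public


def message_bits(message: bytes):
    # The 256 bits of the message's SHA-256 digest, most significant bit first.
    digest = sha256(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret from each pair, chosen by the corresponding digest bit.
    # A Lamport key must only ever sign one message.
    return [private[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    # Check that each revealed secret hashes to the matching public value.
    if len(signature) != 256:
        return False
    bits = message_bits(message)
    return all(sha256(signature[i]) == public[i][bits[i]] for i in range(256))


private_key, public_key = generate_keypair()
signature = sign(private_key, b"let me back in")
print(verify(public_key, b"let me back in", signature))
print(verify(public_key, b"let somebody else in", signature))
```

Nothing in `verify` refers to a name, a face or a government record; the public key is the whole of the identity as far as the system is concerned.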

The case that I’m going to make is that the identity debate is magnetically drawn to two opposing poles:

Humans as things identified by their bodies, vs

Humans as intangible, ever-shifting narrative beings.

Our challenge, in creating software to enable people to use or manifest their identity, is that to do either job well results in problems. On one hand, a world in which people are things and are serialized and tracked like all other things. On the other hand, a liquid world in which defrauding Grandma on eBay never, ever feeds back to a person’s dating profile, even after convictions.

Our job, as software engineers working on identity, is first to be philosophers — phenomenologists and epistemologists — to understand the abstraction that we hope to represent.

What is a person, and how do I identify one to a computer?

The Enlightenment Intensive — a quick experiment in identity

Charles and Ava Berner distilled a variety of Eastern and Western tradition sources into a practice called the Enlightenment Intensive. The procedure is pretty simple — rows of people facing each other, taking turns following the instruction “tell me who you are.” This sounds simple enough, but 10 hours a day for three or four days rapidly shakes many people’s faith that they understand themselves. Many experience a sudden break with their usual flow of life, perhaps along the line of what Zen practitioners would term “satori.” I’m not formally trained in either the EI process or Zen, so I’m not making a direct equivalence here, nor am I going to delve too much further into the technical details of the practice.

What I want to point out is that identity is a deeply secret and mysterious thing for us, and staring right at it for long periods of time is the lifestyle preference of poets, saints, and madmen. I’ve never read a single piece on identity in a computer science context — much less a business context — which fully spoke to the mysteriousness of identity, so I thought I’d write this one.

Nobody comes out of an Enlightenment Intensive with a pat* answer. Temporary answers like “I am Pat’s mother” can be factually true, but when examined in a lot of detail, according to many people, these descriptions seem to miss the core of a person’s identity. They’re things we do, or conditions we experience, relationships we participate in, but the sum of all the roles we play seems to fall short of describing the totality of our identity. Even when all the parts are put together, there is something indescribable.

There is something in the whole which is beyond description.

So I want to start by problematizing identity. It seems that whatever toolkit we approach identity with — art, philosophy, psychology, esoteric yoga, whatever it happens to be for any given person — there is no simple, clear, pat answer which holds true for a lifetime. For people that do “find themselves” (whatever that means) the result may be a higher quality of life and a less turbulent mind, less self-doubt or similar benefits, but they never quite seem to be able to explain what it is that they found, much less how a third party might find it.

You probably don’t know who you are. And if you do, you probably can’t tell me about it in a way that communicates it to me.

And we’re supposed to write software about this???!

The turbulence of the every-day: Things are a lot weirder than you think.

So, for now, let’s assume that software isn’t going to help us much with metaphysical or deep psychological models of our identity.

What are the areas where software is doing a good job of describing identity?

Let’s take a look at four cases where identity seems to intersect software in useful and predictable ways — not strange one-offs or art projects, but the meat-and-potatoes software of everyday life.

1. Credit Ratings

Credit rating agencies gather huge amounts of information about your financial (and perhaps social) behavior, and summarize it into a number which estimates the risk that people with the same profile as you will default on a loan (more or less).

Credit rating agencies, though, can only estimate your risk of default based on the previous actions of people “like you.” They don’t have access to your interiority; they can’t see into your priorities except by knowing your previous behavior. The art is inexact, although it pretends to be an exact science. The notion of “likeness” is a cultural construct, grounded in an Enlightenment idea of knowledge as the discovery of natural laws and naturally occurring kinds, from which came the Romantic fascination with “Monsters,” things which defy categorisation.

Post 1980, sociologists of knowledge like John Law have written about a “Sociology of Monsters.” Computer scientists have responded by adding more data points to their models, but in every model of “people like you,” every data point is a cultural decision about whether, for this person, the value is True or False.

So here we must ask: is information about “people like you” also information about you, or is it simply an accumulated stereotype? Either way, these systems work for millions of transactions every day and for most of us, credit rating is an important part of our portfolio of identity attributes.
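To make the “people like you” point concrete, here is a deliberately crude sketch. Everything in it is an assumption for illustration: the two attributes, the handful of made-up borrowers, and the idea that a score is nothing more than a cohort default rate. Real scoring models are far more elaborate, but the shape of the reasoning is similar: the number describes the cohort, and you inherit it.

```python
from collections import defaultdict

# Hypothetical past borrowers: (age_band, employment, defaulted).
HISTORY = [
    ("25-34", "salaried", False),
    ("25-34", "salaried", False),
    ("25-34", "salaried", True),
    ("25-34", "salaried", False),
    ("25-34", "freelance", True),
    ("25-34", "freelance", False),
]


def cohort_default_rates(history):
    # profile -> [number of defaults, number of borrowers]
    counts = defaultdict(lambda: [0, 0])
    for age_band, employment, defaulted in history:
        entry = counts[(age_band, employment)]
        entry[0] += int(defaulted)
        entry[1] += 1
    return {profile: defaults / total for profile, (defaults, total) in counts.items()}


def estimated_risk(profile, rates):
    # The applicant's own intentions never enter into it. A profile nobody
    # has seen before (a "monster") simply has no estimate at all.
    return rates.get(profile)


rates = cohort_default_rates(HISTORY)
print(estimated_risk(("25-34", "salaried"), rates))
print(estimated_risk(("65+", "retired"), rates))
```

Every choice of which columns go into the profile is one of the cultural decisions described above: change the columns and the same person lands in a different cohort with a different risk.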

2. Social Networks

Social networks also have a piece of our heart: a map of who we read, write messages to, and perhaps consider ourselves kin to.

A personal note: I have about 100k tweets on Twitter. I’ve done an enormous amount of long-form thinking on Twitter, had conversations that shaped first my perspective, then my life. More importantly, I have regular correspondents that, as far as I am concerned, only exist on Twitter. There’s simply no sense that these people exist for me, except as inch square icons and glittering fragments of insight. Several are completely pseudonymous, and quite secretive with it. Ghostly, informative presences. Spirits.

My identity is partly defined by my microsociety. Once I was a student of esoteric yoga: there are conversations I can’t have, sides of my personality that effectively do not exist outside of those circles. I can’t tell you about fascism and rakshasas, say. Similarly, there’s a particular kind of techno-political discourse which exists, in my life, only on Twitter. There’s simply no other way to have those conversations, and there are things I will simply never hear myself say except as part of a discussion with those little square faces. This document is as it is because many of those little square faces came to help: they piled in with suggestions, hyperlinks, valuable advice. I am sometimes a collective.

This is an intersubjectivity, a thing which exists only between people, but because it exists inside of a platform that I have very little control of, it feels like a piece of my identity which is held hostage and at risk: if Twitter goes away, a piece of my political voice ceases to exist, because we cannot intuitively speak what we know nobody can hear. Without Twitter, where are my helpers, my community of kindred spirits trying to cooperatively point the way forwards?

For other people, the persons in the “exists only on the internet” category might be distant, hard-to-trace family members — people you know online, but will never, in the natural flow of things, meet or get an email address for. Those feral cousins that moved to Australia 35 years ago, or granny’s best friend who used to keep her company when she babysat for you when you were tiny — these people only exist for us within the enriched compost of Facebook. Facebook effectively owns these relationships.

If we are, in some sense, our relationships, and our relationships (at least some of them) only exist on Facebook, then something very odd indeed is happening to our identity.

3. Names

Do you own your own domain name, a site built around a personal brand — perhaps your name, or a handle which is realer-than-your-name for most of the people who know you? I know a few people who’re mostly their handles online — Documentally is a particularly good example, in that people use his given name with a certain strangeness, a sense of referring to something not quite for public use. The man and the brand are interpenetrated.

Then there are people with projects which have become synonymous with their work: “Linus” doesn’t refer just to an engineer, it refers to the King of the Nerds. “Linus says” is not a mid-40s engineer speaking, but a structural power role — a man so identified with his leadership position that his first name is his title is his role is his identity.

I’m sure “Linus” means something completely different to his family and personal friends than the “Linus says” power. I’m sure this was also true of Elvis at one point, but I think that online the identity infrastructure which grows up, out and around people through things like project management web sites (Slack, GitHub etc.) has names for people (identities) which combine personal identity, personal brand, and structural power in fascinating, unique and illegible ways. “Linus says” is the power brand that built most of the machines that run or access the internet these days (Android phones… Linus says!)

These identities are really important. A collection of identities more or less like Linus more or less run the internet — systems like IETF (Internet Engineering Task Force) probably could not function without the tools that support these blended representations of identity and authority.

4. Surveillance databases

Surveillance databases also store an account of who we are. Although we never see or hear about what these databases know about us, it’s pretty rational to think that one of the most complete pictures of our identity resides on anonymous government computers, all watched over by machines of loving grace, or at least their national security oriented predecessors.

What’s interesting about this version of our identity, apart from its completeness, is that we — the subject — cannot see it. We can infer from the Snowden revelations that the national security databases have access to most or all of our social media, web surfing habits, financial and health records and more, woven into an integrated model of our likelihood of going nuts and driving a truck load of explosives into an oil terminal.

But this is an identity we are not privy to, evaluated (much like our credit history, I suppose) but screened from our awareness. This is an identity not made of what we know, but of what others know about us.

In fact, Google’s advertising databases have most of the attributes of these hypothetical surveillance databases — a fine-grained behavioral model of our propensity to buy things, with vast access to our email history, search records and far, far more. Perhaps it’s not so coincidental that these databases seem to have some similar properties.

I think here we have to frame three very deep questions:

a. Is information about you, which you have no access or control over, part of your identity? Or is that information somebody else’s story?

This question gets accessed from a lot of different directions, with many different impacts, e.g. the EU’s “right to be forgotten” directives.

b. How does unconscious, tacit knowledge about us and our behavior factor into our identity? There are lots of things that we do not know about ourselves, but that others know about us, that are clearly parts of our identity from their perspective, but entirely outside of our awareness.

A classic example might be a card player’s “tell” — an unconscious behavior that gives away a weak or strong hand, like “you always tug at your sleeves when you’re going to fold.” Gait analysis is like this; our walk leaks our uniqueness.

These behaviors, of course, are no different from our conscious behaviors to a machine. This may have some interesting implications for hard-to-fake ways of identifying people in the future.

c. The algorithms used to cluster people by these kinds of institutions actually create large parts of the “identity” which is being stored about us. You might think your identity is “German, information architect, classical musician, mother” but the algorithms know you as a financial risk taker who is immune to impulse purchases, while another set of algorithms know you as a political moderate with a radical past who is unlikely to be a threat to national security.

I hope, by now, that I’ve entirely convinced you that identity is a field filled with half-understood categories, half-baked theories which none-the-less are being used by enormously powerful organizations every day to decide questions like “can I have a mortgage, please?”

I want to posit that there is simply no solution to these questions without building out a really extensive new theory of identity, and that such a theory is going to have to compromise with the philosophical abyss presented by the gap between who we think and feel we are, and the identities that the world generates for us on a more-or-less ad-hoc basis, without our say so or so much as a by-your-leave other than, perhaps, accepting an occasional EULA.

It’s a mess.

If we think the identity debate is purely ontological, if we share the Enlightenment faith in Natural Laws and Naturally Occurring Kinds, damn right it’s a mess. The world looks unknowable.

However, if we understand that it’s a set of narratives that rely on culture to be effective, we can talk about what narratives are useful and reliable, and try to construct some purpose-built narratives that help us relate to strangers and make the queues at airport immigration move more quickly.

So I want to start over, from the other end of the equation: what’s the simple, easy, clear stuff in the identity space?

Part II. The Art of the Possible: What can we describe?

The simplest form of identity we have is our role in our family when we are babies. We don’t know our name, and in some cultures we might not even have a name, but everybody in our family knows who we are, and how we relate to them and the environment.

At this stage, assuming we are healthy, we have very few attributes. Maybe some early behavioral preferences, like sleeping or not sleeping, and a little biological variance like how our hair is. But there’s not much to us at this stage: no biography, no (visible?) hopes and dreams. We’re mainly made of potential, and the story of our lives is largely told by other people.

At this point, there’s no doubt, identity is what other people make around us. Let’s think about the identities a healthy baby might have.

There’ll be a very well-defined role inside of a family, almost certainly.

There’ll be a government-issued birth certificate, probably — not in all cultures, but in most of the nations with strong central governments.

Now, at this point, the near-absence of attributes of a baby is really worth noting. There’s just not much you can add: height, weight, ok. But no degrees, no employment history, no credit card debt — all that comes later!

Of course, in a fully high tech medicine society, there will be pre-natal test results, ultrasound and a whole pile of other medical information, but for the sake of argument, let’s imagine we’re dealing with a slightly lower tech culture — us as we were a few decades ago, or a poorer country where, if everything is going well, there isn’t much medical involvement in pregnancy. This is a slightly artificial simplification, but I think it really will help. Go with me on this! So we have our state-of-nature healthy baby without much of a medical history. Let’s now introduce the medical scenario and see what happens to baby.

Let’s say the poor kid is a few weeks premature, and a bit unwell. Hospital is involved. Now we have a case history.

A hospital number

Relationships with attending physicians

An incubator number

A medical profile, with things like weight each day, diet

A medical history with a chronology of events, opinions and options

In countries without universal health care, insurance policy information — who’s paying for this care?

Suddenly our attribute-free happy child becomes wrapped up in an entire world of data, identity, relationship and rights. And, let us note, none of this is endogenous — our baby has not generated all of these relationships, data and identities by will. Perhaps the little tyke had a rough outline of a medical history wrapped up in midwife and mother’s head, but when all is well, that data is soft and passing. It may be recorded, but it is scarcely processed. But now, in even a minor, routine crisis, the entire charting-and-recording apparatus spins up and it becomes urgent to build a medical identity for a child that previously probably had only the most superficial medical history.

Particularly in countries where families pay for their own medical care, and medical care for their kids, there is a sudden urgency here: who is to pay, and if nobody is going to pay, who is liable for covering the care that must, by law, be given to the child? This is no joke: if the situation is complex, and the care expensive, the price of a couple of houses could be spent saving a child’s life. So a lot of what is being established in the hard case of paying for your own health care is a set of entitlements that one is contractually bound to receive having paid health insurance premiums. Identity: “are you the child of a person who has paid their premiums?” turns into life rather than death in our worst case scenario.

Now, I want you to pause a second and think.

We’ve established in the first section of this essay that at a very fundamental level, identity seems to be philosophically hard to the point of near-insolubility. It’s just not that clear that the commonly used concept of “identity” neatly maps to anything which stands up to the full light of clear, conscious scrutiny. And that’s OK, we have quite a few concepts like that (try “love” next) which are at the core of our lives — we know it when we see it, but all written definitions seem to fall short of the mark.

Then we examined some actually-existing but still very, very odd forms of identity: social networks, project leadership, and the ever-present surveillance and marketing databases. All very dark glass, hard to see through; things are not defined.

Then in a medical context, we fall bang into the middle of the hardest possible kinds of identity: we know who you are, because we paid for you to be born in this hospital, and they gave you a tag 30 seconds after you took your first breath. And in the world where people have excellent medical care, that medical identity forms not that long after conception — medical records dating before your birth. That’s a trend likely to continue and become ever-more important as we discover more about epigenetics: your DNA and your gene activation continuously mapped from the first signs of pregnancy through to post-birth.

So this is my first concrete observation: philosophically and intellectually unsatisfying as “identity” might be as a concept, there is absolutely no denying that in a variety of critical places, the current identity infrastructure works.

It is not always ideal. To access a patient’s records when they are (for example) unconscious in another country is an absolute nightmare, and there are many less rare scenarios which still throw up plenty of obstacles between a doctor and their patient’s data. But there’s no way around the basic conclusion that in the core cases, medical identity basically holds together.

But gods of bureaucracy help you if you get trapped in an edge case with your life on the line.

Usernames and passwords

In the 1980s, if you had internet access, it was usually through a university. The university had a pretty good hold on your identity, and usually the “chain of custody” for your identity would go something like:

1. Government ID number on university enrollment forms

2. University ID number

3. Forms indicating you were expected to have internet access

4. A system administrator creates an account for you

5. You’re issued a username and password

So the username and password you had were, in the final analysis, backed by a long chain of accountability which went all the way back to the government at the head end of the chain of custody of your identity.
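The chain above can be sketched as a simple linked structure in which each record names the record that vouches for it. The labels and the `jdoe` username are hypothetical, and this is only an illustration of the idea of tracing a credential back to its root, not a description of how any university actually stored this.

```python
# Each key is backed by the value next to it. All entries are made up.
BACKED_BY = {
    "username jdoe": "account created by system administrator",
    "account created by system administrator": "form authorizing internet access",
    "form authorizing internet access": "university ID number",
    "university ID number": "government ID number on enrollment forms",
}


def chain_of_custody(credential, backed_by):
    # Follow the links until we reach a record nothing else vouches for.
    chain = [credential]
    seen = {credential}
    while chain[-1] in backed_by:
        parent = backed_by[chain[-1]]
        if parent in seen:
            raise ValueError("circular chain of custody")
        chain.append(parent)
        seen.add(parent)
    return chain


for step in chain_of_custody("username jdoe", BACKED_BY):
    print(step)
```

A credential with nothing behind it comes back as a chain of length one: `chain_of_custody("username nobody", BACKED_BY)` returns just `["username nobody"]`.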

There were probably other ways that the internet could have been organized than username and password. The username/password standard had been set years before, in the earliest days of the minicomputer era. But once those standards and those expectations had been set, path dependence set in: everything tends to look a little like what came before, because if it does not, the users can’t navigate.

And in the old days, the 1980s and 1990s, those system administrators acted very much like town mayors or other local government. If you were being a pest on the internet, your identity was trackable: bjr1991@cs.utah.edu went back to the Computer Science system administrator at the University of Utah or wherever it referred to, and he’d figure out who you were from your username, and then fire you off an email that threatened either your grades, your tuition or your life depending on the degree of infraction!

Eternal September

This system of accountability lasted until The September Which Never Ended, when new actors entered the space (in this case, AOL) who were paid, not by the federal government or by the universities, but by the users themselves. AOL had very, very many users per administrator, and no particular interest in kicking paying customers off the internet for abuse.

The result of this breakdown in accountability (a breakdown in the personal relationships between individuals using and providing internet access) was a deep and near-permanent change in the lived experience and culture of the internet. Identity, governance and accountability are beyond scope for this article, but do remember this story: it will come back over and over again as the internet progresses and new media like virtual reality or biometrics get added to the systems we currently have acclimatized to and take for granted.

It’s remarkable to think that the humble, ordinary username and password used to be connected to such a fundamental hierarchical chain of accountability, but that’s how it used to be.

Let’s compare that to today’s username/password combination.

A typical flow might be that I go to a web service, give it a username and password, and usually an email address which might come from (say) a one-use temporary email address provider. The owners of this web service now have no way of reaching me other than in-app messaging, and even if they can reach me, they have absolutely no idea of who I am. Therefore there’s no reasonable approach to managing abuse on these services, other than (say) tracing the IP address I connect with and going after me at the internet service provider level.

This is, of course, rather an analogue to the old 1980s internet norms: in those days, the university system administrators were essentially the internet service providers. You could even argue that the IP address is providing much the same kind of accountability. But, in practice, there are so many users and so few administrators, and such unwillingness to go after abuse that — unless the matter is criminal, and in many cases, even if it is criminal — there’s simply nobody to go to.

The same credentials — a username and a password — have gone from being a more or less absolute guarantee of good conduct based on a rigid chain of accountability going from your academic institution right up to the nation state level, to indicating almost nothing other than you are the same person that opened the account yesterday, or their chosen representative.
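Stripped of its old chain of accountability, what a password check actually establishes can be shown in a few lines. This is a minimal sketch using salted PBKDF2 from the Python standard library; the function names and the iteration count are assumptions chosen for illustration, not a recommendation for any particular service.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor


def enroll(password: str):
    # Store a random salt and a slow hash of the password, never the password itself.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def is_same_person(password: str, salt: bytes, stored: bytes) -> bool:
    # All this proves is that the caller knows what the enroller knew.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = enroll("correct horse battery staple")
print(is_same_person("correct horse battery staple", salt, stored))
print(is_same_person("a lucky guess", salt, stored))
```

Note what the stored record contains: a salt and a hash. There is no name, no institution and nobody to complain to, which is the whole difference between the 1980s account and today’s.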

Identity is slippery, and the rules keep changing. In a five year period around 1993, internet access and the username/password combo went from being a near-guarantee of good conduct because it was embedded in a comprehensive and federated model of governance, to being open to everybody in the fruitful anarchy we know today, ridden as it is with spam, fraud and aggressive idiocy.

The old norms were buried and spaded under in a season. I was there, and I am lucky and glad to have seen it.

Identifying Machines

Around the same time this was going on, corresponding work was being done to make it easier and easier for the machines to talk to each other. Every machine was outfitted with a “MAC address” if it wanted to connect to the network, with this address built into the hardware used to connect (in those days, Ethernet cards: this was the 1980s). Each vendor got a block of the MAC address space, and it was up to them not to issue duplicate addresses.
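As a small illustration of how plain these identifiers are, the sketch below splits a MAC address into the vendor’s block (the first three bytes) and the part the vendor assigns itself. It also reads the flag bit that conventionally marks an address as assigned locally by software rather than burned in by a manufacturer. The addresses and the function name are made up for the example.

```python
def parse_mac(mac: str):
    # Accept either ':' or '-' as the separator between the six bytes.
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly six bytes")
    return {
        # First three bytes: the block assigned to the hardware vendor.
        "vendor_prefix": ":".join(f"{b:02X}" for b in octets[:3]),
        # Last three bytes: chosen by the vendor, one value per device.
        "device_part": ":".join(f"{b:02X}" for b in octets[3:]),
        # Second-lowest bit of the first byte: set when the address was
        # assigned locally (for example by software) instead of by a vendor.
        "locally_administered": bool(octets[0] & 0x02),
    }


print(parse_mac("00:1A:2B:3C:4D:5E"))
print(parse_mac("02-00-00-aa-bb-cc"))
```

There is no secret anywhere in this structure, so anybody who can read an address can also claim it.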

However, MAC addresses are not cryptographic objects: they are just numbers. This was OK before money started to change hands on the network, but once money starts to flow, keypairs (a public and private key, from strong cryptography) start to be vital.

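Enter the mobile phone SIM card, which has a globally unique serial number (the ICCID) to identify it, and also holds a secret key which is used to authenticate it for billing. (The IMEI, by contrast, identifies the handset rather than the SIM, and the SIM’s key is a secret shared with the operator rather than a public/private pair.) Identifying which phone, and therefore which contract, made the call is done by identifying the SIM in use through its key.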

This system has proven to be remarkably stable. MAC addresses are sometimes changed by software (even for privacy reasons) but nothing much changes when this happens; perhaps an occasional machine gets identified as being something it is not, but MACs are not usually used for routing traffic and so on. The internet address which machines get when they connect to a local network is another issue altogether: a centralized, top-down numbering system in which the world is divided up into zones, each getting a number range to put machines into. This top-down system operates all the way down to your router when it issues you an IP address: that address has come from the IANA (Internet Assigned Numbers Authority) to your nation state or network provider, to your router, to you.
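The top-down structure can be illustrated with Python’s `ipaddress` module: each tier hands out a range that sits entirely inside the range it was given. The tiers and the ranges below are assumptions for the example (they are private addresses, not real allocations), and real delegation has more layers than this.

```python
import ipaddress

# Hypothetical delegation chain, from widest range to narrowest.
DELEGATIONS = [
    ("top-level allocation", ipaddress.ip_network("10.0.0.0/8")),
    ("network provider", ipaddress.ip_network("10.20.0.0/16")),
    ("your router", ipaddress.ip_network("10.20.30.0/24")),
]
DEVICE = ipaddress.ip_address("10.20.30.40")


def is_valid_chain(delegations, device):
    # Every tier must be a subnet of the tier above it...
    for (_, parent), (_, child) in zip(delegations, delegations[1:]):
        if not child.subnet_of(parent):
            return False
    # ...and the device's address must fall inside the last tier.
    return device in delegations[-1][1]


print(is_valid_chain(DELEGATIONS, DEVICE))
```

Unlike a MAC address, then, an IP address carries its own chain of custody: reading it from left to right tells you whose range it came out of at each level.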

If this sounds very baroque (bureaucracy straight out of the film Brazil), it works because all the companies making hardware, selling internet connections, and managing these numbers simply have to make it work to get paid. Their common interest is that these systems just work and stay out of everybody’s hair, and as a result cooperation to resolve problems (including through standardization) is usually swift, spontaneous and effective. (There may be some chortling from people who are intimately involved in IANA etc. All I can say is “compared to the climate process.”)

Part of what makes this possible is that there is very little commercial competition over these numbers: one device per number, but nobody (generally speaking) buys/sells/speculates on these numbers. They are anonymous and unimportant. In the few instances where scarcity has resulted in arbitrage opportunities, usually the outcry at crass commercialism taking precedence over the structure of the network and its function is immediate.

Note this is entirely different to the domain name system, where there is massive political contest over who gets to sell which domain names, and over the domains themselves as people scramble to find a domain name that suits their purposes (alas, poor http://leashless.org). There’s little commercial advantage to having the right number for a device; a number is just a number. Much the same kinds of systems dole out credit card numbers to the banks, and that all works out just fine, for the most part.

So in the land of machines, things work basically smoothly. All the trouble in the world gathers around DNS and HTTPS, and we’ll look into that below when we discuss certificate authorities and their problems.

Back to humans.

Biometrics — nailing people to the meat

So far we’ve examined a few different ways that identities are acquired, mostly defaulting to some long variant of “the state hands you an identity number, and it is used to back things like hospital or university records.”

The biometric has an appealing quality, in that it is not necessarily tightly associated with a State for its issuance.

So let’s start with the base case: a simple facial biometric. I take some clear pictures of my face. A computer measures things like how long my nose is, or the distance between my pupils. These things are hard to change, and (if we measure enough points) pretty unique. Anybody who matches all these facial biometric features is going to be more similar looking to me than even an identical twin might be (in many cases.)
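A toy version of that matching step might look like the following. The three measurements, their values and the threshold are all invented for illustration; real face recognition uses many more features and far more careful statistics, so treat this purely as a sketch of “measure some points, then compare”.

```python
import math

# Hypothetical measurements in millimetres:
# (nose length, distance between pupils, jaw width)
ENROLLED = (52.0, 63.0, 118.0)


def distance(a, b):
    # Straight-line distance between two sets of measurements.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(enrolled, probe, threshold=2.0):
    # Two photos of one face never measure identically, so we accept
    # anything within a tolerance rather than demanding equality.
    if len(enrolled) != len(probe):
        raise ValueError("measurement sets must have the same length")
    return distance(enrolled, probe) <= threshold


same_person_new_photo = (52.4, 62.7, 118.5)
stranger = (47.0, 66.5, 121.0)
print(matches(ENROLLED, same_person_new_photo))
print(matches(ENROLLED, stranger))
```

The tolerance is the interesting design choice: set it too tight and you lock yourself out on a bad hair day, set it too loose and your cousin gets in.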

In theory this is a pretty good way of identifying somebody.

In practice, though, often we want to be able to identify ourselves in different ways to different people. Suppose I am a closeted gay member of a church which is rather hard on gay people. If I use the same ID process, involving my “you only get one” face, as identity for both my church and my gay dating sites, I’m wide open to blackmail and exposure. An enterprising hacker can match my identity information across a couple of different databases, come to the conclusion I’m a closeted gay man, and then extort me.

This may seem far-fetched, and obviously something that could be done much more easily using my computer’s IP address: connects from the same address, two separate lives, obviously there’s an exploitable person here. This is an excellent reason why people with things to hide should use tools like Tor to obscure their IP addresses, and why websites should not keep logs which could be captured by a hacker and correlated to other sources of data. The recent Ashley Madison hack exposed an enormous number of people living double lives, because personally identifying information was spattered all the way through their inadequately protected databases. I don’t know how many lives were destroyed in that hack, but the answer is certainly “more than none” and we should be acutely aware of the risk presented by personal data in sensitive areas of people’s lives.
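The correlation itself is depressingly little work. Here is a minimal sketch, assuming two leaked logs that each happen to record a connecting IP address; the account names are invented and the addresses come from ranges reserved for documentation.

```python
# Two hypothetical leaked logs from unrelated sites.
SITE_A = [
    {"account": "member_041", "ip": "203.0.113.7"},
    {"account": "member_112", "ip": "203.0.113.99"},
]
SITE_B = [
    {"account": "nightowl", "ip": "203.0.113.7"},
    {"account": "bluejay", "ip": "198.51.100.23"},
]


def correlate(first, second, key):
    # Index one log by the shared field, then look up every row of the other.
    index = {}
    for row in first:
        index.setdefault(row[key], []).append(row)
    return [(a, b) for b in second for a in index.get(b[key], [])]


for a, b in correlate(SITE_A, SITE_B, "ip"):
    print(a["account"], "and", b["account"], "share", a["ip"])
```

Swap the `ip` field for a face template and the code does not change at all, which is the trouble with an identifier you only get one of.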

So let’s go back to the biometric example. Suppose that I authenticate to a web site by waving my face into a phone camera. This seems pretty reasonable: in fact, my Samsung phone can use my face to unlock itself. I wave the phone in front of my face for a fraction of a second, and it unlocks. I’m old enough that this seems rather magical, particularly given that it’s reasonably reliable and precise. Could similar mechanisms be used to log me into Facebook?

The answer is obviously yes, but here’s the problem: somebody can make a fake version of my head that meets all the biometric criteria required, and now can log in as me.

I realize this sounds absurd, but what’s required? Some high resolution pictures, maybe a 3D printer, and some experience making latex moulds / rubber heads. Certainly it seems unlikely anybody’s going to do this for me… but suppose I was an elderly rich person of the type so often targeted by unscrupulous criminal gangs for fraud, intending to hit scared old people right in their retirement savings. Of course, many of these old people have weak passwords — in general, in fact, people have incredibly weak passwords. So, once again, as with IP addresses leaking huge amounts of personal data, we must remember that existing practice has horrible privacy implications already. The hypothetical flaws I’m discussing with biometrics are certainly matched by equally horrific flaws in our current identity infrastructures, as I’ll discuss further.

So with Grandma’s nest egg in play, our unscrupulous gang takes some pictures, makes a fake head, logs into her bank account, and wires the whole amount to a bitcoin ATM in a far away country, never to be seen again. Using a rubber head to hack somebody’s identity may seem utterly bizarre, but these things are already happening at a lesser scale. Consider this stunning story of a group of Brazilian doctors manufacturing fake fingers so they could log each other in at work.

Now, I want you to think about this carefully, relative to the username / password situation.

With a username/password, something I (and only I) know is what identifies me. In theory I can keep this information in my head, so short of using fMRI to pull the secret out of my head, there’s no way for a thief to steal my username/password combo. But if there’s some kind of technical attack on my computer, where my password gets copied by some sneaky piece of software as I type it in, my identity is compromised.

Even then, I can change a password. The most critical problem with using raw biometrics for identification is that I cannot change my biometrics. If the vital data leaks, I’m compromised for the rest of my life!

DNA: The identity you cannot hide

DNA is, at least on current scientific knowledge, a seductively perfect biometric: a unique sequence of digital data, one CD worth per human, accessible from a swab (you don’t even need a blood sample). It is unchanging through a lifetime, and reading the DNA sequence from a sample gets cheaper, easier and more reliable every year. We can even tell the difference between “identical” twins now. And you get lots of additional information from DNA about useful things like family structure: who parents, siblings, cousins, grandparents and all the rest are, pretty much 100% reliably baked into the deal.

DNA is great. If only it were private, it would be perfect.

The rub with DNA is that it gets everywhere that cat hair gets, and further. If you’ve ever owned a fluffy white cat, or a golden retriever with a tendency to shed, you know what I’m talking about. You’re at a friend’s house, and you see their nice black jacket, and there’s one of your cat/dog hairs stuck on the arm. They’ve never even worn the jacket to your house (they aren’t daft!) — and yet there’s no limit to the reach of your dog hair.

DNA from shed skin cells is even worse. Such tiny quantities of DNA tend to get mixed in with everything else, but the techniques continue to improve. Sequencing is faster and faster, and techniques for rejecting environmental contaminants grow ever more sophisticated.

In the limit, perhaps in twenty or thirty years (as far forwards as the internet is looking back) it might be possible to discover everybody who had set foot inside of a shopping center by sweeping-and-sequencing the dust on the floors. You might well get next of kin and coworker data for a lot of them too. If there is a DNA database to turn such incidental biological traces into identity information then privacy as we currently know it is a thing of the past. Obviously political use could be made of such technology too — unless people are going to start attending riots in drysuits!

Over and over, with biometric information, we are going to discover that the data itself is harmless, but the ability to compare the data with data from other sources is dangerous. Imagine that I am a small shopkeeper with a smart camera that I use for keeping people who have behaved inappropriately out of my shop. There’s not much that can go wrong here: I have 5, 10, maybe 20 people a year on my camera’s memory of miscreants, and if they come into my shop again, it beeps. I could remember the faces myself, but how do I teach my staff to recognize the people I’ve banned? Probably easier to have a machine do it.

If instead that were a futuristic DNA system, it would be no more or less dangerous than a recognition camera. These things, in and of themselves, are not dangerous. Something that aids my memory is not, in itself, bad.

But networked it is a whole different story. If my “shoplifter memory” system analyzes the DNA data of everybody coming into my shop, I could well know more about people’s familial relationships than they do. “That’s not your brother!”

Likewise if I correlate the DNA samples from my shop floor with medical databases: now I know your genetic predispositions to a range of diseases. If science continues to make strong advances in mapping genes to risks this information could be quite valuable — mapping out insurable risks for a health insurer, say. Again, it’s not the data which is dangerous, but the correlation of the data.

Finally, what if I as the shopkeeper network my shop’s DNA scanner output with everybody else’s? Now we can track shoplifters, sure. But we also have to contend with cantankerous shopkeepers who put their former employees into the database as thieves by accident, or otherwise contaminate the “reputation” database associated with each piece of DNA that the network has stored with false information of some other kind. Suddenly we have an enormous capability to identify people, far exceeding our ability to justly and prudently profile them based on their previous behaviour. We cannot actually manage the reputation databases all that well (scandal, hearsay and gossip penetrate into people’s records far too easily), and if these systems have something resembling the weight of law (at least common law) in everyday life, there are clearly huge consequences to even the occasional accidental false report, or intentional and malicious contamination of the history data.

The better we are able to identify people, the more certain we must be that the identity databases which back up those conclusions are flawless.

I don’t think that’s going to be an easy job at all.

DNA as a Template for our Being

DNA gives us perhaps the best imaginable way to distinguish two individuals from each other in the normal course of day-to-day business, but in the process it reveals the exact genetic composition of their body: essentially the “source code” to their biology! We know that an awful lot of very private, very sensitive data is revealed in this template: propensity to cancer, diabetes, depression, resistance (or not) to HIV and many other medical factors are encoded.

There is reason to believe that temperament may also have at least some genetic factors. Heritability of temperament is still an open question (nature, nurture, epigenetics and more have to be weighed) but, because we do not know what we do not know, it is probably safer to assume that there might be huge insights into human personality and general cognitive function from the genetic level and plan accordingly: to defend our genetic privacy against all attempts to turn our DNA into a casual identity validation mechanism. But, of course, we are still faced with the problem of leaking our DNA in every physical environment we set foot in.

I strongly suspect that the paradoxical nature of DNA — as an identifier, but also as deep insight into the biology and possibly psychology of the person identified — is going to be a defining quandary of the 21st century. As our tools rapidly evolve to allow us to use DNA, can business, government and society keep up and create countervailing protections and indirections, keeping us safe from the transparent nature of at least some of our genes?

Into this gap steps cryptography. The first, easy-to-imagine measure is biometric hashing of DNA data. An arithmetic formula is applied to 750 MB of DNA data, reducing it to a few dozen characters. The formula cannot practically be run in reverse: those few dozen characters could correspond to any one of an astronomical number of possible DNA sequences, and nothing in the output says which. This is just how hashes work in general: those long strings of “hex” (8511b2ee59142cf1ced7e70ff6fca103 for example) can be derived from any digital data source. If one bit changes in the input, roughly half of the output bits change, so similar inputs give no hint of their similarity in their hashes, which is part of why hashes are (generally speaking) very secure. Indeed, Bitcoin’s security rests on the security of hashes, and the computational difficulty of finding an input whose hash merely falls below a target value (not even a specific hash!).
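The “one bit in, half the bits out” behaviour is easy to see for yourself. The sketch below uses SHA-256 from Python’s standard `hashlib` module as a stand-in hash (my choice for illustration, not necessarily what any DNA hashing scheme would use) and a made-up string of bases in place of a real genome.

```python
import hashlib

def sha256_as_int(data: bytes) -> int:
    """Return the SHA-256 digest of data as one big integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two hashes."""
    return bin(sha256_as_int(a) ^ sha256_as_int(b)).count("1")

# A made-up stand-in for sequence data, not a real genome.
original = b"ACGT" * 1000

# Flip a single bit in the first byte of the input.
mutated = bytearray(original)
mutated[0] ^= 0x01
mutated = bytes(mutated)

changed = differing_bits(original, mutated)
print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(mutated).hexdigest())
print(f"{changed} of 256 output bits differ")
```

However large the input, the output is always the same short length, and the two digests printed here should look entirely unrelated even though the inputs differ by a single bit.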

Of course, biometrics are (generally speaking) a bit squishy and analogue — they are, after all, measurements of a body. Even DNA sequence matching is usually probabilistic, because even if the data is dry and digital in the abstract, in practice an awful lot of error-prone wet chemistry is performed to get to the matches and sequences.

Most of our current generation of biometric hashing schemes mash the data before it goes into the hash, so that the impact of any analogue measurement error on the body is reduced. For example, if a fingerprint line is obscured by a paper cut, the algorithms reduce the impact of that error relative to an accurate measurement of the lines on your fingerprint, so that the underlying fact (fingerprint A is from the same person as fingerprint B) is not obscured.
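One crude way to picture this “mashing” is to round each measurement into a coarse bucket before hashing, so that small errors land in the same bucket and produce the same hash. This is only a toy sketch under that assumption: the bucket size and measurements are invented, and real biometric hashing schemes are considerably more sophisticated than simple rounding.

```python
import hashlib

def quantized_hash(measurements, bucket_size=5.0):
    """Hash measurements after rounding each one down into a coarse bucket.

    bucket_size is an illustrative assumption; a larger value tolerates
    more noise but makes different people more likely to collide.
    """
    buckets = [int(m // bucket_size) for m in measurements]
    encoded = ",".join(str(b) for b in buckets).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

# Hypothetical measurements of the same person on two occasions,
# plus a different person.
first_scan = [62.1, 48.4, 31.2, 111.0]
second_scan = [62.6, 48.9, 31.9, 111.8]
other_person = [57.0, 53.5, 36.0, 104.0]

print(quantized_hash(first_scan) == quantized_hash(second_scan))
print(quantized_hash(first_scan) == quantized_hash(other_person))
```

The obvious weakness is a measurement that sits near a bucket boundary: 59.9 and 60.1 fall into different buckets and give completely different hashes, which is one reason practical schemes need something cleverer than rounding.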

In theory, a biometric hash of your DNA ought to be a pretty reliable artefact: your DNA sequence does not (for the most part) change very much, although within muscle groups, organs, the brain etc. the parts of the DNA which are active and inactive are changed by diet, exercise, heredity, stress, use, and many other external factors. Changes in the active and inactive sections of DNA do not alter the underlying code, so while these factors are medically interesting, they do not affect our gene sequences for biometric purposes.

The same kind of biometric hashing can be applied to fingerprints, iris scans and various other kinds of measurements of the body, with differing degrees of success, confidence and precision.

The problem is that to prepare a biometric hash, I require a complete biometric: I need to have a sample of your DNA, which I sequence, to check your DNA against a biometric hash in a database or in a blockchain. In theory I might delete that data immediately, but in practice the temptation for people to filch biometric data (or even biometric samples) to pry deeply into people’s bodies and potentially minds will likely be too strong to be reliably resisted. It’s hard to imagine being in a world where somebody steals some cells you left on the arm of a chair in your doctor’s office, and the resulting data set doubles your life insurance premiums because of a family risk of some expensive disease or other, but this is exactly the kind of world that cheap DNA sequencing and interpretation of results takes us into.