Metadata

Think about it like this: if I send a letter by post to my friend, I will put their full name and address on the front, and my full name and address on the back so that I can receive a reply. Although no one can see inside my letter and read its contents, the postal service now has a good idea of who I am, who I sent my letter to, and what time I sent my letter. This information is typically referred to as metadata.

Here’s a great quote from Kurt Opsahl from the EFF, writing on the topic of government metadata collection:

“They know you rang a phone sex service at 2:24 am and spoke for 18 minutes. But they don’t know what you talked about. They know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret. They know you spoke with a HIV testing service, then your doctor, then your health insurance company in the same hour. But they don’t know what was discussed.” [1]

And Michael Hayden (former Director of the NSA)

“We kill people based on metadata” [2]

Now how does this relate to private messaging?

Centralised services

When you use a service like Whatsapp, Telegram or Signal, you connect directly to a server owned by one of these companies. When you send a message, you must somehow tell that centralised server who you are and who the recipient of your message is. You will also typically give up your IP address and your phone number, depending on the client you use.

These centralised services have the capacity to collect metadata. Some providers, like Whatsapp and Facebook Messenger, openly admit to collecting and storing their user’s metadata. [3][4] Some services, like Signal and Telegram, have policies against storage of user metadata. [5][6] These policies require a significant amount of trust, however it has been shown countless times that as soon as trust is involved in user privacy, companies and governments will violate that trust. Look no further than popular ‘private’ email service, Hushmail, which was compelled legally to hand over users’ private information to governments. [7] The only solution to this problem is for these centralised, privacy centric services to never have the ability to collect sensitive user data.

(A note on XMPP)

Messengers based on XMPP are slightly different in their construction since they operate on a federated model. [8] This means that only a protocol is provided, users/providers can operate their own servers (which means that metadata is held by individual servers), and users have more choice in the parties they trust to store their data. This is a step in the right direction, though XMPP servers can still collect this data and have the ability to transmit it to third parties.

Decentralised Services

So how do the decentralised models deal with metadata? Well, there’s a few out there and they each approach metadata in a different way.

Services like Tox, and other P2P messengers, by their very nature do not require metadata to be stored on any centralised server, since messages are routed directly to their participants rather than traveling through servers or hopping through a mixing network. On the surface this can sound great, but really it gives more power to ISPs. Although the packets of data may be encrypted, ISPs can see the IP address of the person you are messaging, and if they can resolve the IP address of the recipient then they are able to very accurately ascertain information on who you messaged and at what time. This is something centralised services (Telegram, Signal, and WhatsApp) prevent. If encrypted properly, your ISP only knows you are communicating with a messaging service server, and never knows the actual IP address of your recipient.

Bitmessage is one of the other fully decentralised messaging models and might be the next best for hiding message metadata. Bitmessage operates a floodfill network, meaning that messages aren’t directed towards specific participants like they are in most other networks. In Bitmessage, every user is distributed a copy of every encrypted message. Once a user receives a message, they must attempt to decrypt the payload. This means when I want to send a message, I “flood” the message to the peers I know, who then flood the message to the peers they know. This technique makes it difficult to ascertain whether a user is merely relaying a message, or if they are the source of the message. It is also imposible to ascertain whom the true recipient is, as all clients receive all messages and attempt to decrypt them. However, there are significant downsides to operating floodfill networks, such as the high bandwidth consumption and computational overhead of attempting to decrypt each message you receive.

Loki Messenger

Loki Messenger uses a number of different techniques for obfuscating metadata. In the context of the above, it is close to a mashup of Bitmessage and the Tor network. Like Bitmessage, Loki Messenger uses a decentralised set of nodes (Service Nodes) to relay messages to other users. Service Nodes are incentivised by the Loki blockchain and undertake the job of relaying messages to receive a small reward in cryptocurrency.

To avoid metadata collection in both offline and online communications, a method called onion routing is employed. Onion routing in Loki positions a set of randomly selected (simplified) Service Nodes between the user and the destination of their message. As the message hops through each Service Node toward the final recipient, a layer of encryption is removed. This onion routing provides a number of protections:

In a typical connection it is not possible for a single Service Node to gather enough metadata to form a connection between the sender and receiver of a message. Service Nodes are run by the community, and are globally distributed, which makes targeted legal action difficult. Service Nodes require a large amount of the Loki cryptocurrency, meaning they are staked into the system. Any actions which reduce the usefulness of the Loki Messenger will likely also reduce the value of the Loki cryptocurrency. This financially incentivises good behaviour.

When a direct link between users cannot be established (the recipient is offline), Loki can also enable metadata protection for offline messages. This is done through a process called Swarm messaging. Each Loki Messenger user resides in a swarm of 10 Service Nodes. To contact another user, I can use their public key to figure out their swarm identity. My message is then sent to one of the ten nodes in their swarm, which will flood the message to the 9 others. When the user comes online, they will randomly query a node in their swarm. If that Service Node is holding a message for them, the user will download it directly from the Service Node.

Alice sends a message to Bob, Bob’s assigned Swarm is B, When Bob comes online, he queries a random node in his swarm and receives Alice’s message.