Planning a distributed app (dApp) is exciting, but can be daunting 😬

This is the first of a few posts that explain the Holochain infrastructure. We’re assuming that you have a vision or project in mind already for context. You don’t need to be super technical, but do bring an open mind 😁

Let’s get you from “Can we do this?” to “We can do this!”

Cryptography Basics

Advanced cryptography is crazy advanced.

The basics are quite approachable though. This is lucky because you need the basics to plan a dApp.

Most cryptographic concepts are very old. Much older than any computer system. Many are as old as society.

Cryptography uses math to answer 3 questions:

Do I have the right data?

Who created this data?

Who can access this data?

In a centralised system we trust the service provider to answer these questions for us. When Twitter shows President Trump tweeting something, we accept that he wrote that tweet and that everyone sees the same thing.

In a distributed system there is no service provider, so we have to rely more on cryptography. No real system is pure trust (Twitter uses encryption) or pure math (there is a person somewhere).

Getting the Right Data (hash)

We keep records so that we have a reliable copy of information to use later. It’s easy to forget the past, and easy for people to lie about it. It’s important that our records remain believable over time.

One ancient technique is to tear a document into two or more pieces. The tear forms a natural tamper-proof barrier across any information it crosses. If anyone rewrites the torn information then the halves won’t match.

Split tally sticks provide simple and effective data security.

Merchants used split tally sticks to track debts. Notches representing the debt are cut across the stick, which is then split down the centre. Neither the lender nor the borrower can change a split debt later. Anyone can verify the debt (or detect manipulation) by bringing the two halves back together.

We can’t tear digital data, but we can achieve something similar with a hash. A hash function takes any data and turns it into a short, fixed-length value that looks random. If any part of the data changes, the hash changes completely.

Here are two hashes that 90s hackers might be familiar with 😉

password => 5f4dcc3b5aa765d61d8327deb882cf99

passw0rd => bed128365216c019988915ed3add75fb

Hashes have some useful features:

one way: there is no practical way to compute the original data from a hash

reliable: the same data always gives the same hash

unique: in practice, different data gives a completely different hash
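
To make these features concrete, here is a minimal Python sketch using the standard `hashlib` module. The 32-character hashes above look like MD5 output, so the sketch uses MD5 to reproduce them. The helper names `md5_hex` and `sha256_hex` are made up for this example. MD5 is no longer considered safe for security purposes, so SHA-256 is shown as one commonly used alternative.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Reliable: hashing the same data twice gives the same hash.
print(md5_hex("password") == md5_hex("password"))  # True

# This matches the first hash shown in the article.
print(md5_hex("password") == "5f4dcc3b5aa765d61d8327deb882cf99")  # True

# Changing one character gives an unrelated-looking hash.
print(md5_hex("password"))
print(md5_hex("passw0rd"))

# The same idea with SHA-256, which produces a longer hash.
print(sha256_hex("password"))
print(sha256_hex("passw0rd"))
```

Note that "one way" does not protect weak inputs. Anyone can hash a list of common passwords and compare the results, which is why those two hashes are so recognisable.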

If we have a well-known hash for some data, then we can verify the data ourselves later. This is useful because a hash is tiny and portable, but the data might be huge and immobile. For example, we could hash an entire season of Game of Thrones into 32 characters. We can’t watch the 32 characters, but we can use them to verify the season when we download it later.
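
Here is a rough sketch of that verification step in Python. It assumes SHA-256 (so the hash is 64 hexadecimal characters rather than 32) and uses in-memory bytes as a stand-in for a large download. The function names `hash_stream` and `matches_known_hash` are invented for this example.

```python
import hashlib
import hmac
import io


def hash_stream(stream, chunk_size=65536):
    """Hash a file-like object in chunks so huge data never sits in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(stream, known_hash):
    """Return True if the stream's hash equals the hash we already trust."""
    return hmac.compare_digest(hash_stream(stream), known_hash)


# Stand-in for a huge file. The publisher shares only the small hash.
original = b"pretend this is a very large video file" * 1000
known_hash = hash_stream(io.BytesIO(original))

good_download = io.BytesIO(original)
bad_download = io.BytesIO(original[:-1] + b"!")  # one byte changed

print(matches_known_hash(good_download, known_hash))  # True
print(matches_known_hash(bad_download, known_hash))   # False
```

For a real file you would pass `open(path, "rb")` instead of `io.BytesIO(...)`.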

A neat side effect of the “different data = different hash” rule is that we can use hashes to index data. In a database containing both the data and their hashes, we can look up the data associated with a hash. This is not using the hash itself to read the data. It’s more like a table of contents.
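
As an illustration only, a hash index can be sketched with a plain Python dictionary. The `HashIndex` class and its `put` and `get` methods are hypothetical names for this example and are not taken from any particular dApp framework.

```python
import hashlib


class HashIndex:
    """A tiny in-memory store that files each piece of data under its hash."""

    def __init__(self):
        self._entries = {}

    def put(self, data: bytes) -> str:
        """Store data and return the hash that can be used to find it again."""
        key = hashlib.sha256(data).hexdigest()
        self._entries[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Look up data by hash and check that it still matches that hash."""
        data = self._entries[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored data does not match its hash")
        return data


index = HashIndex()
key = index.put(b"hello dApp")
print(key)             # the 64-character hash acting as the "page number"
print(index.get(key))  # b'hello dApp'
```

Because the key is derived from the data, storing the same data twice gives the same key, and anyone who fetches the data can check it against the key they asked for.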

Identifying an Author (signature)

A hash (or split tally) tells us that we have the correct data. It doesn’t tell us anything about who created the data.

Sometimes we need to know both who wrote a message and that we have the original message.

Imagine a medieval king co-ordinating a war. It is important that anyone can verify which war documents are authentic. If it is easy to create fakes, or change real documents, the king has problems.

A wax seal gives us confidence in the author of a message.

The traditional solution is a wax or ink seal.

Seals meet both our basic needs:

The stamp of the seal is recognisable and unique to the sender

Any damage to the wax/ink is immediately obvious.

Even today forging a seal is a serious crime.

The digital version of a seal is a digital signature. A digital signature looks very much like a hash but comes with a pair of keys. Like a hash, we cannot retrieve the original data from the signature itself. Also like a hash, a signature is reliable and unique to the data it signs, so in practice nobody can forge signed data without the private key. Apologies for the mixed metaphors, but academics are not great at naming things…

For each signature there is a public key and a private key. The private key works much like a stamp used to make a seal. If you have a copy of the private key, you can impersonate the owner. The public key lets anyone verify a digital signature made by one specific private key, but that is all.
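
The sketch below shows the idea using textbook RSA in plain Python (3.8 or newer). It is a toy: the primes are far too small, there is no padding, and the function names `make_keypair`, `sign` and `verify` are invented for this example. A real dApp should rely on a vetted cryptography library rather than anything like this.

```python
import hashlib
import math

# Two known Mersenne primes, chosen only to keep the example short.
# They are far too small for real-world security.
P = 2**61 - 1
Q = 2**89 - 1
PUBLIC_EXPONENT = 65537


def make_keypair():
    """Return (public_key, private_key) for a toy RSA signature scheme."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    assert math.gcd(PUBLIC_EXPONENT, phi) == 1
    d = pow(PUBLIC_EXPONENT, -1, phi)  # modular inverse, Python 3.8+
    return (n, PUBLIC_EXPONENT), (n, d)


def digest_as_number(message: bytes, n: int) -> int:
    """Hash the message and squeeze the hash into the range 0..n-1."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, private_key) -> int:
    """Only the holder of the private key can produce this number."""
    n, d = private_key
    return pow(digest_as_number(message, n), d, n)


def verify(message: bytes, signature: int, public_key) -> bool:
    """Anyone with the public key can check the signature."""
    n, e = public_key
    return pow(signature, e, n) == digest_as_number(message, n)


public_key, private_key = make_keypair()
order = b"Attack at dawn"
seal = sign(order, private_key)

print(verify(order, seal, public_key))              # True
print(verify(b"Attack at dusk", seal, public_key))  # False
```

The private key plays the role of the stamp and the signature plays the role of the wax impression. The public key can check an impression but cannot make one.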

The keys represent a clear tradeoff. There are a lot of tradeoffs in cryptography. We gain the ability to identify the author of any content with total confidence. We also now have a private key to keep secure 24/7, which may be easy or nearly impossible, depending on the context. In a centralised system there is only one key to secure, but losing it compromises all data for every user. In a distributed system each key must be secured separately, but each stolen key represents only one user compromised.

Access Control (encryption)

Sometimes it is not enough to verify data. We also need to restrict access.

Vaults are either open or closed: complete access or complete restriction.

There are two physical solutions to this:

A safe or vault to store items

A sealed envelope containing a sent message

The vault has a single key that opens it. An open vault has no restrictions on storing or retrieving items in it. A closed vault can only be opened with the key.

A sealed envelope has a seal to close it, hiding the message. The recipient only opens and trusts the message if they recognise the intact seal. This is a great example of combining a few basic techniques (a seal and a mini vault) to create a secure system.

Encryption provides a digital analogy for both of these. Symmetric encryption has one key and works like a vault for storage. Asymmetric encryption has two keys: one encrypts messages and the other decrypts them, which works like a sealed envelope for sending.

There is a major difference between encryption and a hash or signature. Encryption is a two way process, so we can decrypt an encrypted message with the correct key.

The reversibility of encryption adds new tradeoffs. In addition to key management:

encrypted messages are not small; they are at least as large as the original data

if someone breaks the encryption tomorrow then we lose control over today’s messages
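
To see the two-way property and the size tradeoff in action, here is a toy symmetric cipher in plain Python. It stretches a key into a stream of bytes with SHA-256 and XORs that stream with the data. The names `keystream`, `encrypt` and `decrypt` are made up for this example, and the scheme is deliberately simplified (no nonce, no tamper detection), so treat it as a sketch of the idea and not as something to deploy.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes using SHA-256."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def encrypt(data: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream. The same key locks and unlocks."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decrypting repeats the same operation."""
    return encrypt(ciphertext, key)


message = b"The vault holds 100 gold coins"
key = b"correct horse battery staple"

ciphertext = encrypt(message, key)

# Two way: the right key recovers the original message.
print(decrypt(ciphertext, key) == message)           # True

# The wrong key gives unreadable bytes, not the message.
print(decrypt(ciphertext, b"wrong key") == message)  # False

# Not small: the ciphertext is as long as the original data.
print(len(ciphertext) == len(message))               # True
```

Compare this with a hash, which stays the same small size no matter how much data goes in, and which cannot be reversed with any key.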

Cypher runes were advanced technology at one point. Now they are quaint and impractical.

The Scandinavian rune stones used cutting edge encryption for their time. Yet these encrypted messages use a lot of stone, and modern techniques have “hacked” them all.

Quantum computers may someday undermine today’s popular encryption techniques.

Next Steps: dApps & Private Data

That covers the basics of cryptographic systems for dApps!

Now you know how data integrity, ownership and security can be enforced without a trusted service provider. These basic building blocks are super important to any Holochain dApp design. They are also quite low level, the conceptual “atoms” of your dApp. In future articles we will start to zoom out and show how these can be combined to achieve real world functionality.

The next article in the series will cover various privacy models in dApps. You will need all the concepts discussed here, so great job getting this far 🎉