vnTinyRAM

In a series of papers, Ben-Sasson, Chiesa, Genkin, Tromer and Virza (BCGTV) introduce a small virtual CPU called “vnTinyRAM” and show how to convert programs that target it into circuits. The paper Succinct Non-Interactive Zero Knowledge for a von Neumann Architecture introduces a von Neumann machine: a computer in which code and data live in the same memory and can be addressed in the same way. If you want to read that paper, note that a mistake in it was found and fixed, which demonstrates why it will take some time until the underlying maths are widely enough studied and trusted to be used in production. In 2014 the team extended their techniques in the paper Scalable zero knowledge via cycles of elliptic curves.

vnTinyRAM is far simpler than a real computer. Because it’s intended for checking properties of data structures, it of course cannot interact with real hardware, and it expects to start execution with an empty memory. It has two input tapes that model streams of data that can be read exactly once. The first tape, the primary tape, contains the program and data. The second is called the ‘auxiliary tape’ and contains ‘advice’ used to speed up the calculations; we’ll get to that later. It also has a little instruction set capable of doing basic computations, loads, stores, jumps and so on.

Because it’s virtual, vnTinyRAM is easily reconfigured to use different register sizes: it can be 16 or 32 bit depending on your needs. Adding support for an instruction set like this one to ordinary compilers is quite easy.

We start by defining what the CPU state actually is: the current program counter, the instruction about to be executed and the registers. An execution trace is a sequence of CPU states, one after the other. We say such a trace is valid if each step in the sequence follows the rules of the CPU itself: for instance, registers don’t randomly change their values when no opcode that does so was executed.
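To make this concrete, here is a minimal Python sketch of what a state and a trace-validity check might look like. The two opcodes, operand layout and 16-bit word size are illustrative assumptions, not the real vnTinyRAM encoding:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CPUState:
    pc: int            # program counter
    regs: tuple        # register file contents

def valid_transition(prev, cur, program):
    """Does `cur` follow from `prev` under the CPU's rules?
    Only two illustrative opcodes are modelled here."""
    op, a, b = program[prev.pc]
    if op == "ADD":    # regs[a] += regs[b], pc advances by one
        expected = list(prev.regs)
        expected[a] = (prev.regs[a] + prev.regs[b]) & 0xFFFF
        return cur.regs == tuple(expected) and cur.pc == prev.pc + 1
    if op == "JMP":    # jump to address a, registers untouched
        return cur.regs == prev.regs and cur.pc == a
    return False

def valid_trace(trace, program):
    # A trace is valid iff every consecutive pair of states is a legal step.
    return all(valid_transition(p, c, program)
               for p, c in zip(trace, trace[1:]))
```

A circuit version of `valid_transition` is, in essence, what we will want to prove, one step at a time.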

It’s easy to create such a trace by emulating this tiny CPU in software on a real computer and recording each step. The trick then becomes to write a circuit that can check such traces are valid, and use that circuit to construct a zero knowledge proof that we know a valid trace for the combination of the program and our private data (the thing we’re trying to prove stuff about). So we’re going to compose our proofs out of more proofs. A proof of trace validity can be broken down into three parts:

A proof that the fetched instruction was executed properly.

A proof that the right instruction was fetched from memory at the given time step.

A proof that each load from memory retrieves the last value stored there.

We can imagine a circuit that checks the first without much trouble — instructions like ADD or MULL convert directly to gates. More complicated instructions like UDIV (division) don’t seem to translate directly: we only have addition and multiplication gates! But we can cleverly sidestep that problem by requiring the actual answer to be provided on the auxiliary advice tape mentioned earlier, and then multiplying back through before accepting the answer. This trick of using verified “advice” shrinks the size of the circuits needed and speeds things up a lot, as well as simplifying the code. The contents of this “advice” are controlled by the prover who may, of course, be trying to trick us into believing something false. So the CPU can’t assume much about what’s there, but if it’s less work to check an answer than compute it, we win.
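For example, here is how a division check might look with the quotient and remainder supplied as advice. This is an illustrative Python sketch, not the actual vnTinyRAM constraint system (in particular, the real spec’s division-by-zero behaviour is not modelled):

```python
def check_udiv(a, b, q, r):
    """Verify prover-supplied advice (q, r) for a division a / b.

    The circuit never divides: it checks q * b + r == a using only
    multiplication and addition, plus the range check 0 <= r < b.
    """
    return b != 0 and q * b + r == a and 0 <= r < b

# The prover computes the advice cheaply on a real CPU and writes it
# to the auxiliary tape; the circuit merely checks it.
a, b = 17, 5
advice = (a // b, a % b)
```

If the prover lies about the quotient or remainder, one of the two checks fails, so cheating advice is always rejected.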

Implementing memory

Mathematical equations can’t remember things: they always yield the same answer given the same input variables. That poses a problem for simulating a memory bank with them.

To solve this, we extend the data put into the CPU states recorded in the execution trace. Let’s redefine a CPU state to mean:

A timestamp measured in CPU cycles (steps)

The root of a Merkle tree over the contents of external memory

The current program counter and set of registers

A flag denoting whether the program has accepted the input data (i.e. decided the thing you’re trying to prove is true)

A Merkle tree (image credit: Wikipedia)

Because the root of a Merkle tree is just one hash, the CPU state can be a small and constant sized piece of data no matter how much RAM is used. When the CPU wants a piece of data loaded from memory this is provided as more advice on the advice tape, along with an authentication path (Merkle branch) that shows that this value was indeed stored correctly in memory. A Merkle branch is the set of sibling hashes needed to work up from a leaf of the tree all the way to the root, calculating the nodes along the way. It is effectively a small, special-purpose proof that the given item was in the tree when it was calculated.

When the CPU wishes to store data, it can do a load-then-store to request the authentication path of the old value and then use it to calculate the new root given the updated location. Thus a link between each step in the execution trace is established.
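The load and store checks above can be sketched as follows. SHA-256 stands in for the paper’s subset-sum hash, and the helper names are mine:

```python
import hashlib

def node_hash(left, right):
    return hashlib.sha256(left + right).digest()

def root_from_branch(leaf, index, siblings):
    """Walk an authentication path from a leaf up to the root."""
    node = hashlib.sha256(leaf).digest()
    for sib in siblings:
        # At each level, the index's low bit says which side we're on.
        node = node_hash(sib, node) if index & 1 else node_hash(node, sib)
        index >>= 1
    return node

def check_load(root, leaf, index, siblings):
    # A load is honest iff the advice's branch hashes up to the known root.
    return root_from_branch(leaf, index, siblings) == root

def root_after_store(new_leaf, index, siblings):
    # A store reuses the old value's siblings to compute the new root.
    return root_from_branch(new_leaf, index, siblings)
```

Note that a store only needs the sibling hashes already fetched for the load of the old value, which is why the load-then-store pattern works.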

Some readers will already be familiar with Merkle trees. The version found in the Scalable Zero Knowledge paper differs from “normal” trees in that it doesn’t use a standard hashing algorithm like SHA-256. Instead it uses a custom collision-resistant hash function based on the subset-sum problem, which is much more efficiently implemented in arithmetic circuits of the type they use.
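The subset-sum idea itself is simple: the hash is a modular sum of fixed random parameters, one per input bit that is set, so inside an arithmetic circuit it costs only additions. A toy sketch; the parameters and modulus below are illustrative, nothing like the paper’s real sizes:

```python
def subset_sum_hash(bits, params, modulus):
    # H(x) = sum of params[i] over the set bits of x, mod the modulus.
    # Collision resistance rests on the hardness of the subset-sum problem.
    assert len(bits) <= len(params)
    return sum(p for p, bit in zip(params, bits) if bit) % modulus

params = [3, 5, 7, 11]   # in practice: many large, randomly chosen values
digest = subset_sum_hash([1, 0, 1, 1], params, 97)
```

Compare this with SHA-256, whose bitwise rotations and XORs are expensive to express as addition and multiplication gates.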

Cycling the CPU

The alert reader will have noticed that we don’t appear to have got anywhere: we started down this path because you can’t do loops in circuits, and yet a CPU loops repeatedly as it runs a program. Stuck?! Not quite!

Earlier versions of these techniques had a simple solution: like in the Pinocchio C example, just unroll the CPU! In those versions of the algorithm, when generating the universal CPU circuit you had to pick the register size and a maximum execution time, and the generated circuit could then run programs only up to that limit. The circuit that verified each step in the CPU trace would be duplicated over and over, with the outputs of one step wired to the inputs of the next. This led to enormous computational costs. Additionally, earlier works didn’t use the Merkle tree approach for authenticating memory: instead they solved routing problems over Beneš networks, which was much more complex.

The Scalable Zero Knowledge paper fixed this problem by using every computer scientist’s favourite trick: recursion.

The generated vnTinyRAM circuit implements exactly one cycle of the CPU. It takes as input a previous CPU state, along with a proof that the prior state was valid. It also takes the supposed next state. Because the circuit checks the prior proof and that the transition is valid, feeding the circuit through the SNARK algorithms spits out an updated proof that can then be fed back into the universal circuit again to run the next clock cycle.

You keep doing this, feeding proofs back into the same circuit again to prove the next step, until the program you’re running eventually answers YES (if it wouldn’t answer YES then doing all this is pointless; you’re just burning CPU time). As the exact point at which the program accepts might be sensitive, for privacy reasons you can keep iterating the CPU beyond that point; it just won’t change the answer.
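The loop looks roughly like this. The SNARK machinery is stubbed out as a hash chain purely to show the control flow; `prove_step` and the toy program are my inventions, not libsnark’s API:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    pc: int
    acc: int
    accepted: bool = False

def step_cpu(s):
    # Toy "program": keep adding the pc into an accumulator, accept at >= 10.
    acc = s.acc + s.pc
    return State(pc=s.pc + 1, acc=acc, accepted=acc >= 10)

def prove_step(prev, cur, prior_proof):
    # Stand-in for the one-cycle universal circuit: in the real scheme this
    # verifies prior_proof AND the prev -> cur transition inside a SNARK.
    data = repr((prev, cur, prior_proof)).encode()
    return hashlib.sha256(data).hexdigest()

def prove_execution(state, max_steps):
    proof = None
    for _ in range(max_steps):
        nxt = step_cpu(state)                  # plain emulation on a real CPU
        proof = prove_step(state, nxt, proof)  # fold one more cycle in
        state = nxt
        if state.accepted:
            break                              # or keep iterating, for privacy
    return state, proof
```

The key property is that `proof` stays constant-sized no matter how many cycles are folded in; only the final proof ever needs to be published.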

This is a form of proof carrying data and has a lot of interesting uses outside the immediate problem of proving the execution trace of a CPU.

Proofs checking proofs checking proofs

The trick above relies on something that turns out to be very awkward: we have to be able to verify a zero knowledge proof using an arithmetic circuit.

For complicated reasons I won’t get into here, you can’t just encode the verification calculations directly as a bunch of gates. That would be ideal, but as Vitalik briefly mentioned at the end of his tutorial, none of the above computations are done using ordinary numbers of the kind we learned at school (hah, that would be too easy). They take place in the context of elliptic curves. To verify the values inside a proof in the obvious way, you would end up needing an elliptic curve that is mathematically impossible to construct. Dead end.

You could also use the split gate trick to convert all the values of the proof into binary and then do it like a regular CPU would, but that’d generate impossibly huge numbers of gates and make the entire thing impractical (we are pushing against the boundaries of practicality already). Also a dead end.

So we need one more trick, and it’s a clever one. We search for a pair of elliptic curves. The proof of the first CPU cycle takes place using the first elliptic curve, and is checked by another similar circuit that uses the second. The second cycle then outputs a proof using the first elliptic curve again. In this way the curve used alternates back and forth. This avoids the impossibility problem and also avoids the massive blowup of being indirect.
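Stated a little more precisely (this is the standard cycle condition from the cycles-of-curves literature, summarised in my own notation): we want two curves where each one’s group order equals the other’s base field size,

```latex
E_1/\mathbb{F}_p \;\text{ with }\; \#E_1(\mathbb{F}_p) = q,
\qquad
E_2/\mathbb{F}_q \;\text{ with }\; \#E_2(\mathbb{F}_q) = p .
```

A SNARK built on $E_1$ proves circuits over $\mathbb{F}_q$ but is verified using arithmetic over $\mathbb{F}_p$ — which is exactly the field that $E_2$’s SNARK proves circuits over, and vice versa. That is why each curve’s circuits can check the other’s proofs natively.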

If you’re into cryptocurrencies you might know of the secp256k1 elliptic curve. Well, there are lots of different curves. The family of curves we need for this alternation trick are called Miyaji/Nakabayashi/Takano, or MNT curves for short. Unfortunately, elliptic curves don’t just lie around waiting to be discovered: they must be generated by an expensive set of calculations. The BCGTV team did this, at the minor cost of 25,416 core-days, and found a pair of curves with the properties they need. The curve parameters are provided in their paper so nobody ever has to do this search again.

Performance and caveats

The algorithms outlined above have an interesting property: they require some truly vast calculations to be done, but they only need to be done once. After the CPU circuit is calculated, it is a universal circuit that can be fed any program that uses the vnTinyRAM instruction set. Because iterating the CPU takes a constant amount of time for each cycle, we can measure how fast this ‘proving CPU’ runs.

Ready for it?

On a quad-core computer, each step will cost you about 9 seconds. That gives a clock rate of roughly 0.1 Hz. Suffice it to say, you won’t be proving an execution of Crysis any time soon.

Clearly, there’s a long way to go. But as prior techniques didn’t scale to long computations at all, this still represents tremendous progress.

There is one other big problem remaining. I mentioned that the calculation of the universal circuit has to be done once. That’s not quite true. The problem lies in the bit we’ve handwaved over so far. The task of converting a QAP solution to a quickly checkable zkSNARK involves the use of a “proving key” and “verification key”, which are derived from some random data selected by whoever computes the circuit. Unfortunately, knowledge of that random data is sufficient to forge proofs. This means the setup process is a weakness: how do you know that the random data used to initialise the algorithms was really destroyed? No matter how much theatre you throw at it, in the end you can’t really know for sure.

In 2016, research shifted towards a new kind of SNARK that sits in the so-called “PCP world” and doesn’t have this problem. PCPs are probabilistically checkable proofs, and are a topic for another time.

Conclusion

I hope that this article builds on Vitalik’s introduction to the underlying maths in a useful way. We need more explanations to complete the story:

The translation of a satisfied QAP to an actual zero knowledge proof.

PCPs, and how they can be used in ways different to the algorithms libsnark uses.

Pairing based cryptography and how pairings are used.

These are topics for another day.

Thanks to Eran Tromer and Eli Ben-Sasson for proofreading this article.