This is a transcript of Philipp Jovanovic’s presentation from Master Workshop: Layer 1 Solutions, which took place at Primalbase AMS.

Good morning everyone. I will present our project OmniLedger, which is a secure, scale-out, decentralised ledger, and we achieve that via sharding.

This is a joint project between quite a few people, and some of them should look familiar to you because they also presented here or gave workshops. OmniLedger was designed by Eleftherios, Nicolas, Linus, Ewa, Bryan and myself.

First, I will present the motivation as to why we actually want to do sharding. Then I will introduce OmniLedger, which is our proposal. After that, I will go over our implementation and evaluation of the system, and finally conclude.

So, regarding motivation: I have a question for you. What are we using blockchains for? Or what are people using blockchains for? ICOs, yes! What else? Gambling. That's actually one of the big things, right: people like to play games on the blockchain, and a while ago there was this game called CryptoKitties, deployed on Ethereum, which essentially stalled the whole system because it became so popular that Ethereum could not take the load of operating that game.

That's a half-joking example, but it shows very well what happens as blockchains become more popular and their use more widespread. We need really high-performing ledgers to support all of these applications and whatever comes next. For that reason, we need to look at the core scalability problems of current blockchain systems.

At the core of Bitcoin there is the Nakamoto consensus protocol, which works roughly like this: you have miners in the network, they are connected to each other, and they mine blocks by solving cryptographic puzzles. Here the yellow node is the miner that found the latest block and extended the Bitcoin blockchain.

This mechanism has several drawbacks. First of all, there is a huge transaction confirmation delay - at least in Bitcoin - where the time that it takes to solve these cryptographic puzzles and thereby publish a new block is at least 10 minutes. That means that if you as a client are sending a transaction to the Bitcoin blockchain, it takes at least 10 minutes until it appears there and you can see it as confirmed.

On top of this ten-minute delay, you also have only a one megabyte block size which, when you do the math, translates to roughly four transactions per second that the system can sustain. To give you a sense of what that compares to, four transactions per second is roughly the throughput that an IKEA store does on a busy Saturday. Another problem is that Bitcoin only provides weak consistency. This means that once a transaction appears on the blockchain, you still have to wait several blocks before you can be really sure that it stays there, because in the last part of the blockchain, forks can still appear. Only once a transaction is six blocks deep in the blockchain is the probability of a fork reasonably low. And, of course, the last problem, which we won't talk about much here, is that the proof-of-work mining needed to solve these cryptographic puzzles consumes a lot of energy.
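To make that back-of-envelope arithmetic concrete, here is a small Go sketch; the average transaction size of 500 bytes is an assumption for illustration, not a figure from the talk:

```go
// Back-of-envelope check of the ~4 tx/s figure for Bitcoin,
// assuming an average transaction size of roughly 500 bytes.
package main

import "fmt"

func main() {
	const (
		blockSizeBytes   = 1_000_000 // 1 MB block size limit
		avgTxSizeBytes   = 500       // assumed average transaction size
		blockIntervalSec = 600       // ~10 minutes between blocks
	)
	txPerBlock := blockSizeBytes / avgTxSizeBytes
	fmt.Printf("~%d tx per block, ~%.1f tx/s\n",
		txPerBlock, float64(txPerBlock)/blockIntervalSec)
}
```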

Now you might think, 'Okay, this sounds like a problem in cloud computing too', where people just throw more hardware at the issue and it scales better. But it's not so easy here, because adding more resources to the network, in the form of miners for example, doesn't change the fact that a block still takes ten minutes to be published and the block size is one megabyte. It doesn't change anything about the latency or the throughput in the long term, and it also doesn't change anything about the consistency problems I mentioned before.

What we really want from our next-generation scalable ledgers is something like this. The green line shows Bitcoin: on the x-axis you have the number of validators, and if you have n validators, Bitcoin gives you four transactions per second. If you have 2n validators, Bitcoin gives you four transactions per second. If you have 5n validators, the same thing; it doesn't change anything if you add more resources. But an ideal system should leverage the additional resources that are available in the network. The blue line, on the other hand, shows what we actually want to achieve with more resources: ideally, a linear increase in throughput. This is a property we call 'scale-out': the throughput increases linearly with the available resources.

When you look through the distributed databases literature, there is a technique called 'sharding,' and sharding allows you to achieve scale-out performance. So how would you carry this over from classical distributed databases to the blockchain world? You can imagine that you group the validators into distinct subsets and assign them to different blockchains, here the yellow and the green one. Then the yellow validators validate the yellow transactions and the green ones validate the green ones, and thereby you double the throughput of the system.

However, when you do that, you immediately run into two questions. First, how do we assign these validators to the shards? Second, how do we send transactions across shards? We want anyone with an account on the green blockchain to be able to pay people on the yellow blockchain and vice versa, otherwise the utility of that strawman system would be very low.

With those two questions in mind, let's have a look at what people have worked on before. Here you see the three goals that we want to achieve in our sharded, next-generation ledgers: we want decentralisation of course, we want to achieve security, and we also want to achieve scale-out performance. In 2016 at CCS, there was a paper called 'A Secure Sharding Protocol For Open Blockchains' which introduced Elastico.

Elastico sits here on the axis between scale-out and decentralisation, which means it sacrifices security. The problem with Elastico is that it uses proof-of-work to assign validators to shards, but that is not unbiasable because miners can drop blocks and thereby bias the randomness. Also, when you look into Elastico a little more closely, you see that the failure probability of the system increases with the number of shards, which of course you don't want. That's why it sacrifices security.

Another system is RSCoin. It was presented at NDSS under the title 'Centrally Banked Cryptocurrencies'. It sits on the scale-out and security axis: RSCoin sacrifices decentralisation to achieve both scale-out and security. You have a central coordinator that takes care of the sharding, and in RSCoin you also do not have a BFT consensus mechanism that secures the blockchain, so it might happen that a client who colludes with the coordinator can double-spend in the system.

Then on the decentralisation and security axis is the system that Eleftherios presented yesterday called ByzCoin, and yeah I won’t say that much more here. ByzCoin has good performance but it doesn’t scale out as Eleftherios mentioned yesterday.

In this talk I will present OmniLedger, which achieves all three of those properties.

When we started looking into OmniLedger, we had the following goals, which fall into two categories. We had three security goals, shown here in blue: we want to achieve full decentralisation, so no trusted third parties or single points of compromise. We want to have shard robustness, meaning that, from time to time, you need reconfiguration events in your system, but throughout those we want to keep the shards operational. We don't want a shard to go down so that you then have to re-bootstrap it. And we also want to have secure transactions, especially across shards, meaning that transactions either commit atomically or eventually abort.

And then we also had three performance goals, namely scale-out: we want linear throughput, as I mentioned initially. If you have a very high transaction throughput, your blockchain will of course also grow very rapidly, which eventually leads to storage problems because somebody has to store the entire history, so we also want to get better there. And we want low transaction confirmation latency: when I send a transaction, it should be confirmed immediately or very quickly, and we don't want to wait ten minutes or more until it is certain that the transaction was committed to the system.

I will start with the strawman approach, which we call 'SimpleLedger.' SimpleLedger evolves in epochs, which are denoted here by E. In SimpleLedger you have a shard coordinator which regularly issues a shard configuration; the validators take that configuration, check which shard they are in, start setting up the shards with their peers, and then, once that's done, they can start processing transactions. In the following example, what you see at the top is the red shard coordinator; the validators, once they have the configuration, get assigned to one of the three shards (yellow, green or blue), and once the setup is done they start processing transactions.

That of course gives you scale-out performance, but it has several problems. First of all, the shard coordinator is a trusted third party. It also has the problem that when the shards are reconfigured, they go down and you need some time to re-bootstrap them. There is, so far, also no cross-shard transaction support, so basically what we just generated are three separate blockchains.

We also have a few performance drawbacks, namely the ByzCoin failure mode: the original ByzCoin system used tree structures to get very high performance, but you get problems when nodes high up in the tree go down, because then you cannot communicate with the whole subtree and you need to reconfigure it, or, even worse, if the leader goes down, you need to make a few changes which is also costly. Then, as I already mentioned, you have high storage and bootstrapping costs, because ByzCoin is a very high-throughput system, so the blockchains there grow very fast. You also have a throughput versus latency trade-off, meaning that when you want to achieve good throughput, you need big blocks. But communicating and processing those blocks takes time: communicating just a 1 megabyte block is much quicker than communicating a 32 megabyte block, but the 32 megabyte block gives you much better throughput than the 1 megabyte block. We also want to get rid of that trade-off somehow.

Here is a roadmap of how we go from SimpleLedger to OmniLedger in 6 steps. First of all, we introduce randomised sharding. Then we tackle the problem of secure system reconfiguration, to have no downtime during these reconfiguration events. We also introduce an atomic cross-shard transaction protocol. Then we enhance the failure resistance of ByzCoin by having more robust communication patterns, and we do something called 'blockchain pruning' to reduce the amount of storage that you need. Finally, we introduce a mechanism called 'trust but verify' to tackle that trade-off between high throughput and low-latency transaction validation. I will only talk about three of these points, since they are the most important ones, and also because of time.

If you remember, in the very beginning I said there is this open question of how do we assign validators to shards? We have basically two options. The first one is to do it deterministically: we say the first five nodes go to the first shard, the next five nodes go to the second, and so on and so forth. However, that's a problem because the adversary can totally predict that and can potentially position their nodes in such a way that they corrupt an entire shard. As Mustafa said in his talk, that's a very bad situation, because then these nodes can corrupt the entire system through bad cross-shard transactions. What's a lot better is to do it randomly, because when we use unbiasable public randomness, the adversary cannot control or predict that assignment.

The next question becomes: how do you ensure long-term shard security against an adaptive adversary? Once you've assigned your nodes, the adversary might have some time to corrupt nodes adaptively afterwards. The two solutions are basically that you make the shards large enough and that you periodically reassign the validators to shards, which are the reconfiguration events I mentioned before. And when you do the math, you get what is depicted here on the graph. On the x-axis you see the adversarial power, and on the y-axis you see the approximate shard size that you need; for example, if you want to be resistant against a 25% adversary, you roughly need a shard size of 600 to be secure.
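As a rough illustration of the kind of math behind that graph, here is a Go sketch that models random assignment with a binomial approximation and computes the probability that a shard of a given size ends up with enough malicious validators to break BFT consensus; the exact model and target failure probability in the paper may differ:

```go
// Probability that a randomly assigned shard of size n contains at
// least ceil(n/3) malicious validators, given adversarial fraction p.
// Each slot is modelled as an independent draw (binomial approximation).
package main

import (
	"fmt"
	"math"
)

// binomTail returns P[X >= k] for X ~ Binomial(n, p).
func binomTail(n, k int, p float64) float64 {
	total := 0.0
	for i := k; i <= n; i++ {
		logC := lgamma(n+1) - lgamma(i+1) - lgamma(n-i+1)
		total += math.Exp(logC + float64(i)*math.Log(p) + float64(n-i)*math.Log(1-p))
	}
	return total
}

func lgamma(x int) float64 {
	v, _ := math.Lgamma(float64(x))
	return v
}

func main() {
	for _, n := range []int{100, 300, 600} {
		k := (n + 2) / 3 // smallest number of malicious nodes that breaks BFT (>= 1/3)
		fmt.Printf("shard size %4d, 25%% adversary: failure prob %.2e\n",
			n, binomTail(n, k, 0.25))
	}
}
```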

As I said, the challenge here is how do you do unbiasable, unpredictable and scalable shard-validator assignment, and our solution is to use public randomness. Here we combine a VRF-based lottery with an unbiasable randomness protocol, namely RandHound, which Ewa presented in the previous talk. If you remember, RandHound needs a leader, or a client basically, to execute the protocol; the leader is there for coordination. But again, we cannot pick the leader deterministically because then the protocol might fail a bunch of times, so that's why we use a VRF-based lottery to pick, from all the validators that are out there, one who becomes the leader.

This leader executes RandHound. Why are we doing this? The leader might have been picked randomly, but he might still be a bad guy, so we cannot fully trust him to do the shard-validator assignment. So we want to take the input of all the validators that are available in the system, and that's where RandHound comes in, which does exactly that (see the last talk). And once we have the verifiable randomness, we can use it to assign the validators to the shards; again, we have a yellow, a blue and a green shard.
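A minimal sketch of the idea, assuming the epoch randomness is simply used to seed a deterministic, publicly verifiable shuffle of the validator list (a real implementation would derive the permutation cryptographically rather than via math/rand):

```go
// Sketch: use public epoch randomness (e.g. the RandHound output) to
// deterministically permute the validator list and cut it into shards.
// Every validator can recompute and verify the same assignment.
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

func assignShards(validators []string, epochRandomness []byte, numShards int) [][]string {
	seed := int64(binary.BigEndian.Uint64(epochRandomness[:8]))
	rng := rand.New(rand.NewSource(seed))

	perm := append([]string(nil), validators...)
	rng.Shuffle(len(perm), func(i, j int) { perm[i], perm[j] = perm[j], perm[i] })

	shards := make([][]string, numShards)
	for i, v := range perm {
		shards[i%numShards] = append(shards[i%numShards], v)
	}
	return shards
}

func main() {
	validators := []string{"v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"}
	randomness := []byte{0xde, 0xad, 0xbe, 0xef, 0x01, 0x02, 0x03, 0x04} // stand-in for RandHound output
	for i, shard := range assignShards(validators, randomness, 3) {
		fmt.Printf("shard %d: %v\n", i, shard)
	}
}
```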

The next challenge was that we wanted to have a protocol for atomic cross-shard transactions. To do that, we again looked back into the literature on distributed databases, where there are protocols called two-phase commits: you have a coordinator and a bunch of servers, and, as the name says, there are two phases in the protocol. First, you have a voting phase where the coordinator sends a query to all the servers, asking whether they can execute the transaction, and all of the servers reply yes or no. The coordinator collects the replies and checks if everybody voted yes, in which case he tells them 'okay, now please commit'; if one of them says no, then you execute a rollback protocol.
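A compact sketch of that coordinator-driven two-phase commit, in the simple crash-fault setting described above (the server type and its vote method are made up for illustration):

```go
// Classic two-phase commit: the coordinator asks every server to vote,
// commits only if all vote yes, otherwise rolls back.
package main

import "fmt"

type server struct {
	name    string
	canExec bool
}

// vote is phase one: the server answers whether it can execute the transaction.
func (s server) vote() bool { return s.canExec }

func twoPhaseCommit(tx string, servers []server) string {
	// Phase 1: voting.
	for _, s := range servers {
		if !s.vote() {
			// Phase 2 (abort branch): roll back on any "no".
			return fmt.Sprintf("rollback %s (server %s voted no)", tx, s.name)
		}
	}
	// Phase 2 (commit branch): all voted yes.
	return fmt.Sprintf("commit %s", tx)
}

func main() {
	servers := []server{{"s1", true}, {"s2", true}, {"s3", false}}
	fmt.Println(twoPhaseCommit("tx42", servers))
}
```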

That looks already promising, however if you are in a Byzantine setting then you have the problem that the servers can be malicious. So one of the servers can always vote no and basically you don’t have any liveness anymore in the system.

Taking that idea further, what you then realise is that once you have this random shard assignment, and you have a Byzantine fault-tolerant consensus mechanism within the shards, you can basically consider all the shards honest. So you take it one abstraction level higher and consider your shards as the honest processes in the system. To give an example, which is basically the Atomix protocol that does these secure cross-shard transactions: you have an initialisation phase where a client says 'I want to execute this transaction'. In the example, it has inputs in shards 1 and 2 and pays somebody in shard 3 (the yellow one). The client sends this to the two input shards, the two input shards lock the funds and send acknowledgments back to the client, the client collects all of these acknowledgments and then, if all the input shards could lock the funds, he can go to the output shard and say 'Here I have a proof that you can commit the transaction in the yellow shard.'

On the other hand, if one of the shards aborts this process (in the example, the green one sends back an error, maybe because the client does not have enough funds in his account in the green shard), the client executes a rollback mechanism: they go to the input shards that have already locked the funds, say 'Please unlock my funds', and thereby reclaim the transaction inputs.
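Here is a minimal sketch of that client-driven flow; it is a reconstruction for illustration, not the actual Atomix implementation, and the shard, lock, unlock and commit types are made up for the example:

```go
// Sketch of client-driven cross-shard commit: input shards lock funds
// and return proofs-of-acceptance; the client either delivers the
// combined proofs to the output shard or reclaims the locked inputs.
package main

import (
	"errors"
	"fmt"
)

type shard struct {
	name    string
	balance map[string]int
}

// lock tries to lock `amount` for `client` and returns a proof-of-acceptance.
func (s *shard) lock(client string, amount int) (string, error) {
	if s.balance[client] < amount {
		return "", errors.New("insufficient funds in " + s.name)
	}
	s.balance[client] -= amount
	return fmt.Sprintf("proof(%s,%s,%d)", s.name, client, amount), nil
}

// unlock reverses a previous lock (the rollback path).
func (s *shard) unlock(client string, amount int) { s.balance[client] += amount }

// commit credits the recipient once the client presents all proofs.
func (s *shard) commit(recipient string, amount int, proofs []string) {
	s.balance[recipient] += amount
	fmt.Printf("%s committed %d to %s with proofs %v\n", s.name, amount, recipient, proofs)
}

// atomix drives the cross-shard transaction from the client side.
func atomix(inputs map[*shard]int, output *shard, sender, recipient string) error {
	var proofs []string
	var locked []*shard
	total := 0
	// Phase 1: lock inputs on every input shard.
	for s, amount := range inputs {
		proof, err := s.lock(sender, amount)
		if err != nil {
			// Abort: unlock everything locked so far and reclaim the inputs.
			for _, l := range locked {
				l.unlock(sender, inputs[l])
			}
			return err
		}
		proofs = append(proofs, proof)
		locked = append(locked, s)
		total += amount
	}
	// Phase 2: present the proofs to the output shard, which commits.
	output.commit(recipient, total, proofs)
	return nil
}

func main() {
	green := &shard{"green", map[string]int{"alice": 5}}
	blue := &shard{"blue", map[string]int{"alice": 7}}
	yellow := &shard{"yellow", map[string]int{}}
	if err := atomix(map[*shard]int{green: 4, blue: 6}, yellow, "alice", "bob"); err != nil {
		fmt.Println("aborted:", err)
	}
}
```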

The last point that I want to mention is how you can deal with the throughput versus latency trade-off, where we introduce this trust-but-verify transaction validation mechanism.

As I mentioned initially, if you have big blocks you get high throughput but bad confirmation latency, because it takes some time to propagate and process the blocks. On the other hand, if you have small blocks you get low throughput but low transaction confirmation latency, because small blocks propagate fast and can be processed very fast. So we asked ourselves, 'okay, how can we manage to combine these two approaches?' What we did is introduce a mechanism where, within a shard, you have one big group of validators who are there for the verification of large blocks and for adding them to the blockchain. You also have a set of smaller sub-shards, the green ones here, which are the optimistic validators. A client always sends his transaction to the optimistic validators first; they check it against their current state and very quickly give feedback on just that transaction. They then forward those transactions to the big core shard, which re-executes them and checks whether they are valid according to the shard's blockchain. Thereby the client gets very quick feedback on his transactions, and the core shard can batch those transactions together into very big blocks and thereby gets good throughput. This is perfectly fine, because if you just buy a coffee with your cryptocurrency, then a very quick confirmation is okay, since the chances that somebody attacks you for, I don't know, €3 or so are very low. On the other hand, when you make a big payment, you might want to wait until the core validators have really re-validated everything and put the transactions into the blockchain.
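A small sketch of that two-tier split, with made-up optimisticValidate and coreValidate functions standing in for the optimistic and core validators:

```go
// Sketch of trust-but-verify: optimistic validators acknowledge the
// client immediately and forward transactions to the core validators,
// who batch them into large blocks and re-validate against the shard.
package main

import "fmt"

type tx struct{ id string }

// optimisticValidate is the fast path: check the tx against local
// state and acknowledge the client right away.
func optimisticValidate(t tx, forward chan<- tx) string {
	forward <- t // hand off to the core validators for final validation
	return "optimistic ack for " + t.id
}

// coreValidate is the slow path: batch transactions into a big block
// and re-execute them against the shard's blockchain state.
func coreValidate(forward <-chan tx, blockSize int) []tx {
	block := make([]tx, 0, blockSize)
	for t := range forward {
		block = append(block, t)
		if len(block) == blockSize {
			break
		}
	}
	return block
}

func main() {
	forward := make(chan tx, 16)
	for i := 0; i < 3; i++ {
		fmt.Println(optimisticValidate(tx{fmt.Sprintf("tx%d", i)}, forward))
	}
	block := coreValidate(forward, 3)
	fmt.Printf("core validators committed a block of %d txs\n", len(block))
}
```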

Coming to the evaluation, this might look familiar because it's exactly the same setup that Ewa showed before. We implemented all of that using our Cothority framework and the ByzCoin consensus mechanism, and we also implemented the Atomix cross-shard transaction protocol. This is all based on the Kyber crypto library, the Onet network library and the Cothority framework. The DeterLab setup uses 48 physical machines, which were very well equipped, so to get realistic network configurations we artificially reduced the bandwidth and also added a 200 millisecond round-trip time latency, which roughly models the round-trip latency that you have on the internet.

So, the first thing we wanted to evaluate is: does OmniLedger really achieve scale-out performance? In the graph above, on the x-axis you see the number of validators and shards, and on the y-axis you see the number of transactions per second that OmniLedger can execute. As a baseline you also see Bitcoin, which at the low end constantly does just 4 transactions per second. When you look at OmniLedger with 70 validators and one shard, you get roughly 440 transactions per second. If you double that to 140 validators and two shards, you get 870 transactions per second, and so on and so forth. When you look at the line, it really gives you a linear throughput increase in the number of validators that you have.

Then on the other hand we also wanted to look into the question, what is the maximum throughput that OmniLedger can give you?

Here again, on the x-axis you see the shard size and the adversarial power, and on the y-axis you see the number of transactions per second that OmniLedger can do. If you have a very, very weak adversary, which for example only corrupts 1% of the nodes, then you can live with shards of just 4 nodes, and thereby you get really, really high throughput. If you go to a more permissionless setting, on the other hand, for example with a 25% adversary, you need around six hundred validators in a shard, and there you can see you get roughly two thousand transactions per second. Somewhere in between, when you have a 12.5% adversary, you actually get more throughput than Visa does on average.

And then the final question that we had: how does OmniLedger do with this latency and throughput trade-off? Here you see in the first row OmniLedger in a regular setting without the trust-but-verify mechanism, where, on the very right hand side, with 600 validators, a 25% adversary and 1 megabyte blocks, a client gets a confirmation within roughly 15 seconds. When you split that up and do the trust-but-verify validation, you can check the next two rows, and here it already improves quite a bit. To get a first confirmation, a client only has to wait 0.5 seconds, and for consistency, that is the re-validation by the core validators, they need to wait roughly one minute. But then you also have 16 megabyte blocks in your blockchain, which gives you very high throughput. And when you compare this to Bitcoin: there, the first confirmation comes as soon as the transaction is included in the blockchain, which takes roughly 600 seconds or about 10 minutes, and for consistency you have to wait 6 blocks, which translates to 3600 seconds or 1 hour, with one megabyte blocks.

I presented OmniLedger, which is a secure, scale-out distributed ledger framework. We use sharding with unbiasable, verifiable randomness generation to assign validators to shards, and thereby achieve linearly scaling throughput. We use the Atomix protocol for secure cross-shard transactions, and we use a trust-but-verify mechanism to break the trade-off between low-latency confirmation and high throughput. One thing I didn't mention so far: OmniLedger works with proof-of-work, proof-of-stake or in a permissioned setting, whatever you prefer. The paper appeared at IEEE Security and Privacy earlier this year.

Q&A

I wanted to ask does the number of shards need to be preset when the system starts? How do you move from one to two to more?

No, you do this adaptively, because the system evolves in epochs, right. What you assume is that you have an identity blockchain running where nodes sign up their identities, saying 'Hey, I want to be a validator', and if in one epoch a lot of nodes join, then in the next epoch, when you do the reconfiguration, you can just assign them to shards or even spin up an entirely new shard as you go.

Just out of curiosity, do you need to choose whether you're doing the trust-but-verify thing in the configuration, or can any node do it?

Either way. You could imagine that this is a system parameter: when you set up the system, you just say we always do trust-but-verify. From a client perspective it doesn't change much, because if the system has trust-but-verify and you don't just want the first confirmation, you simply wait for the second step; otherwise you basically ignore it.

Do you have some information on the throughput for cross-shard transactions?

No, we did not do that experiment; that's a good remark. Of course, if all of your transactions are cross-shard then your throughput will go down, but for that you would need some kind of realistic data points, or you would need to come up with some simulation, and we didn't do that experiment.

And another quick one: where does the 25% come from? Is it a bound, or why do you pick 25?

So the 25% comes from the failure probability (okay, I didn't mention that): the failure probability with 600 validators is relatively good. We fixed a target failure probability that we wanted to achieve in our system, and when you do the math for the different adversarial powers (1%, 12.5%, 25% and so on), you just check what shard size you need. So it was just from the mathematical analysis; there was not much more behind it.