What is Substrate?

If you've followed any of Polkadot's development, you will probably have seen "Substrate" mentioned many times. It's an important component of the Polkadot project but information on it is very thin on the ground. It's not in the whitepaper or the yellow paper - or at least, not under the name "Substrate" - and the specification for it is still in heavy flux. At a high level, it's a framework for creating cryptocurrencies and other decentralised systems using the latest research in blockchain technology. That's not very helpful, though. At least, it's not very helpful for me.

I think the most important part of understanding Parity Substrate is that it is not part of Polkadot at all. Although Polkadot is built with Substrate and projects built with Substrate can run natively on Polkadot, you can use Substrate to build new blockchains right now. You don't need to wait for Polkadot to be finished or even for a proof-of-concept to be released to start working on a blockchain using this framework.

So what is Substrate? You can think of it as being like Express or another web application framework, but for building distributed or decentralised systems such as cryptocurrencies or a message bus. Just as most web applications shouldn't need to reimplement their own version of HTTP, we believe that it's wasted effort for every team creating a new blockchain to have to implement all the networking and consensus code from scratch. Not to mention the cryptographers, security researchers, networking engineers, devops personnel (to coordinate updates) and so on that would need to be hired and paid for when really your business logic is your product. If you want to build a new project using Substrate, all you have to do is implement a very small number of hooks in your code and then for free you get:

Consensus, finality and block voting logic. Even if you're not building a cryptocurrency, or even a project that requires a blockchain, this is desirable - it means that you get Byzantine fault tolerance for free, so your system will continue to work correctly even if some of its nodes are broken, disabled or malicious;

Networking, including peer discovery, replication and so on;

An efficient, deterministic, sandboxed WebAssembly runtime, which can be used to run smart contracts or even other Substrate-based projects. You don't have to use WebAssembly, of course - you can write your own virtual machine interpreter instead - but we strongly believe in the benefits of a WebAssembly runtime, and using one allows you to tap into our work on WebAssembly and the worldwide community of developers creating tooling for it;

The ability to seamlessly run a node in the browser that can communicate with any desktop or cloud node;

A cross-platform database/file storage abstraction, which even works in-browser;

Seamless client updates - any update that could affect consensus is handled by compiling the code to WebAssembly and deploying it as just another message on the network. Not only that, but you can keep as many versions of the consensus code as you want compiled to native code, and Substrate will handle the complexity of making sure that the native code being executed matches the currently-deployed WebAssembly code. You get the speed of native code, but because there's always a WebAssembly fallback you can deploy a native version of the code at your own pace, safe in the knowledge that you can never accidentally cause a hard fork or other consensus issues;

The ability to immediately start running your project on Polkadot the moment it is released. Although projects built with Substrate can be compiled to use separate clients per-project (like existing blockchains do), since Polkadot implements the Substrate API you can tap into the shared security and interoperability that Polkadot provides. Polkadot is itself being built using Substrate, which allows us to get fast feedback on any holes in the framework and allows us to run a Polkadot testnet or even a second instance of Polkadot itself as a parachain. If you don't know about Polkadot or if you just haven't been sufficiently propagandised about its benefits, you can check out this post on the Polkadot blog.
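To make the "seamless client updates" item above a little more concrete, here is a minimal sketch of the selection logic it describes: use a native build when its version matches what is deployed on chain, and fall back to the WebAssembly code otherwise. Everything here is an assumption made for illustration; `select_runtime`, the integer version numbers and the toy runtimes `v1` and `v2` are hypothetical names, not Substrate's actual API.

```python
from typing import Callable, Dict, Tuple

# A runtime is modelled as a pure function: (state, extrinsic) -> new state.
Runtime = Callable[[int, int], int]


def select_runtime(on_chain_version: int,
                   wasm_runtime: Runtime,
                   native_runtimes: Dict[int, Runtime]) -> Tuple[str, Runtime]:
    """Prefer a native build matching the on-chain version, else use Wasm."""
    native = native_runtimes.get(on_chain_version)
    if native is not None:
        return "native", native
    return "wasm", wasm_runtime


# Toy runtimes: version 2 changes the state transition rule.
def v1(state: int, extrinsic: int) -> int:
    return state + extrinsic


def v2(state: int, extrinsic: int) -> int:
    return state + 2 * extrinsic


# This node only ships a native build of v1, but the chain has upgraded to v2,
# so it falls back to the (slower) Wasm code and still follows the new rules.
kind, runtime = select_runtime(2, wasm_runtime=v2, native_runtimes={1: v1})
print(kind, runtime(10, 5))  # wasm 20
```

The point of the sketch is that the node never runs native code for a version other than the one the chain has agreed on, which is why a late native upgrade costs speed rather than consensus.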

So what don't you get for free? Essentially it's just your state machine, which includes things like transactions. To make Substrate as generic as possible, it has no built-in concept of transactions. Instead, it has what we call "extrinsics", which are just binary blobs that you can use to store any data that you want. For most chains these extrinsics will include transactions, but of course you don't need to implement it that way! You could remove the concept of currency from the network entirely and use Substrate to create a decentralised Erlang-style actor-model concurrent system with a set of trusted authorities to verify the correct behaviour of the network.

Assuming you do want currency and transactions, however, implementing the transaction format will likely be trivial - just an interchange format and a library to access that data from your chosen language. It's even easier than other distributed architectures like microservices - since the code and the data it operates on are stored in the same place, you don't need to enforce backwards-compatibility guarantees for transactions, just for storage. For chains with private transactions the implementation may be more complex.

The names of everything are not finalised and so you'll see different language used in different places, but here's a simple explanation of what you'd need to implement in order to get a full blockchain up and running:

A function that creates a new pending block based on the previous block's header. The header includes: the block height; a "cryptographic commitment" to the block's state, which is important for light clients to validate that the block is correct (a cryptographic commitment serves the same role as a hash, in that you cannot change the state without invalidating the commitment); a cryptographic commitment to all the extrinsics in the body, which prevents the extrinsics from being changed; a hash of the block's parent; and some extra arbitrary data. One use case for this extra data would be client updates - since light clients only sync headers, if you want to update them you can't have updates implemented as an extrinsic or the light clients won't receive them;

A function that adds an extrinsic (such as a transaction) to a pending block. This should also update the chain's state (for example, account balances);

A function that takes a pending block and generates a finished block from it. This finished block can then be propagated throughout the network;

A function that executes an existing block. This is run by full nodes to confirm that received blocks are valid before accepting them. For example, in a value-bearing chain you could check that no-one tries to transfer more than their balance.
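The four hooks listed above can be sketched in a few dozen lines. This is a toy model under stated assumptions, not Substrate's real interface: all the names (`Header`, `PendingBlock`, `new_pending_block`, `add_extrinsic`, `finish_block`, `execute_block`) are hypothetical, the commitments are plain SHA-256 hashes rather than anything a light client could use for proofs, the extrinsic format is an invented fixed-width transfer, and the parent state is passed in explicitly where a real node would read it from storage.

```python
import hashlib
import struct
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical extrinsic format: sender id, recipient id, amount.
TRANSFER = struct.Struct(">IIQ")


def commit(data: bytes) -> bytes:
    """Stand-in commitment: a plain SHA-256 hash."""
    return hashlib.sha256(data).digest()


def state_commitment(state: Dict[int, int]) -> bytes:
    # Sort by account id so every node serialises the state identically.
    return commit(b"".join(struct.pack(">IQ", k, v) for k, v in sorted(state.items())))


def extrinsics_commitment(extrinsics: List[bytes]) -> bytes:
    return commit(b"".join(commit(e) for e in extrinsics))


@dataclass(frozen=True)
class Header:
    height: int
    parent_hash: bytes
    state_root: bytes
    extrinsics_root: bytes
    extra: bytes = b""  # arbitrary extra data

    def digest(self) -> bytes:
        return commit(struct.pack(">Q", self.height) + self.parent_hash
                      + self.state_root + self.extrinsics_root + self.extra)


@dataclass
class PendingBlock:
    height: int
    parent_hash: bytes
    state: Dict[int, int]
    extrinsics: List[bytes] = field(default_factory=list)


@dataclass(frozen=True)
class Block:
    header: Header
    extrinsics: List[bytes]


def apply_extrinsic(state: Dict[int, int], extrinsic: bytes) -> None:
    """The state transition, shared by block authoring and block execution."""
    sender, recipient, amount = TRANSFER.unpack(extrinsic)
    if state.get(sender, 0) < amount:
        raise ValueError("transfer exceeds balance")
    state[sender] = state.get(sender, 0) - amount
    state[recipient] = state.get(recipient, 0) + amount


def new_pending_block(parent: Header, parent_state: Dict[int, int]) -> PendingBlock:
    return PendingBlock(parent.height + 1, parent.digest(), dict(parent_state))


def add_extrinsic(pending: PendingBlock, extrinsic: bytes) -> None:
    apply_extrinsic(pending.state, extrinsic)  # updates the pending state
    pending.extrinsics.append(extrinsic)


def finish_block(pending: PendingBlock) -> Block:
    header = Header(pending.height, pending.parent_hash,
                    state_commitment(pending.state),
                    extrinsics_commitment(pending.extrinsics))
    return Block(header, list(pending.extrinsics))


def execute_block(block: Block, parent: Header, parent_state: Dict[int, int]) -> Dict[int, int]:
    """Re-run a received block; raise ValueError if it contradicts its header."""
    header = block.header
    if header.height != parent.height + 1 or header.parent_hash != parent.digest():
        raise ValueError("block does not extend the given parent")
    if header.extrinsics_root != extrinsics_commitment(block.extrinsics):
        raise ValueError("extrinsics do not match the header")
    state = dict(parent_state)
    for extrinsic in block.extrinsics:
        apply_extrinsic(state, extrinsic)
    if header.state_root != state_commitment(state):
        raise ValueError("state commitment mismatch")
    return state


genesis_state = {1: 100}
genesis = Header(0, b"\x00" * 32, state_commitment(genesis_state), extrinsics_commitment([]))
pending = new_pending_block(genesis, genesis_state)
add_extrinsic(pending, TRANSFER.pack(1, 2, 30))
block = finish_block(pending)
print(execute_block(block, genesis, genesis_state))  # {1: 70, 2: 30}
```

Note that both `add_extrinsic` and `execute_block` go through the same `apply_extrinsic` function, so the authoring path and the validating path cannot apply different rules.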

One downside of this design is that you have to manually make sure that the state transitions performed while creating a block and the state transitions performed while executing an existing block are kept in sync. If you don't do this, you could get consensus issues! This may change in the future, but for now this shouldn't be much of a problem in practice, as you will likely delegate the execution of extrinsics to a common function.

Additionally, you need to provide a validator set. This covers both proof-of-authority and proof-of-stake/delegated proof-of-stake chains; we have no intention of supporting proof-of-work chains in Substrate as of now. The validator set is a list of public keys whose corresponding private keys are considered valid for signing a given block. The set can change, but each block is validated by the set that was chosen at the time of the block's creation. You don't have to solve the difficult problem of handling the validators' votes, or even their "vouching" for individual blocks; Substrate handles that automatically. The validator set can be as large as you like, but there's a tradeoff to be made here. The fewer validators you have, the easier it is for them to collude, but the more validators you have, the more of them will be needed to validate any given block before it is considered "finalised" (i.e. irreversible).
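The tradeoff is easier to see with numbers. The sketch below assumes the classic Byzantine fault tolerance rule that a block needs votes from strictly more than two thirds of the validator set; this article doesn't state the exact threshold Substrate uses, so treat the rule, along with the hypothetical helper names `finality_threshold`, `max_faulty` and `is_finalised`, as an illustration only.

```python
def finality_threshold(validator_count: int) -> int:
    """Smallest vote count that is strictly more than two thirds of the set."""
    return (2 * validator_count) // 3 + 1


def max_faulty(validator_count: int) -> int:
    """Largest f such that validator_count >= 3f + 1."""
    return (validator_count - 1) // 3


def is_finalised(validator_set: frozenset, signers: set) -> bool:
    # Only signatures from keys in the validator set count as votes.
    valid_votes = len(validator_set & signers)
    return valid_votes >= finality_threshold(len(validator_set))


for n in (4, 10, 100, 1000):
    print(n, finality_threshold(n), max_faulty(n))
# 4 3 1
# 10 7 3
# 100 67 33
# 1000 667 333
```

Under that assumption, going from 10 to 1000 validators raises the number that would have to collude to finalise a bad block from 7 to 667, but it also means 667 votes have to be collected before any block is final.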

We can't have Substrate automatically handle proof-of-stake for you, since proof-of-stake relies on your project including value-bearing tokens and not all projects will. Testnets may deliberately have tokens without value, and projects using Substrate to implement a message bus may not have tokens at all. However, it would be easy to write a library on top of Substrate that enforces the use of tokens and gives you transactions and proof-of-stake consensus automatically. One thing about Substrate is that it's relatively easy to build higher-level libraries on top of it. Even though you get a lot for free when building a new blockchain with Substrate, it's still a relatively minimal set of primitives and it's not really intended to be used directly. Instead, it should be taken as a building block, and other common functionality can be factored into helper libraries. Although details haven't been confirmed yet, Polkadot is not the only chain slated to be built on Substrate; as the platform matures, more libraries can be built to make building new chains as easy as writing a modern web app.

I know that "coming soon"s in tech articles are about as trustworthy as a politician's promise, but I'm going to end with one anyway. Although building on Substrate is already possible, we're currently missing learning materials. Right now, there's really no way that you could learn how to do any of what I just told you without already being part of the Polkadot team. We're working on that, though, so if any of this excites you then keep your eye out for Substrate tutorials and documentation coming soon.

Further resources:

Video: Gavin Wood presenting Substrate at Event Horizon 2018

Video: Rob Habermeier presenting Substrate at Truebit's Berlin meetup

Parity Substrate repository on GitHub