Bitcoin and Ethereum provide a decentralized means of handling money, contracts, and ownership tokens. From a technical perspective, they have a lot of moving parts and provide a good way to demo a programming language.

This article will develop a simple blockchain-like data structure, to demonstrate these in Haskell:

Writing a binary serializer and deserializer

Using cryptographic primitives to calculate hashes

Automatically adjusting the difficulty of a miner in response to computation time.

We’ll name it Haskoin. Note that it won’t have any networking or wallet security until a future article.

What is a Blockchain?

The first step when writing any software application is always to figure out your data structures. This is true whether it’s Haskell, Perl, C, or SQL. We’ll put the major types and typeclass instances in their own module:

{-# LANGUAGE GeneralizedNewtypeDeriving, NoImplicitPrelude, DeriveTraversable, DeriveDataTypeable, StandaloneDeriving, TypeSynonymInstances, FlexibleInstances #-} module Haskoin.Types where import Protolude import Crypto.Hash import Control.Comonad.Cofree import Data.Data import qualified Data.Vector as V newtype Account = Account Integer deriving ( Eq , Show , Num ) data Transaction = Transaction { _from :: Account , _to :: Account , _amount :: Integer } deriving ( Eq , Show ) newtype BlockF a = Block ( V . Vector a ) deriving ( Eq , Show , Foldable , Traversable , Functor , Monoid ) type Block = BlockF Transaction type HaskoinHash = Digest SHA1 data BlockHeader = BlockHeader { _miner :: Account , _parentHash :: HaskoinHash } deriving ( Eq , Show ) data MerkleF a = Genesis | Node BlockHeader a deriving ( Eq , Show , Functor , Traversable , Foldable ) type Blockchain = Cofree MerkleF Block

MerkleF is a higher-order Merkle tree type that adds a layer onto some other type. The Cofree MerkleF Block does two things: It recursively applies MerkleF to produce a type for all depths of Merkle trees, and it attaches an annotation of type Block to each node in the tree.

When using Cofree , anno :< xf will construct one of these annotated values.

It will be more useful to look at an “inverted” tree where each node knows its parent, rather than one where each node knows its children. If each node knew its children, adding a single new block to the end would require changing every node in the tree. So MerkleF produces a chain, not a tree.

Protolude is a replacement Prelude that I’ve been using recently in moderately-sized projects. Prelude has a lot of backwards-compatibility concerns, so a lot of people shut it off with the NoImplicitPrelude language extension and import a custom one.

Why do we choose this weird MerkleF type over the simpler one below?

newtype Block = Block ( V . Vector Transaction ) data Blockchain = Genesis Block | Node Block BlockHeader Blockchain

The main reason is to get those Functor , Traversable , and Foldable instances, because we can use them to work with our Merkle tree without having to write any code. For example, given a blockchain

import qualified Data.Vector as V let genesis_block = Block ( V . fromList [] ) let block1 = Block ( V . fromList [ Transaction 0 1 1000 ]) let genesis_chain = genesis_block :< Genesis let chain1 = block1 :< Node ( BlockHeader { _miner = 0 , _parentHash = undefined }) genesis_chain let chain2 = block1 :< Node ( BlockHeader { _miner = 0 , _parentHash = undefined }) chain1

, here’s how you can get all of its transactions:

let txns = toList $ mconcat $ toList chain2 -- [Transaction {_from = Account 0, _to = Account 1, _amount = 1000},Transaction {_from = Account 0, _to = Account 1, _amount = 1000}] let totalVolume = sum $ map _amount txns -- 2000

I tested the above using stack ghci to enter an interactive prompt.

Real blockchains have a lot of useful things in the header, such as timestamps or nonce values. We can add them to BlockHeader as we need them.

Constructing Chains

A bunch of abstract types that are awkward to use aren’t very useful by themselves. We need a way to mine new blocks to do anything interesting. In other words, we want to define mineOn and makeGenesis :

module Haskoin.Mining where type TransactionPool = IO [ Transaction ] mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain mineOn pendingTransactions minerAccount root = undefined makeGenesis :: IO Blockchain makeGenesis = undefined

The genesis block is pretty easy, since it doesn’t have a header:

makeGenesis = return $ Block ( V . fromList [] ) :< Genesis

We can write mineOn without any difficulty, transaction limiting, or security pretty easily if we knew how to calculate a parent node’s hash:

mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain mineOn pendingTransactions minerAccount parent = do ts <- pendingTransactions let block = Block ( V . fromList ts ) let header = BlockHeader { _miner = minerAccount , _parentHash = hash parent } return $ block :< Node header parent hash :: Blockchain -> HaskoinHash hash = undefined

Crypto.Hash has plenty of ways to hash something, and we’ve chosen type HaskoinHash = Digest SHA1 earlier. But in order to use it, we need some actual bytes to hash. That means we need a way to serialize and deserialize a Blockchain . A common library to do that is binary , which provides a Binary typeclass that we’ll implement for our types.

It’s not difficult to write instances by hand, but one of the advantages of using weird recursive types is that the compiler can generate Binary instances for us. Here’s complete code to serialize and deserialize every type we need:

{-# LANGUAGE StandaloneDeriving, TypeSynonymInstances, FlexibleInstances, UndecidableInstances, DeriveGeneric, GeneralizedNewtypeDeriving #-} module Haskoin.Serialization where import Haskoin.Types import Control.Comonad.Cofree import Crypto.Hash import Data.Binary import Data.Binary.Get import Data.ByteArray import qualified Data.ByteString as BS import qualified Data.ByteString.Lazy as BSL import Data.Vector.Binary import GHC.Generics instance ( Binary ( f ( Cofree f a )), Binary a ) => Binary ( Cofree f a ) where instance ( Binary a ) => Binary ( MerkleF a ) where instance Binary BlockHeader where instance Binary Transaction where deriving instance Binary Account deriving instance Binary Block deriving instance Generic ( Cofree f a ) deriving instance Generic ( MerkleF a ) deriving instance Generic BlockHeader deriving instance Generic Transaction instance Binary HaskoinHash where get = do mDigest <- digestFromByteString <$> ( get :: Get BS . ByteString ) case mDigest of Nothing -> fail "Not a valid digest" Just digest -> return digest put digest = put $ ( convert digest :: BS . ByteString ) deserialize :: BSL . ByteString -> Blockchain deserialize = decode serialize :: Blockchain -> BSL . ByteString serialize = encode

I only included deserialize and serialize to make it clearer what the end result of this module is. Let’s drop them in favor of decode and encode from Data.Binary .

Generic is a way of converting a value into a very lightweight “syntax tree” that can be used by serializers(JSON, XML, Binary, etc.) and many other typeclasses to provide useful default definitions. The Haskell wiki has a good overview. binary uses these Generic instances to define serializers that work on just about anything.

We had to hand-write a Binary instance for HaskoinHash because Digest SHA1 from the Crypto.Hash library didn’t provide it or a Generic instance. That’s okay - digests are pretty much bytestrings anyways, so it was only a few lines.

Here’s how to use them to implement mineOn :

import Crypto.Hash ( hashlazy ) mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain mineOn pendingTransactions minerAccount parent = do ts <- pendingTransactions let block = Block ( V . fromList ts ) let header = BlockHeader { _miner = minerAccount , _parentHash = hashlazy $ encode parent } return $ block :< Node header parent

And here’s how to test that this actually works:

testMining :: IO Blockchain testMining = do let txnPool = return [] chain <- makeGenesis chain <- mineOn txnPool 0 chain chain <- mineOn txnPool 0 chain chain <- mineOn txnPool 0 chain chain <- mineOn txnPool 0 chain chain <- mineOn txnPool 0 chain return chain -- GHCI > chain <- testMining Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = efb3febc87c41fffb673a81ed14a6fb4f736df79 }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = 2 accb557297850656de70bfc3e13ea92a4ddac29 }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = f51e30233feb41a228706d1357892d16af69c03b }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = 0072e83 ae8e9e22d5711fd832d350f5a279c1c12 }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = c259e771b237769cb6bce9a5ab734c576a6da3e1 }) ( Block [] :< Genesis ))))) > encode chain " \NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\239\179\254\188\135\196\US\255\182 s \168\RS\209 Jo \180\247\& 6 \223 y \NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4 * \204\181 W)xPem \231\v\252 > \DC3\234\146\164\221\172 ) \NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\245\RS 0#? \235 A \162 (pm \DC3 W \137 - \SYN\175 i \192 ; \NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\NUL r \232 : \232\233\226 -W \DC1\253\131 -5 \SI Z' \156\FS\DC2\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4\194 Y \231 q \178\& 7v \156\182\188\233\165\171 sLWjm \163\225\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL " > ( decode $ encode chain ) :: Blockchain Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = efb3febc87c41fffb673a81ed14a6fb4f736df79 }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = 2 accb557297850656de70bfc3e13ea92a4ddac29 }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = f51e30233feb41a228706d1357892d16af69c03b }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = 0072e83 ae8e9e22d5711fd832d350f5a279c1c12 }) ( Block [] :< Node ( BlockHeader { _miner = Account 0 , _parentHash = c259e771b237769cb6bce9a5ab734c576a6da3e1 }) ( Block [] :< Genesis )))))

If you’re testing serialization code at home, you may prefer to use the base16-bytestring library to hex-encode your ByteString s:

> import qualified Data.ByteString.Base16.Lazy as BSL > chain <- testMining > BSL . encode $ encode chain 00000000000000000100000000000000000000000014 efb3febc87c41fffb673a81ed14a6fb4f736df79000000000000000001000000000000000000000000142accb557297850656de70bfc3e13ea92a4ddac2900000000000000000100000000000000000000000014f51e30233feb41a228706d1357892d16af69c03b000000000000000001000000000000000000000000140072e83ae8e9e22d5711fd832d350f5a279c1c1200000000000000000100000000000000000000000014c259e771b237769cb6bce9a5ab734c576a6da3e1000000000000000000

Note that it will probably be a PITA for a C programmer trying to follow our serialization/deserialization code because the byte-wrangling is hidden in a lot of really generic code. If you want to produce a spec for people to use(always a good idea), you’ll probably want to hand-roll your serialization code so it’s self-documenting.

Mining

There are a couple mining-related problems with this so-called blockchain:

People can have negative balances, so people can create a “scapegoat account” that they transfer unlimited amounts of money from. There is no transaction limiting, so someone could create a huge block and run our miners out of memory. We always mine empty blocks, so nobody can transfer money. There is no difficulty, so miners aren’t proving they’ve done any work.

I say that these are all mining problems because the code that miners run is going to deal with them.

#3 we’ll wait for Networking to solve. The rest we can do now.

To solve #1, we need account balances for anyone with a transaction that we’re mining a block for. Let’s go ahead and calculate all possible account balances:

blockReward = 1000 balances :: Blockchain -> M . Map Account Integer balances bc = let txns = toList $ mconcat $ toList bc debits = map ( \ Transaction { _from = acc , _amount = amount } -> ( acc , - amount )) txns credits = map ( \ Transaction { _to = acc , _amount = amount } -> ( acc , amount )) txns minings = map ( \ h -> ( _minerAccount h , blockReward )) $ headers bc in M . fromListWith ( + ) $ debits ++ credits ++ minings

And then once we have a parent blockchain, we know how to filter out the invalid transactions:

validTransactions :: Blockchain -> [ Transaction ] -> [ Transaction ] validTransactions bc txns = let accounts = balances bc validTxn txn = case M . lookup ( _from txn ) accounts of Nothing -> False Just balance -> balance >= _amount txn in filter validTxn txns

To solve #2, I’ll let the current miner choose however many transactions he wants to put in his block. That means I’ll put a constant globalTransactionLimit = 1000 at the top that we’ll use when mining, but we won’t verify past blocks using it.

To solve #4, we need to add a nonce field to our BlockHeader that the miner can increment until he finds a good hash. We’ll make it an arbitrarily-large integer to avoid the scenario that no nonce values yield a sufficiently-difficult hash. And since we want to adjust our difficulty so blocks take roughly the same time to mine, we’ll store a timestamp in the header.

import Data.Time.Clock.POSIX -- Add new fields data BlockHeader = BlockHeader { _miner :: Account , _parentHash :: HaskoinHash , _nonce :: Integer , _minedAt :: POSIXTime } deriving ( Eq , Show ) -- Add serializers for POSIXTime instance Binary POSIXTime where get = fromInteger <$> ( get :: Get Integer ) put x = put $ ( round x :: Integer ) globalTransactionLimit = 1000 mineOn :: TransactionPool -> Account -> Blockchain -> IO Blockchain mineOn pendingTransactions minerAccount parent = do ts <- pendingTransactions ts <- return $ validTransactions parent ts ts <- return $ take globalTransactionLimit ts loop ts 0 where validChain bc = difficulty bc < desiredDifficulty parent loop ts nonce = do now <- getPOSIXTime let header = BlockHeader { _miner = minerAccount , _parentHash = hashlazy $ encode parent , _nonce = nonce , _minedAt = now } block = Block ( V . fromList ts ) candidate = block :< Node header parent if validChain candidate then return candidate else loop ts ( nonce + 1 ) difficulty :: Blockchain -> Integer difficulty = undefined desiredDifficulty :: BlockChain -> Integer desiredDifficulty = undefined

We enter loop and keep incrementing the counter and fetching the time until we find a candidate with the right difficulty. The actual difficulty of a Blockchain is just its hash converted to an integer:

import Crypto.Number.Serialize ( os2ip ) difficulty :: Blockchain -> Integer difficulty bc = os2ip $ ( hashlazy $ encode bc :: HaskoinHash )

How do we know what the right difficulty is? To start with, we’ll calculate the average time-between-blocks for the last 100 blocks:

numBlocksToCalculateDifficulty = 100 blockTimeAverage :: BlockChain -> NominalDiffTime blockTimeAverage bc = average $ zipWith ( - ) times ( tail times ) where times = take numBlocksToCalculateDifficulty $ map _minedAt $ headers bc headers :: BlockChain -> [ BlockHeader ] headers Genesis = [] headers ( _ :< Node x next ) = x : headers next average :: ( Foldable f , Num a , Fractional a , Eq a ) => f a -> a average xs = sum xs / ( if d == 0 then 1 else d ) where d = fromIntegral $ length xs

Let’s have a target time of 10 seconds. Suppose blockTimeAverage bc gives 2 seconds, so we want blocks to take 5 times as long: adjustmentFactor = targetTime / blockTimeAverage bc = 5. Which means we want only 1/5 of the originally-accepted blocks to be accepted.

Since hashes are uniformly-distributed, 1/5 of the original hashes are less than originalDifficulty / 5 , which will be our new difficulty. That’s what Bitcoin does: difficulty = oldDifficulty * (2 weeks) / (time for past 2015 blocks) .

genesisBlockDifficulty = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF targetTime = 10 -- BEWARE: O(n * k), where k = numBlocksToCalculateDifficulty desiredDifficulty :: Blockchain -> Integer desiredDifficulty x = round $ loop x where loop ( _ :< Genesis ) = genesisBlockDifficulty loop x @ ( _ :< Node _ xs ) = oldDifficulty / adjustmentFactor where oldDifficulty = loop xs adjustmentFactor = min 4.0 $ targetTime ` safeDiv ` blockTimeAverage x

Here are a few recent mining times using these calculations:

> exampleChain <- testMining > exampleChain <- mineOn ( return [] ) 0 exampleChain -- Repeat a bunch of times > mapM_ print $ map blockTimeAverage $ chains exampleChain 6.61261425 s 6.73220925 s 7.97893375 s 12.96145975 s 10.923974 s 9.59857375 s 7.1819445 s 2.2767425 s 3.2307515 s 7.215131 s 15.98277575 s

They hover around 10s because targetTime = 10 .

Persistence

We’ll save the blockchain on disk, and give people 3 tools:

A tool to mine blocks and create a new chain

A tool to list account balances

The first tool is the miner:

{-# LANGUAGE NoImplicitPrelude, OverloadedStrings #-} module Haskoin.Cli.Mine where import Haskoin.Mining import Haskoin.Serialization import Haskoin.Types import Protolude import System.Environment import Data.Binary import qualified Data.ByteString.Lazy as BSL import System.Directory import Prelude ( read ) defaultChainFile = "main.chain" defaultAccount = "10" main :: IO () main = do args <- getArgs let ( filename , accountS ) = case args of [] -> ( defaultChainFile , defaultAccount ) [ filename ] -> ( filename , defaultAccount ) [ filename , account ] -> ( filename , account ) _ -> panic "Usage: mine [filename] [account]" swapFile = filename ++ ".tmp" txnPool = return [] account = Account $ read accountS forever $ do chain <- loadOrCreate filename makeGenesis :: IO Blockchain newChain <- mineOn txnPool account chain encodeFile swapFile newChain copyFile swapFile filename print "Block mined and saved!" loadOrCreate :: Binary a => FilePath -> ( IO a ) -> IO a loadOrCreate filename init = do exists <- doesFileExist filename if exists then decodeFile filename else do x <- init encodeFile filename x return x

The second one prints all of the account balances

{-# LANGUAGE NoImplicitPrelude, OverloadedStrings #-} module Haskoin.Cli.ListBalances where import Haskoin.Mining import Haskoin.Serialization import Haskoin.Types import Protolude import System.Environment import Data.Binary import qualified Data.Map as M import qualified Data.ByteString.Lazy as BSL defaultChainFile = "main.chain" main :: IO () main = do args <- getArgs let ( filename ) = case args of [] -> ( defaultChainFile ) [ filename ] -> ( filename ) _ -> panic "Usage: list-balances [filename]" chain <- decodeFile filename :: IO Blockchain forM_ ( M . toAscList $ balances chain ) $ \ ( account , balance ) -> do print ( account , balance )

Here’s its output:

$ stack exec list - balances ( Account 10 , 23000 )

So I’ve apparently mined 23 blocks just testing stack exec mine .

Conclusion

We developed a simple blockchain data structure. You can browse the repository on Github.

Future Haskoin-related articles may cover

Using networking and concurrency primitives to set up a peer-to-peer network.

Securing accounts in wallets, so other people can’t transfer money out of your account.

Building a ‘blockchain explorer’ website

GPU-accelerating our hashing

FPGA-accelerating our hashing

Future cryptocurrency-related articles may cover: