The block chain is a transaction database shared by every full node on the Bitcoin network; each node holds an identical copy. The Bitcoin protocol dictates its structure and is the means by which each node keeps its copy in sync. At bottom, the block chain is just a data structure that stores blocks in series, beginning with the genesis block.


A Simple Block Parser

This example takes a minimal approach: the whole block parser is 138 lines of Python. In some places the encoding and endianness may look unfamiliar or backwards, because the block format stores integers in little-endian order and hashes byte-reversed. Despite these formatting quirks, what follows is a beginner-friendly Bitcoin block parser.

The project began with the tools needed to parse the binary data. The field types defined by the protocol dictate which tools are necessary.

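    import struct

    # Integers in the block format are little-endian, so every format
    # string uses '<' instead of relying on the machine's native order.

    def uint1(stream):
        return ord(stream.read(1))

    def uint2(stream):
        return struct.unpack('<H', stream.read(2))[0]

    def uint4(stream):
        return struct.unpack('<I', stream.read(4))[0]

    def uint8(stream):
        return struct.unpack('<Q', stream.read(8))[0]

    def hash32(stream):
        # Hashes are stored byte-reversed relative to how they are displayed
        return stream.read(32)[::-1]

    def time(stream):
        time = uint4(stream)
        return time

    def varint(stream):
        size = uint1(stream)
        if size < 0xfd:
            return size
        if size == 0xfd:
            return uint2(stream)
        if size == 0xfe:
            return uint4(stream)
        if size == 0xff:
            return uint8(stream)
        return -1

    def hashStr(bytebuffer):
        # %02x keeps the leading zero of each byte. The sample output in this
        # article was printed with a bare %x, so its hex strings are shorter.
        return ''.join(('%02x' % ord(a)) for a in bytebuffer)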

These functions read unsigned integers, hashes, and variable-length integers from a block file. Each one consumes a few bytes from the stream and returns the parsed value. The classes that represent blocks and transactions are built on top of them.
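As a quick check of how such helpers behave, the sketch below feeds two of them an in-memory stream instead of a block file. It is written for Python 3 (the listings in this article are Python 2), and `read_uint4` and `read_hash32` are illustrative names, not part of the original source.

```python
import io
import struct

def read_uint4(stream):
    # Little-endian unsigned 32-bit integer
    return struct.unpack('<I', stream.read(4))[0]

def read_hash32(stream):
    # Stored byte order is the reverse of the displayed order
    return stream.read(32)[::-1]

# A 4-byte integer followed by a 32-byte hash whose last stored byte is 0x0a
data = struct.pack('<I', 285) + bytes(31) + b'\x0a'
stream = io.BytesIO(data)

size = read_uint4(stream)
digest = read_hash32(stream)

print(size)
print(digest.hex())  # bytes.hex() always emits two digits per byte
```

Because the hash is reversed on read, the byte that was stored last is displayed first, and each byte keeps both of its hex digits.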

A unit test that reads the first block and transaction in a block file gives the following output when run against blk00000.dat:

    Magic Number:   d9b4bef9
    Blocksize:      285
    Version:        1
    Previous Hash   00000000000000000000000000000000
    Merkle Root     4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
    Time            1231006505
    Difficulty      1d00ffff
    Nonce           2083236893
    Tx Count        1
    Version Number  1
    Inputs          1
    Previous Tx     00000000000000000000000000000000
    Prev Index      4294967295
    Script Length   77
    ScriptSig       4ffff01d14455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73
    ScriptSig       [non-printable bytes]EThe Times 03/Jan/2009 Chancellor on brink of second bailout for banks
    Seq Num         ffffffff
    Outputs         1
    Value           50.0
    Script Length   67
    Script Pub Key  414678afdb0fe5548271967f1a67130b7105cd6a828e0399a67962e0ea1f61deb649f6bc3f4cef38c4f3554e51ec112de5c384df7bab8d578a4c702b6bf11d5fac
    Lock Time       0

Parsing a Block

The protocol guides the development of the classes.

    class Block:
        def __init__(self, blockchain):
            self.magicNum = uint4(blockchain)
            self.blocksize = uint4(blockchain)
            self.setHeader(blockchain)
            self.txCount = varint(blockchain)
            self.Txs = []

            # Transactions follow the count, back to back
            for i in range(0, self.txCount):
                tx = Tx(blockchain)
                self.Txs.append(tx)

        def setHeader(self, blockchain):
            self.blockHeader = BlockHeader(blockchain)

        def toString(self):
            print ""
            # Print the magic number in hex so it reads as d9b4bef9
            print "Magic No: \t%8x" % self.magicNum
            print "Blocksize: \t", self.blocksize
            print ""
            print "#"*10 + " Block Header " + "#"*10
            self.blockHeader.toString()
            print
            print "##### Tx Count: %d" % self.txCount
            for t in self.Txs:
                t.toString()

The block data structure matches the protocol description.

The block parser begins by reading the Magic Number, the first four bytes of each block record. On the main network these bytes are f9 be b4 d9 on disk, which read as a little-endian integer give d9b4bef9. The next four bytes are the block size, the number of bytes from that point to the end of the block. The following 80 bytes are the block header.
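The framing described above can be sketched as follows. This is a Python 3 illustration on synthetic data rather than a real block file; `read_block_frame` and `MAINNET_MAGIC` are hypothetical names introduced here.

```python
import io
import struct

MAINNET_MAGIC = 0xd9b4bef9  # bytes f9 be b4 d9 on disk, read as little-endian

def read_block_frame(stream):
    """Return (magic, size) for the next block, or None at end of stream."""
    prefix = stream.read(8)
    if len(prefix) < 8:
        return None
    magic, size = struct.unpack('<II', prefix)
    return magic, size

# Synthetic record: a frame announcing a 285-byte block, then 285 filler bytes
fake_file = io.BytesIO(
    bytes.fromhex('f9beb4d9') + struct.pack('<I', 285) + bytes(285)
)

magic, size = read_block_frame(fake_file)
payload = fake_file.read(size)  # would hold the 80-byte header plus transactions
print('%x' % magic, size, len(payload))
```

Reading the size first also makes it possible to skip a block entirely by seeking past `size` bytes, which can be handy when only some blocks are of interest.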

    class BlockHeader:
        def __init__(self, blockchain):
            self.version = uint4(blockchain)
            self.previousHash = hash32(blockchain)
            self.merkleHash = hash32(blockchain)
            self.time = uint4(blockchain)
            self.bits = uint4(blockchain)
            self.nonce = uint4(blockchain)

        def toString(self):
            print "Version:\t %d" % self.version
            print "Previous Hash\t %s" % hashStr(self.previousHash)
            print "Merkle Root\t %s" % hashStr(self.merkleHash)
            print "Time\t\t %s" % str(self.time)
            print "Difficulty\t %8x" % self.bits
            print "Nonce\t\t %s" % self.nonce

Notice that the header stores the previous block's hash and the Merkle Root, but not the block's own hash. A block's hash is a computed value: the double SHA-256 of its 80-byte header.
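To illustrate how a block hash is computed, the sketch below rebuilds the genesis block header from the field values shown in the sample output and hashes it twice with SHA-256. It is Python 3, and `block_hash` is an illustrative helper, not part of the original parser.

```python
import hashlib
import struct

def block_hash(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Double SHA-256 of the 80-byte header, returned in display order."""
    header = (
        struct.pack('<I', version)
        + bytes.fromhex(prev_hash_hex)[::-1]    # back to stored byte order
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack('<III', timestamp, bits, nonce)
    )
    assert len(header) == 80
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()

genesis = block_hash(
    1,
    '00' * 32,
    '4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b',
    1231006505,
    0x1d00ffff,
    2083236893,
)
print(genesis)
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

The result is the value that the next block carries in its previous-hash field, which is what links the blocks into a chain.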

After the block header comes a transaction counter. The counter is a variable-length integer that occupies 1, 3, 5, or 9 bytes depending on its value: counts below 0xfd fit in a single byte, while a first byte of 0xfd, 0xfe, or 0xff announces that a 2-, 4-, or 8-byte integer follows. The transactions themselves are stored in a list.
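A few concrete encodings may make the variable-length integer easier to picture. This Python 3 sketch decodes three hand-built examples; `read_varint` is an illustrative name and the byte strings are made up for the demonstration.

```python
import io
import struct

def read_varint(stream):
    """Decode a Bitcoin variable-length integer from a binary stream."""
    prefix = stream.read(1)[0]
    if prefix < 0xfd:
        return prefix
    # The prefix byte selects the width of the little-endian integer that follows
    width, fmt = {0xfd: (2, '<H'), 0xfe: (4, '<I'), 0xff: (8, '<Q')}[prefix]
    return struct.unpack(fmt, stream.read(width))[0]

one_byte = read_varint(io.BytesIO(b'\x01'))                    # 1
three_bytes = read_varint(io.BytesIO(b'\xfd\xa3\x01'))         # 0x01a3 = 419
five_bytes = read_varint(io.BytesIO(b'\xfe\x00\x00\x01\x00'))  # 0x00010000 = 65536

print(one_byte, three_bytes, five_bytes)
```

A block with a single transaction therefore spends one byte on the counter, while a block with 419 transactions spends three.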

    class Tx:
        def __init__(self, blockchain):
            self.version = uint4(blockchain)
            self.inCount = varint(blockchain)
            self.inputs = []
            for i in range(0, self.inCount):
                input = txInput(blockchain)
                self.inputs.append(input)
            self.outCount = varint(blockchain)
            self.outputs = []
            if self.outCount > 0:
                for i in range(0, self.outCount):
                    output = txOutput(blockchain)
                    self.outputs.append(output)
            self.lockTime = uint4(blockchain)

        def toString(self):
            print ""
            print "="*10 + " New Transaction " + "="*10
            print "Tx Version:\t %d" % self.version
            print "Inputs:\t\t %d" % self.inCount
            for i in self.inputs:
                i.toString()

            print "Outputs:\t %d" % self.outCount
            for o in self.outputs:
                o.toString()
            print "Lock Time:\t %d" % self.lockTime

For each transaction, there is a list of inputs and outputs.

    class txInput:
        def __init__(self, blockchain):
            self.prevhash = hash32(blockchain)
            self.txOutId = uint4(blockchain)
            self.scriptLen = varint(blockchain)
            self.scriptSig = blockchain.read(self.scriptLen)
            self.seqNo = uint4(blockchain)

        def toString(self):
            print "Previous Hash:\t %s" % hashStr(self.prevhash)
            print "Tx Out Index:\t %d" % self.txOutId
            print "Script Length:\t %d" % self.scriptLen
            print "Script Sig:\t %s" % hashStr(self.scriptSig)
            print "Sequence:\t %8x" % self.seqNo

An input is a reference to an output of a previous transaction. The previous hash identifies that transaction, and the index (txOutId in the code) selects which of its outputs is being spent. The ScriptSig is evidence of ownership of the private key that corresponds to the output.
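The first transaction in each block has no earlier output to reference. In the unit test output for the genesis block this shows up as an all-zero previous hash with index 4294967295 (0xffffffff). A minimal Python 3 check for that case might look like the following; `is_coinbase_input` and `COINBASE_INDEX` are hypothetical names.

```python
COINBASE_INDEX = 0xffffffff  # 4294967295, the "no previous output" marker

def is_coinbase_input(prev_hash, out_index):
    """True when the input spends no earlier output (the block reward input)."""
    return prev_hash == bytes(32) and out_index == COINBASE_INDEX

reward_input = is_coinbase_input(bytes(32), 4294967295)
normal_input = is_coinbase_input(bytes.fromhex('a5' * 32), 0)

print(reward_input, normal_input)
```

For such an input the ScriptSig is not a signature at all, which is why the genesis block can carry a newspaper headline in that field.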

    class txOutput:
        def __init__(self, blockchain):
            self.value = uint8(blockchain)
            self.scriptLen = varint(blockchain)
            self.pubkey = blockchain.read(self.scriptLen)

        def toString(self):
            print "Value:\t\t %d" % self.value
            print "Script Len:\t %d" % self.scriptLen
            print "Pubkey:\t\t %s" % hashStr(self.pubkey)

Outputs are instructions for sending bitcoins. The value is the amount in Satoshis. ScriptPubKey is the locking half of a script; a future input supplies the matching ScriptSig to spend the coins.
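Since the parser prints raw Satoshi values, a small conversion helper can make the output easier to read. The Python 3 sketch below uses `Decimal` to avoid floating-point rounding; `satoshis_to_btc` is an illustrative name, and the two inputs are values that appear in the sample output in this article.

```python
from decimal import Decimal

SATOSHIS_PER_BTC = 100_000_000

def satoshis_to_btc(value):
    """Convert an integer Satoshi amount to bitcoins without float rounding."""
    return Decimal(value) / SATOSHIS_PER_BTC

genesis_reward = satoshis_to_btc(5000000000)
later_reward = satoshis_to_btc(2525344340)

print(genesis_reward)  # 50
print(later_reward)    # 25.2534434
```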

Putting the Block Parser Together

Here is my sloppy block parser code.

    import sys
    from blocktools import *
    from block import Block, BlockHeader

    def parse(blockchain):
        print 'Parsing Block Chain'
        counter = 0
        while True:
            # Stop cleanly at end of file rather than failing mid-read
            if not blockchain.read(1):
                break
            blockchain.seek(-1, 1)  # step back over the byte just peeked
            print counter
            block = Block(blockchain)
            block.toString()
            counter += 1

    def main():
        if len(sys.argv) < 2:
            print 'Usage: blockparser.py filename'
        else:
            with open(sys.argv[1], 'rb') as blockchain:
                parse(blockchain)

    if __name__ == '__main__':
        main()

This script will run until the end of the file. Output will look similar to the following:

    Magic No:       d9b4bef9
    Blocksize:      285

    ########## Block Header ##########
    Version:        1
    Previous Hash   00000000000000000000000000000000
    Merkle Root     4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
    Time            1231006505
    Difficulty      1d00ffff
    Nonce           2083236893

    ##### Tx Count: 1

    ========== New Transaction ==========
    Tx Version:     1
    Inputs:         1
    Previous Hash:  00000000000000000000000000000000
    Tx Out Index:   0
    Script Length:  77
    Script Sig:     4ffff01d14455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73
    Sequence:       ffffffff
    Outputs:        1
    Value:          5000000000
    Script Len:     67
    Pubkey:         414678afdb0fe5548271967f1a67130b7105cd6a828e0399a67962e0ea1f61deb649f6bc3f4cef38c4f3554e51ec112de5c384df7bab8d578a4c702b6bf11d5fac
    Lock Time:      0

To test the loops, I tried the block parser on the first five megabytes of block file 65.

    Magic No:       d9b4bef9
    Blocksize:      234622

    ########## Block Header ##########
    Version:        2
    Previous Hash   0000000b1bda851ed2a5a543062a2789d7e82d7b33c838352bfba
    Merkle Root     b22375d89ab682da9262ea8f4e784f68e5dd9eedde5a62866e3fadfa64c32f9
    Time            1370602521
    Difficulty      1a011337
    Nonce           522491547

    ##### Tx Count: 419

    ========== New Transaction ==========
    Tx Version:     1
    Inputs:         1
    Previous Hash:  00000000000000000000000000000000
    Tx Out Index:   0
    Script Length:  37
    Script Sig:     35caa3400bb89124d696e656420627920425443204775696c64800427e0014ec
    Sequence:       ffffffff
    Outputs:        1
    Value:          2525344340
    Script Len:     25
    Pubkey:         76a91427a1f12771de5cc3b73941664b2537c15316be4388ac
    Lock Time:      0

    ========== New Transaction ==========
    Tx Version:     1
    Inputs:         1
    Previous Hash:  a52c458c3a4e39b63d4a7bdcfab917444ddbfae9991245db39a85d98e9bbdb9
    Tx Out Index:   0
    Script Length:  106
    Script Sig:     473044220447d5ae4624357f6b1361daac5d3aaeae5e197551fdf067f42aec5c7a5e51f2204117b06f77809295dd385da9b96567d3dc568e87d622ee37a758c836bb136e1212e0ac817fd21a44b43c6468d71a472e198521fcb66e36663b5a8173986d7609f
    Sequence:       ffffffff
    Outputs:        2
    Value:          30000000000
    Script Len:     25
    Pubkey:         76a914f3fc2c5c7f8e3970bd824fbce8fce1ed4c1a988ac
    Value:          19206322991
    Script Len:     25
    Pubkey:         76a9143bf18e9cc4c287764e29759b689fe51e33f757d88ac
    Lock Time:      0

All clear. Full source is available on GitHub.


Images from World of Computing, Bitcoin Wiki, and Shutterstock.