Note: This is the first part of a multi-part post.

Today, we begin looking into the very core of the Bitcoin network – the protocol. A sound foundation in the protocol is necessary to gain a true understanding of how and why this system works.

For more details into the communication layer look for my posts on The Network.

Since this blog is fairly focused on technical aspects, we do not go into philosophical discussions unless they aid in a technical understanding of the material presented. Therefore, we assume that the reader has gained a basic understanding of the Bitcoin concept and is now looking for a more technical understanding of the system.

I believe that the best way to understand theoretical concepts is to apply them practically. So over the course of this post, we will build a basic Bitcoin node in Python one step at a time. Python is a simple language with a gentle learning curve, so even if you are not familiar with it, you can quickly grasp its syntax.

Approach

Our approach to understanding the protocol will be:

First, we look at the outline of a Bitcoin block and study it.

Then we build some basic tools that will help us parse a block and various data types.

Then we get deeper into the communication layer of a node and study it.

Once we have a good grasp of the communication layer, we build tools to help us complete the node.

The building Blocks

The basic building block of the Bitcoin protocol is a Block (pun intended). You can think of a block as a page in a ledger book. Each block contains all the transactions that have been verified and added to the ledger. The block also includes the hash of the previous block, thereby creating a link to its parent block. Further, the creation of a block also leads to the creation of new bitcoins for circulation in the network.

For more details into the theoretical underpinnings look for my posts on The Network.

To begin looking into the format of a block, we first need to find a Bitcoin block to dissect. First install and launch the bitcoin-qt client on your Linux/Mac/Windows desktop. If this is the first time you’re launching the client, it will begin synchronizing with the network and start downloading the blocks that make up the blockchain. As it syncs, it will build a database of blocks in the data directory, which is located at:

mac: ~/Library/Application Support/Bitcoin/

linux: ~/.bitcoin/

windows: %APPDATA%\Bitcoin

You can jump into this directory even if the network sync is not finished yet. Under this directory, you will find a subdir called blocks. Doing an ls on it will bring up the list of block files available to dissect.

coinlogic.info@proto $>cd ~/Library/Application\ Support/Bitcoin/
coinlogic.info@proto $>ls
blocks chainstate database db.log debug.log peers.dat wallet.dat
coinlogic.info@proto $>cd blocks
coinlogic.info@proto $>ls
blk00000.dat blk00010.dat blk00020.dat blk00030.dat blk00040.dat rev00000.dat rev00010.dat rev00020.dat rev00030.dat rev00040.dat
blk00001.dat blk00011.dat blk00021.dat blk00031.dat blk00041.dat rev00001.dat rev00011.dat rev00021.dat rev00031.dat rev00041.dat
blk00002.dat blk00012.dat blk00022.dat blk00032.dat blk00042.dat rev00002.dat rev00012.dat rev00022.dat rev00032.dat rev00042.dat
blk00003.dat blk00013.dat blk00023.dat blk00033.dat blk00043.dat rev00003.dat rev00013.dat rev00023.dat rev00033.dat rev00043.dat
blk00004.dat blk00014.dat blk00024.dat blk00034.dat blk00044.dat rev00004.dat rev00014.dat rev00024.dat rev00034.dat rev00044.dat
blk00005.dat blk00015.dat blk00025.dat blk00035.dat blk00045.dat rev00005.dat rev00015.dat rev00025.dat rev00035.dat rev00045.dat
blk00006.dat blk00016.dat blk00026.dat blk00036.dat blk00046.dat rev00006.dat rev00016.dat rev00026.dat rev00036.dat rev00046.dat
blk00007.dat blk00017.dat blk00027.dat blk00037.dat blk00047.dat rev00007.dat rev00017.dat rev00027.dat rev00037.dat rev00047.dat
blk00008.dat blk00018.dat blk00028.dat blk00038.dat blk00048.dat rev00008.dat rev00018.dat rev00028.dat rev00038.dat rev00048.dat
blk00009.dat blk00019.dat blk00029.dat blk00039.dat index rev00009.dat rev00019.dat rev00029.dat rev00039.dat

Each blk00*.dat file is a collection of raw blocks stored back to back and is several megabytes in size.

Let’s copy one of these block dat files to analyze.

coinlogic.info@proto $>mkdir ~/coinlogic
coinlogic.info@proto $>cp blk00003.dat ~/coinlogic/
coinlogic.info@proto $>cd ~/coinlogic/

Format

Each block follows a well defined format as described here in the wiki and begins with a magic number.

| Field | Description | Size |
| --- | --- | --- |
| Magic no | value always 0xD9B4BEF9 | 4 bytes |
| Blocksize | number of bytes following up to end of block | 4 bytes |
| Blockheader | consists of 6 items | 80 bytes |
| Transaction counter | positive integer VI = VarInt | 1 – 9 bytes |
| transactions | the (non empty) list of transactions | <Transaction counter>-many transactions |

src: https://en.bitcoin.it/wiki/Blocks

Magic Number

The first element of the block is a 4 byte magic number, whose value is always 0xD9B4BEF9. The Bitcoin protocol uses little-endian representation for integers, so the least significant byte is stored first and reading the file as binary results in the following sequence of bytes:

0xF9 0xBE 0xB4 0xD9

You can confirm this on a mac or a linux machine with the help of hexdump (on a windows machine, you may choose to install a hex viewer) as shown below:

coinlogic.info@proto $>hexdump -n 32 blk00003.dat
0000000 f9 be b4 d9 30 75 00 00 01 00 00 00 21 a2 bc 03
0000010 6d 18 2f 11 f5 5a bd 5c b4 32 a2 7b 22 79 7e 53
0000020
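The same check can be done in Python. This is only a minimal sketch: the four bytes are typed in by hand from the hexdump rather than read from the file, and the variable names are my own.

```python
# The first four bytes of blk00003.dat, copied from the hexdump.
raw = bytes.fromhex("f9 be b4 d9")

# Interpret them as an unsigned little-endian integer:
# the first byte in the file is the least significant one.
magic = int.from_bytes(raw, byteorder="little")

print(hex(magic))  # 0xd9b4bef9
```

Reading the same bytes with byteorder="big" would give 0xf9beb4d9 instead, which is why the byte order matters for every integer field that follows.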

Blocksize

The magic number is followed by 4 bytes that hold the length of the block in bytes.

coinlogic.info@proto $>hexdump -n 32 blk00003.dat
0000000 f9 be b4 d9 30 75 00 00 01 00 00 00 21 a2 bc 03
0000010 6d 18 2f 11 f5 5a bd 5c b4 32 a2 7b 22 79 7e 53
0000020

In our case, reading the four bytes 0x30 0x75 0x00 0x00 as a little-endian integer yields 0x00007530, which is 30000 in decimal. So the block that follows is 30000 bytes long.
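Since every block in the file is prefixed by these same 8 bytes, the magic number and the size are enough to walk from one block to the next. Here is a rough sketch of that idea. The function name iter_raw_blocks is hypothetical, and the demo runs on made-up data so that it works without a blk file at hand.

```python
import io
import struct

MAGIC = 0xD9B4BEF9  # magic number from the format table


def iter_raw_blocks(stream):
    """Yield (size, payload) for each block record in a blk*.dat style stream."""
    while True:
        preamble = stream.read(8)
        if len(preamble) < 8:
            return  # end of file, or a truncated record
        # "<II" = two unsigned 32-bit little-endian integers
        magic, size = struct.unpack("<II", preamble)
        if magic != MAGIC:
            return  # not a block record; stop rather than guess
        payload = stream.read(size)
        if len(payload) < size:
            return  # the record is incomplete
        yield size, payload


# Demo on fake data: two records with a 3 byte and a 1 byte payload.
fake = (struct.pack("<II", MAGIC, 3) + b"abc" +
        struct.pack("<II", MAGIC, 1) + b"z")
blocks = list(iter_raw_blocks(io.BytesIO(fake)))
print(blocks)  # [(3, b'abc'), (1, b'z')]

# The size field of our block, copied from the hexdump.
(blocksize,) = struct.unpack("<I", bytes.fromhex("30 75 00 00"))
print(blocksize)  # 30000
```

To try it on real data, you could pass open("blk00003.dat", "rb") instead of the BytesIO object. The sketch simply stops at the first thing that does not look like a block record.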

Block Header

The next 80 bytes are the block header. We will analyze the block header in more detail, so let’s extract just the header and save its dump into a separate file, using the -s option to skip the first 8 bytes (magic number and blocksize).

coinlogic.info@proto $>hexdump -n 80 -s8 blk00003.dat > blockheader.dat

Each block header contains the following pieces of information:

| Field | Purpose | Updated when… | Size (Bytes) |
| --- | --- | --- | --- |
| Version | Block version number | You upgrade the software and it specifies a new version | 4 |
| hashPrevBlock | 256-bit hash of the previous block header | A new block comes in | 32 |
| hashMerkleRoot | 256-bit hash based on all of the transactions in the block | A transaction is accepted | 32 |
| Time | Current timestamp as seconds since 1970-01-01T00:00 UTC | Every few seconds | 4 |
| Bits | Current target in compact format | The difficulty is adjusted | 4 |
| Nonce | 32-bit number (starts at 0) | A hash is tried (increments) | 4 |

src: https://en.bitcoin.it/wiki/Block_hashing_algorithm

These components appear, in the order listed, in the block header for our block below.

coinlogic.info@proto $>cat blockheader.dat
0000008 01 00 00 00 21 a2 bc 03 6d 18 2f 11 f5 5a bd 5c
0000018 b4 32 a2 7b 22 79 7e 53 9b cb 44 5b 0e 00 00 00
0000028 00 00 00 00 49 53 1e 6f 47 93 1b 62 2d f5 0b ac
0000038 7b 22 e1 f2 d3 f1 e0 e2 d9 5d 36 6c 05 78 7e e6
0000048 19 06 55 49 8f 87 24 4e cf bb 0a 1a 62 45 a6 0a
0000058
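As a first small tool, the six header fields can be pulled apart with Python’s struct module. The sketch below uses the 80 bytes typed in from the dump, so it runs without the blk file. The variable names are mine, and printing the two hashes in reversed byte order is the usual display convention rather than something the file format requires.

```python
import struct
from datetime import datetime, timezone

# The 80 header bytes, copied row by row from the dump.
HEADER = bytes.fromhex(
    "01 00 00 00 21 a2 bc 03 6d 18 2f 11 f5 5a bd 5c"
    "b4 32 a2 7b 22 79 7e 53 9b cb 44 5b 0e 00 00 00"
    "00 00 00 00 49 53 1e 6f 47 93 1b 62 2d f5 0b ac"
    "7b 22 e1 f2 d3 f1 e0 e2 d9 5d 36 6c 05 78 7e e6"
    "19 06 55 49 8f 87 24 4e cf bb 0a 1a 62 45 a6 0a"
)

# "<" = little-endian without padding, I = uint32, 32s = 32 raw bytes.
version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
    "<I32s32sIII", HEADER)

print(version)                  # 1
print(prev_hash[::-1].hex())    # 000000000000000e5b44cb9b...03bca221
print(merkle_root[::-1].hex())  # 49550619e67e7805...6f1e5349
print(timestamp)                # 1311016847
print(datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat())
# 2011-07-18T19:20:47+00:00
print(hex(bits))                # 0x1a0abbcf
print(hex(nonce))               # 0xaa64562
```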

Note that all integers are little-endian, so:

version is 0x00000001

hashPrevBlock is 0x000000000000000e5b44cb9b537e79227ba232b45cbd5af5112f186d03bca221

hashMerkleRoot is 0x49550619e67e78056c365dd9e2e0f1d3f2e1227bac0bf52d621b93476f1e5349

time is 0x4e24878f (1311016847 in decimal). This is the number of seconds since 1970 Jan 1. You can go to an epoch converter and enter the value 1311016847 to convert it into human readable format and see that the timestamp reads: Mon, 18 Jul 2011 19:20:47 GMT

bits is 0x1a0abbcf, the compact format of the target. This is a special kind of floating-point encoding that uses a 3 byte mantissa and the leading byte as exponent (where only the 5 lowest bits are used), and its base is 256. In this case the exponent is 0x1a = 26 and the mantissa is 0x0abbcf. The exponent says this is a 26 byte base 256 integer, so to convert it into its integer value we have to pad the mantissa with 23 zero bytes to get:

0a bb cf 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Converted from base 256 to decimal, this is 0x0a * 256^25 + 0xbb * 256^24 + 0xcf * 256^23, which is close to 1.7248e+61.

nonce is 0x0aa64562, which is the number that is incremented/changed in mining to create different block headers, hence different block hashes.
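The expansion of the compact bits field is easy to get wrong by one byte, so here is a small sketch of it. The function name bits_to_target is my own, and the sketch ignores the sign bit that the compact encoding also defines, which does not matter for this block. Note that in a 26 byte number the leading byte 0x0a sits at position 25 when counting from zero, so the mantissa is multiplied by 256^23.

```python
def bits_to_target(bits):
    """Expand the compact 'bits' field into the full integer target."""
    exponent = bits >> 24         # leading byte: length of the number in bytes
    mantissa = bits & 0x00FFFFFF  # the remaining three bytes
    if exponent <= 3:
        # Very small targets: the mantissa is shortened instead of padded.
        return mantissa >> (8 * (3 - exponent))
    # Pad the 3 byte mantissa with (exponent - 3) zero bytes.
    return mantissa * 256 ** (exponent - 3)


target = bits_to_target(0x1A0ABBCF)

# 26 bytes: 0a bb cf followed by 23 zero bytes.
print(target.to_bytes(26, "big").hex())

# The same number as a decimal approximation.
print("%.4e" % target)  # 1.7248e+61
```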

Transaction Counter

The next 1 to 9 bytes after the block header are a variable length transaction counter.

Play around with a parsed version of this block at http://blockexplorer.com/b/136929

The logic for decoding this is as follows:

| Value | Storage length (bytes) | Format |
| --- | --- | --- |
| < 0xfd | 1 | uint8_t |
| <= 0xffff | 3 | 0xfd followed by the length as uint16_t |
| <= 0xffffffff | 5 | 0xfe followed by the length as uint32_t |
| – | 9 | 0xff followed by the length as uint64_t |

src:1

So if we skip over the 88 bytes preceding this field and read the next few bytes, we get:

coinlogic.info@proto $>hexdump -n 9 -s88 blk00003.dat
0000058 40 01 00 00 00 01 00 00 00
0000061

Following the method described in the table, the first byte is < 0xfd, therefore the storage length for this integer is 1 byte and the value is in fact represented by the first byte itself, i.e. 0x40 (or 64 in decimal).
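The decoding table translates almost line by line into code. Below is a minimal sketch of such a decoder. The name read_varint and its return convention (value plus number of bytes consumed) are my own choices, not part of the protocol.

```python
import struct


def read_varint(data, offset=0):
    """Decode a variable length integer; return (value, bytes_consumed)."""
    first = data[offset]
    if first < 0xFD:
        return first, 1  # the byte itself is the value
    if first == 0xFD:
        return struct.unpack_from("<H", data, offset + 1)[0], 3  # uint16
    if first == 0xFE:
        return struct.unpack_from("<I", data, offset + 1)[0], 5  # uint32
    return struct.unpack_from("<Q", data, offset + 1)[0], 9      # uint64


# The nine bytes at offset 88 of our block, copied from the hexdump.
print(read_varint(bytes.fromhex("40 01 00 00 00 01 00 00 00")))  # (64, 1)

# A made-up three byte example: 0xfd followed by 0x0203 in little-endian.
print(read_varint(bytes.fromhex("fd 03 02")))  # (515, 3)
```

Returning the number of bytes consumed lets the caller advance its position in the block without knowing in advance how long the counter is.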

Transactions

Based on our decoding of the transaction counter, we now know that there are 64 transactions in this block. We will get a detailed look into transaction logic in the coming posts, but for now, our goal is to be able to parse the transaction data inside a block. So, let’s dissect the first transaction. This will allow us to understand the basic structure of a transaction entry and will be helpful later in creating, sending, requesting and validating transactions.

So far we have analyzed 89 bytes of data in our .dat file (8 bytes of magic number and blocksize, 80 bytes of header and 1 byte of transaction counter). Let’s skip over these 89 bytes and get a hexdump of what comes next:

coinlogic.info@proto $>hexdump -n 136 -s89 blk00003.dat
0000059 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
0000069 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000079 00 00 00 00 00 ff ff ff ff 09 04 cf bb 0a 1a 03
0000089 19 22 01 ff ff ff ff 01 f0 e3 69 2a 01 00 00 00
0000099 43 41 04 53 52 4f b3 2c d2 89 7d cb 85 0e 88 30
00000a9 90 34 aa 7f 5d e0 c6 50 fe c9 f1 95 1b 2f 6d f4
00000b9 cc 45 a4 e2 d6 81 ba 5f 53 fc 3e 44 05 3d 11 00
00000c9 fa 2b de 69 f1 f7 c7 79 dd df e7 d2 9e 16 3b 6a
00000d9 fa de 96 ac 00 00 00 00
00000e1

This brings up the first in the list of 64 transactions. Now, the first transaction in a block is always special. This is where the creator of the block, a.k.a. the miner, pays themselves a reward for successfully mining the block. Since this is where the newly mined coins are generated, this transaction is also called the coinbase transaction (not to be confused with the exchange Coinbase). The reward at the time of this writing is 25 bitcoins, but since we are looking at a block generated in July 2011, the reward for this block is 50 bitcoins.
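The step from 50 to 25 bitcoins comes from the reward being cut in half at fixed block heights. The sketch below is not taken from this post: it assumes the commonly documented halving interval of 210000 blocks, and it uses the block height 136929 that appears in the block explorer link above. Amounts are in satoshis, where 1 bitcoin = 100000000 satoshis.

```python
SATOSHIS_PER_BITCOIN = 100000000
INITIAL_REWARD = 50 * SATOSHIS_PER_BITCOIN
HALVING_INTERVAL = 210000  # assumption: blocks between halvings, not stated in this post


def block_reward(height):
    """Return the newly created coins (in satoshis) for a block at this height."""
    halvings = height // HALVING_INTERVAL
    return INITIAL_REWARD >> halvings  # each halving divides the reward by two


print(block_reward(136929) / SATOSHIS_PER_BITCOIN)  # 50.0
print(block_reward(210000) / SATOSHIS_PER_BITCOIN)  # 25.0
```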

Explore the parsed version of this transaction at http://blockexplorer.com/t/8xb22RcKi9

Let’s de-construct this transaction.

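The general format of a Bitcoin transaction in a block is:

| Field | Description | Size |
| --- | --- | --- |
| Version no | currently 1 | 4 bytes |
| In-counter | positive integer VI = VarInt | 1 – 9 bytes |
| list of inputs | the inputs of this transaction | <in-counter>-many inputs |
| Out-counter | positive integer VI = VarInt | 1 – 9 bytes |
| list of outputs | the outputs of this transaction | <out-counter>-many outputs |
| lock_time | 0 in our transaction | 4 bytes |

src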

Applying the above format to our transaction, we can isolate the various constituent parts in the same dump:

coinlogic.info@proto $>hexdump -n 136 -s89 blk00003.dat
0000059 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
0000069 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000079 00 00 00 00 00 ff ff ff ff 09 04 cf bb 0a 1a 03
0000089 19 22 01 ff ff ff ff 01 f0 e3 69 2a 01 00 00 00
0000099 43 41 04 53 52 4f b3 2c d2 89 7d cb 85 0e 88 30
00000a9 90 34 aa 7f 5d e0 c6 50 fe c9 f1 95 1b 2f 6d f4
00000b9 cc 45 a4 e2 d6 81 ba 5f 53 fc 3e 44 05 3d 11 00
00000c9 fa 2b de 69 f1 f7 c7 79 dd df e7 d2 9e 16 3b 6a
00000d9 fa de 96 ac 00 00 00 00
00000e1

So this transaction has:

Version = 0x00000001 i.e. Version 1

In-counter = 0x01 or 1 input transaction

Out-counter = 0x01 or 1 output transaction

Lock_time of 0x00000000

Each input in the list of inputs is formatted as:

| Field | Description | Size |
| --- | --- | --- |
| Previous Transaction hash | doubled SHA256-hashed of a (previous) to-be-used transaction | 32 bytes |
| Previous Txout-index | non negative integer indexing an output of the to-be-used transaction | 4 bytes |
| Txin-script length | non negative integer VI = VarInt | 1 – 9 bytes |
| Txin-script / scriptSig | Script | <in-script length>-many bytes |
| sequence_no | normally 0xFFFFFFFF; irrelevant unless transaction’s lock_time is > 0 | 4 bytes |

src

So in our case, since this is a coinbase transaction, there is no previous transaction. Hence, the hash of previous transaction is all zeros:

0000059 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
0000069 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000079 00 00 00 00 00 ff ff ff ff 09 04 cf bb 0a 1a 03
0000089 19 22 01 ff ff ff ff 01 f0 e3 69 2a 01 00 00 00

The next 4 bytes after the hash contain the index into the outputs of previous transaction. For coinbase transactions, this is -1 or 0xffffffff in 2’s complement form:

0000079 00 00 00 00 00 ff ff ff ff 09 04 cf bb 0a 1a 03

Next we have the length of this input’s scriptSig. For us, this is 0x09. However, for coinbase transactions, the scriptSig is known as a coinbase parameter and its value is ignored. So we will ignore the next 9 bytes. The coinbase parameter is often used by mining algorithms to affect the hash of the block once the nonce parameter in the block header overflows. We will discuss this in detail when we look at mining.

0000079 00 00 00 00 00 ff ff ff ff 09 04 cf bb 0a 1a 03
0000089 19 22 01 ff ff ff ff 01 f0 e3 69 2a 01 00 00 00

The coinbase parameter in this case is the nine bytes 04 cf bb 0a 1a 03 19 22 01 that follow the length byte 09. The next four bytes with value ff ff ff ff make up the integer sequence_no.
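Putting the input fields together, here is a sketch that slices up the 50 bytes of this input. The bytes are typed in from the dump, the variable names are my own, and the script length is read as a single byte, which only works because 0x09 is below 0xfd.

```python
import struct

# The single input of the coinbase transaction, copied from the dump
# (everything after the version and the in-counter).
TXIN = bytes.fromhex(
    "00" * 32               # previous transaction hash: all zeros
    + "ffffffff"            # previous txout index
    + "09"                  # scriptSig length
    + "04cfbb0a1a03192201"  # scriptSig / coinbase parameter
    + "ffffffff"            # sequence_no
)

prev_hash = TXIN[0:32]
(prev_index,) = struct.unpack_from("<I", TXIN, 32)
script_len = TXIN[36]  # one byte VarInt
script = TXIN[37:37 + script_len]
(sequence,) = struct.unpack_from("<I", TXIN, 37 + script_len)

# Reading the same four index bytes as a signed integer shows the -1.
(signed_index,) = struct.unpack_from("<i", TXIN, 32)

# An all-zero hash together with index 0xffffffff marks a coinbase input.
is_coinbase = prev_hash == bytes(32) and prev_index == 0xFFFFFFFF

print(hex(prev_index), signed_index)  # 0xffffffff -1
print(script.hex())                   # 04cfbb0a1a03192201
print(hex(sequence))                  # 0xffffffff
print(is_coinbase)                    # True
```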

The byte 0x01 after the end of inputs indicates that there is one output in this transaction.

0000089 19 22 01 ff ff ff ff 01 f0 e3 69 2a 01 00 00 00

The format of each output in the list of outputs is:

| Field | Description | Size |
| --- | --- | --- |
| value | non negative integer giving the number of Satoshis (BTC/10^8) to be transferred | 8 bytes |
| Txout-script length | non negative integer VI = VarInt | 1 – 9 bytes |
| Txout-script / scriptPubKey | Script | <out-script length>-many bytes |

0000089 19 22 01 ff ff ff ff 01 f0 e3 69 2a 01 00 00 00
0000099 43 41 04 53 52 4f b3 2c d2 89 7d cb 85 0e 88 30
00000a9 90 34 aa 7f 5d e0 c6 50 fe c9 f1 95 1b 2f 6d f4
00000b9 cc 45 a4 e2 d6 81 ba 5f 53 fc 3e 44 05 3d 11 00
00000c9 fa 2b de 69 f1 f7 c7 79 dd df e7 d2 9e 16 3b 6a
00000d9 fa de 96 ac 00 00 00 00

The first 8 bytes (f0 e3 69 2a 01 00 00 00) make up a 64-bit integer representing the amount of satoshis being transacted in this output (one satoshi is the smallest unit of transaction; 1 bitcoin = 100000000 satoshis). Read as a little-endian integer, this value gives us 0x000000012a69e3f0 or 5006550000 satoshis or 50.0655 bitcoins. 50 of these coins are the reward given to the miner and the remaining 0.0655 is the sum of all voluntary transaction fees offered in the remaining 63 transactions.

This value is followed by the variable length integer 0x43, which says that there are 67 remaining bytes in the output. These 67 bytes, starting with 41 04 53 and ending with 96 ac, are a binary representation of a script in Bitcoin’s native scripting language. This script defines how the output can be re-spent.
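The output can be taken apart the same way. In this sketch the 76 bytes of the output are typed in from the dump, the names are my own, and the script is only measured, not interpreted. The fee is simply what is left of the value after subtracting the 50 bitcoin reward.

```python
import struct

SATOSHIS_PER_BITCOIN = 100000000

# The single output of the coinbase transaction, copied from the dump.
TXOUT = bytes.fromhex(
    "f0 e3 69 2a 01 00 00 00"
    "43 41 04 53 52 4f b3 2c d2 89 7d cb 85 0e 88 30"
    "90 34 aa 7f 5d e0 c6 50 fe c9 f1 95 1b 2f 6d f4"
    "cc 45 a4 e2 d6 81 ba 5f 53 fc 3e 44 05 3d 11 00"
    "fa 2b de 69 f1 f7 c7 79 dd df e7 d2 9e 16 3b 6a"
    "fa de 96 ac"
)

(value,) = struct.unpack_from("<Q", TXOUT, 0)  # Q = unsigned 64-bit
script_len = TXOUT[8]                          # one byte VarInt (0x43 < 0xfd)
script = TXOUT[9:9 + script_len]

reward = 50 * SATOSHIS_PER_BITCOIN
fees = value - reward

print(value)                         # 5006550000
print(value / SATOSHIS_PER_BITCOIN)  # 50.0655
print(script_len, len(script))       # 67 67
print(fees)                          # 6550000
```

Keeping amounts as integer satoshis and converting to bitcoins only for display avoids floating-point rounding in the arithmetic.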

The last element in the transaction is lock_time, which is 0 in this case.

00000d9 fa de 96 ac 00 00 00 00

Hopefully this gives you a basic idea of the structure of a block in the blockchain. In the next post, we will look at the second transaction in this block and begin building tools to parse the block.

References: